How to determine Pearson's correlation in Python?

This recipe shows how to compute Pearson's correlation coefficient in Python from standard scores, using pandas and the statistics module.
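
For reference, the quantity the recipe's `pearson` function computes can be written as below. The symbols are notation chosen here rather than taken from the recipe: n is the number of observations, x̄ and ȳ are the sample means, and s_x and s_y are the sample standard deviations (the values `statistics.stdev` returns).

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```

Each factor in the sum is a standard score, so r is the sum of the products of paired standard scores divided by n-1. Its value always lies between -1 and 1.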

def Snippet_120():
    print()
    print(format('How to determine Pearson\'s correlation in Python','*^82'))

    import warnings
    warnings.filterwarnings("ignore")

    # load libraries
    import matplotlib.pyplot as plt
    import statistics as stats
    import pandas as pd
    import random
    import seaborn as sns

    # Create empty dataframe
    df = pd.DataFrame()
    # Add two columns, each holding 75 distinct random integers from 1 to 99
    df['x'] = random.sample(range(1, 100), 75)
    df['y'] = random.sample(range(1, 100), 75)

    # View first few rows of data
    print()
    print(df.head())

    # Calculate Pearson’s Correlation Coefficient
    def pearson(x,y):
        # Create n, the number of observations in the data
        n = len(x)
        # Create lists to store the standard scores
        standard_score_x = []
        standard_score_y = []
        # Calculate the mean of x
        mean_x = stats.mean(x)
        # Calculate the standard deviation of x
        standard_deviation_x = stats.stdev(x)
        # Calculate the mean of y
        mean_y = stats.mean(y)
        # Calculate the standard deviation of y
        standard_deviation_y = stats.stdev(y)
        # For each observation in x
        for observation in x:
            # Calculate the standard score of x
            standard_score_x.append((observation - mean_x)/standard_deviation_x)
        # For each observation in y
        for observation in y:
            # Calculate the standard score of y
            standard_score_y.append((observation - mean_y)/standard_deviation_y)
        # Multiply the paired standard scores, sum the products, then divide by n-1 and return that value
        return (sum([i*j for i,j in zip(standard_score_x, standard_score_y)]))/(n-1)

    # Show Pearson's Correlation Coefficient
    result = pearson(df.x, df.y)
    print()
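    print("Pearson's correlation coefficient is: ", result)
    # Scatter plot with a fitted regression line; recent seaborn releases require x and y as keyword arguments
    sns.lmplot(x='x', y='y', data=df, fit_reg=True)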
    plt.show()

Snippet_120()
*****************How to determine Pearson's correlation in Python*****************

    x   y
0  69  99
1  56  30
2  64  62
3  58   8
4  14  64

Pearson's correlation coefficient is:  0.3810462941506265
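
Because both columns are drawn with `random.sample` and no seed, the coefficient above changes on every run. A quick way to sanity-check the calculation is to run it on data whose answer is known in advance. The sketch below is an illustration, not part of the recipe: `pearson_compact` is a hypothetical helper that uses the algebraically equivalent sum-of-products form (the n-1 factors cancel) and only the standard library.

```python
import math
import statistics as stats


def pearson_compact(x, y):
    """Pearson's r from centered sums of products (hypothetical helper)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(x) < 2:
        raise ValueError("at least two observations are required")

    mean_x = stats.mean(x)
    mean_y = stats.mean(y)

    # Center each observation on its column mean
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]

    # Numerator: sum of cross-products of the centered values
    numerator = sum(a * b for a, b in zip(dx, dy))
    # Denominator: square root of the product of the two sums of squares
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined when a column is constant")

    return numerator / denominator


# Perfectly linear data: r should be +1 for an increasing line, -1 for a decreasing one
x = [1, 2, 3, 4, 5]
r_up = pearson_compact(x, [2, 4, 6, 8, 10])
r_down = pearson_compact(x, [10, 8, 6, 4, 2])
print(r_up, r_down)
```

The same helper accepts the recipe's `df.x` and `df.y` columns, since it only iterates over the values. On Python 3.10 or later, `statistics.correlation(x, y)` offers a built-in alternative for the same check.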