How to determine Pearson's correlation in Python?

This recipe helps you determine Pearson's correlation coefficient in Python.
Last Updated: 23 Jun 2022


Recipe Objective

Pearson's correlation coefficient is a widely used statistic that measures the strength and direction of the linear relationship between two variables. It ranges from -1 to 1. We can calculate it by hand, but that takes time.

So this is the recipe on how we can determine Pearson's correlation in Python.

Step 1 - Importing Library

import matplotlib.pyplot as plt
import statistics as stats
import pandas as pd
import random
import seaborn as sns

We have imported statistics (as stats), pandas, random, seaborn and matplotlib, which are all needed for this recipe.

Step 2 - Creating a dataframe

We have created an empty dataframe and then added two columns, x and y, each filled with 75 random integers between 1 and 99.

df = pd.DataFrame()
df["x"] = random.sample(range(1, 100), 75)
df["y"] = random.sample(range(1, 100), 75)
print()
print(df.head())
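Because random.sample draws fresh values on every run, the numbers printed by this recipe will differ each time. The following is a minimal sketch, assuming reproducible data is wanted. The helper name make_columns and the seed value 42 are hypothetical and are not part of the original recipe.

```python
import random

def make_columns(seed, size=75):
    """Return two lists of `size` distinct integers drawn from 1..99."""
    # A private generator keeps the global random state untouched (assumed design choice).
    rng = random.Random(seed)
    x = rng.sample(range(1, 100), size)  # sampling without replacement
    y = rng.sample(range(1, 100), size)
    return x, y

x, y = make_columns(seed=42)
print(len(x), len(set(x)))  # 75 75, because sample() never repeats a value
```

The two lists can then be assigned to df["x"] and df["y"] exactly as in the step above.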

Step 3 - Calculating Pearsons correlation coefficient

We have defined a function that works in several steps, which we will go through one by one.

First, we have calculated the length of x, along with the mean and standard deviation of x.
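For reference, the quantity this function computes can be written as the formula below. The notation is an assumption made here for illustration: n is the number of observations, x̄ and ȳ are the means, and s_x and s_y are the sample standard deviations (the n - 1 form that stats.stdev returns).

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```

Each bracketed term is the standard score of one observation, so r is the sum of the products of paired standard scores divided by n - 1.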

def pearson(x, y):
    n = len(x)
    standard_score_x = []
    standard_score_y = []
    mean_x = stats.mean(x)
    standard_deviation_x = stats.stdev(x)

We are calculating the mean and standard deviation of y.

    mean_y = stats.mean(y)
    standard_deviation_y = stats.stdev(y)

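We are calculating the standard score of each observation by dividing the difference between the observation and the mean by the standard deviation. We have done this for both x and y. The function then returns the sum of the products of the paired standard scores, divided by n - 1.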

    for observation in x:
        standard_score_x.append((observation - mean_x) / standard_deviation_x)
    for observation in y:
        standard_score_y.append((observation - mean_y) / standard_deviation_y)
    return sum(i * j for i, j in zip(standard_score_x, standard_score_y)) / (n - 1)
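As a cross-check on the function above, the same coefficient can be computed from summed deviations without building the standard scores first. The two forms are algebraically equivalent when the sample standard deviation is used. This is a sketch only: the name pearson_from_sums and the guard against constant input are assumptions added here, not part of the original recipe.

```python
import math
import statistics as stats

def pearson_from_sums(x, y):
    """Pearson's r from summed deviations, without standard scores."""
    mean_x, mean_y = stats.mean(x), stats.mean(y)
    dx = [xi - mean_x for xi in x]  # deviations of x from its mean
    dy = [yi - mean_y for yi in y]  # deviations of y from its mean
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        # Assumed guard: r is undefined when either variable never varies.
        raise ValueError("correlation is undefined when x or y is constant")
    return numerator / denominator

x = [1, 2, 3, 4, 5]
print(pearson_from_sums(x, [2, 4, 6, 8, 10]))  # 1.0, a perfect positive line
print(pearson_from_sums(x, [10, 8, 6, 4, 2]))  # -1.0, a perfect negative line
```

If pandas is available, df["x"].corr(df["y"]) should agree with both versions up to floating-point rounding, since Series.corr uses the Pearson method by default. Python 3.10 and later also provide statistics.correlation for the same purpose.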

Printing the Results

result = pearson(df.x, df.y)
print()
print("Pearson's correlation coefficient is: ", result)
# Pass x and y as keywords; recent seaborn releases no longer accept them positionally.
sns.lmplot(x="x", y="y", data=df, fit_reg=True)
plt.show()

    x   y
0  96  62
1   1  81
2  27  73
3  55  26
4  83  93

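Pearson's correlation coefficient is:  -0.006387074440361877

A value this close to zero indicates almost no linear relationship, which is expected because x and y were sampled independently. Your own numbers will differ from run to run.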
