Pearson's correlation coefficient is an important statistic that comes up often. It can be calculated by hand, but that takes time.
So this is the recipe on how we can determine Pearson's correlation in Python.
import matplotlib.pyplot as plt
import statistics as stats
import pandas as pd
import random
import seaborn as sns
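We have imported statistics (as stats), pandas, random, seaborn and matplotlib, which are needed for the steps below.
We have created an empty dataframe and then added two columns, x and y, each filled with 75 random integers drawn without replacement from 1 to 99.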
df = pd.DataFrame()
df["x"] = random.sample(range(1, 100), 75)
df["y"] = random.sample(range(1, 100), 75)
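Because the columns are filled with random numbers, every run produces a different dataframe and a different coefficient. If repeatable results are wanted, one option is to seed the generator first. The sketch below is only an illustration: the seed value 42 and the name df_seeded are arbitrary choices, not part of the original recipe.

```python
import random
import pandas as pd

# Assumption: any fixed integer works as a seed; 42 is arbitrary.
random.seed(42)

# Same construction as in the recipe, under a hypothetical name.
df_seeded = pd.DataFrame()
df_seeded["x"] = random.sample(range(1, 100), 75)
df_seeded["y"] = random.sample(range(1, 100), 75)

# Re-seeding with the same value reproduces the same first column.
random.seed(42)
x_again = random.sample(range(1, 100), 75)

print(df_seeded.head())
print(list(df_seeded["x"]) == x_again)
```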
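We have defined a function that computes the coefficient in several steps: it converts each variable to standard scores, multiplies the paired scores, sums the products and divides by n - 1.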
def pearson(x, y):
    n = len(x)
    standard_score_x = []
    standard_score_y = []
    # Mean and sample standard deviation of each variable
    mean_x = stats.mean(x)
    standard_deviation_x = stats.stdev(x)
    mean_y = stats.mean(y)
    standard_deviation_y = stats.stdev(y)
    # Convert every observation to a standard score (z-score)
    for observation in x:
        standard_score_x.append((observation - mean_x)/standard_deviation_x)
    for observation in y:
        standard_score_y.append((observation - mean_y)/standard_deviation_y)
    # Sum the products of the paired standard scores and divide by n - 1
    return (sum([i*j for i,j in zip(standard_score_x, standard_score_y)]))/(n-1)
result = pearson(df.x, df.y)
print("Pearson's correlation coefficient is: ", result)
sns.lmplot(x="x", y="y", data=df, fit_reg=True)
plt.show()
The output looks like this; the numbers change on every run because the data is random:
    x   y
0  96  62
1   1  81
2  27  73
3  55  26
4  83  93
Pearson's correlation coefficient is: -0.006387074440361877
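A quick way to gain confidence in a hand-written function like this is to compare it with a library routine. The sketch below restates the same standard-score calculation in compact form and checks it against numpy.corrcoef. NumPy is an extra dependency that the recipe itself does not use, and the names manual and reference are chosen here for illustration only.

```python
import random
import statistics as stats
import numpy as np

def pearson(x, y):
    # Same approach as the recipe: average product of standard scores,
    # using the sample standard deviation and dividing by n - 1.
    n = len(x)
    mean_x, mean_y = stats.mean(x), stats.mean(y)
    sd_x, sd_y = stats.stdev(x), stats.stdev(y)
    z_x = [(v - mean_x) / sd_x for v in x]
    z_y = [(v - mean_y) / sd_y for v in y]
    return sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)

# Random data built the same way as the dataframe columns.
x = random.sample(range(1, 100), 75)
y = random.sample(range(1, 100), 75)

manual = pearson(x, y)
# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
# entry is the correlation between x and y.
reference = np.corrcoef(x, y)[0, 1]

print("manual:   ", manual)
print("reference:", reference)
```

The two values should agree up to floating-point rounding. With independent random columns like these, a coefficient close to zero, as in the output above, is what one would expect.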