
# How to determine Pearson's correlation in Python?

This recipe helps you determine Pearson's correlation coefficient in Python.

## Recipe Objective

Pearson's correlation coefficient is a widely used statistic that measures the strength and direction of the linear relationship between two variables, on a scale from -1 to 1. We can calculate it by hand, but that takes time.

So this is the recipe on how we can determine Pearson's correlation in Python.

## Step 1 - Importing Library

```
import matplotlib.pyplot as plt
import statistics as stats
import pandas as pd
import random
import seaborn as sns
```

We have imported statistics (as stats), pandas, random, seaborn and matplotlib, which are all needed for this recipe.

## Step 2 - Creating a dataframe

We have created an empty dataframe and then added two columns, x and y, each filled with 75 random numbers drawn without repetition from 1 to 99.

```
df = pd.DataFrame()
df["x"] = random.sample(range(1, 100), 75)
df["y"] = random.sample(range(1, 100), 75)
print(); print(df.head())
```
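Because the numbers are random, every run produces a different dataframe and a different coefficient. If repeatable results are wanted, one option is to draw the values from a seeded generator. The sketch below is only an illustration: the helper name `make_columns` and the seed value 42 are assumptions, not part of the recipe.

```python
import random

def make_columns(seed, size=75):
    """Return two lists of distinct random integers from 1 to 99 (hypothetical helper)."""
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    x = rng.sample(range(1, 100), size)
    y = rng.sample(range(1, 100), size)
    return x, y

x, y = make_columns(seed=42)
# The same seed always reproduces the same columns
assert make_columns(seed=42) == (x, y)
```

The two lists can then be assigned to `df["x"]` and `df["y"]` in the same way as in this step.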

## Step 3 - Calculating Pearsons correlation coefficient

We have defined a function with different steps, which we will go through one by one.

• We have calculated the length of x and the mean and standard deviation of x

```
def pearson(x,y):
    n = len(x)
    standard_score_x = []; standard_score_y = [];
    mean_x = stats.mean(x)
    standard_deviation_x = stats.stdev(x)
```

• We are calculating the mean and standard deviation of y

```
    mean_y = stats.mean(y)
    standard_deviation_y = stats.stdev(y)
```

• We are calculating the standard score of each observation by dividing the difference between the observation and the mean by the standard deviation. We have done this for both x and y. The function then returns the sum of the products of the paired standard scores divided by n - 1

```
    for observation in x:
        standard_score_x.append((observation - mean_x)/standard_deviation_x)
    for observation in y:
        standard_score_y.append((observation - mean_y)/standard_deviation_y)
    return (sum([i*j for i,j in zip(standard_score_x, standard_score_y)]))/(n-1)
```
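Put together, the steps above compute `r = sum(z_x * z_y) / (n - 1)`, where `z_x` and `z_y` are the standard scores of x and y. As a quick sanity check, a compact restatement of the same calculation can be run on data whose answer is known in advance. The function name `pearson_compact` and the sample lists are assumptions made for this illustration.

```python
import statistics as stats

def pearson_compact(x, y):
    """Pearson's r as the sum of products of standard scores over n - 1."""
    n = len(x)
    mean_x, mean_y = stats.mean(x), stats.mean(y)
    sd_x, sd_y = stats.stdev(x), stats.stdev(y)  # sample standard deviations
    z_x = [(v - mean_x) / sd_x for v in x]
    z_y = [(v - mean_y) / sd_y for v in y]
    return sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)

x = [1, 2, 3, 4, 5]
up = [2, 4, 6, 8, 10]    # rises exactly linearly with x
down = [10, 8, 6, 4, 2]  # falls exactly linearly with x

print(pearson_compact(x, up))    # expected to be 1, up to floating-point rounding
print(pearson_compact(x, down))  # expected to be -1, up to floating-point rounding
```

A perfectly increasing linear relationship should give a value of 1 and a perfectly decreasing one a value of -1, so any other result would point to a mistake in the function.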

## Printing the Results

```
result = pearson(df.x, df.y)
print()
print("Pearson's correlation coefficient is: ", result)
# Pass x and y by keyword; recent seaborn releases reject them as positional arguments
sns.lmplot(x="x", y="y", data=df, fit_reg=True)
plt.show()
```

```
    x   y
0  96  62
1   1  81
2  27  73
3  55  26
4  83  93

Pearson's correlation coefficient is:  -0.006387074440361877
```
