# How to build a correlation matrix in R?

This recipe helps you build a correlation matrix in R.

A correlation matrix is a square table of correlation coefficients for a set of variables: the entry in row i and column j is the correlation between the i-th and the j-th variable. It is mainly used to determine how strongly, and in which direction, the variables are related.
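
To make the definition concrete, here is a minimal sketch that computes one Pearson coefficient directly from its formula and compares it with the matching entry of the matrix returned by `cor()`. It assumes R's built-in `mtcars` dataset as a stand-in for the mall data, and `pearson_r()` is a hypothetical helper introduced only for illustration.

```r
# Hypothetical helper: Pearson correlation computed from its definition
pearson_r <- function(x, y) {
  dx <- x - mean(x)  # deviations of x from its mean
  dy <- y - mean(y)  # deviations of y from its mean
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Built-in dataset used as a stand-in for the mall customer data
vars <- mtcars[, c("mpg", "wt", "hp")]
m <- cor(vars)

# The hand-computed value should match the [mpg, wt] entry of the matrix
manual <- pearson_r(vars$mpg, vars$wt)
print(manual)
print(m["mpg", "wt"])
```

Every cell of the matrix is this same calculation applied to one pair of columns.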

There are three main applications of a correlation matrix:

- To explore patterns in a large dataset by summarising it in the form of a table.
- As an input to exploratory data analysis, structural equation models and confirmatory factor analysis.
- As a diagnostic step when checking other analyses. For example, high correlation coefficients between the predictors of a linear regression (multicollinearity) indicate that its coefficient estimates may be unreliable.
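
As a rough illustration of the diagnostic use, the sketch below lists every pair of variables whose absolute correlation exceeds a cut-off. The built-in `mtcars` dataset and the 0.8 threshold are illustrative assumptions, not values taken from this recipe; each pair is reported once by reading only the upper triangle of the matrix.

```r
# Built-in dataset as a stand-in; the 0.8 cut-off is an illustrative assumption
m <- cor(mtcars)
threshold <- 0.8

# Upper triangle only: skips the diagonal and the mirrored lower half
idx <- which(upper.tri(m) & abs(m) > threshold, arr.ind = TRUE)

high_pairs <- data.frame(
  var1 = rownames(m)[idx[, 1]],
  var2 = colnames(m)[idx[, 2]],
  r    = m[idx],
  stringsAsFactors = FALSE
)

# Strongest relationships first, regardless of sign
high_pairs <- high_pairs[order(-abs(high_pairs$r)), ]
print(high_pairs)
```

Pairs that appear here are candidates for dropping or combining before fitting a linear regression; the right threshold depends on the analysis.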

This recipe demonstrates how to build a correlation matrix.

Dataset description: basic data about customers visiting a supermarket mall. The variables we are interested in are Age, Annual Income (in thousands) and Spending Score.

```
# Data manipulation packages (tidyverse also attaches dplyr)
library(dplyr)
library(tidyverse)
# Read the dataset
customer_seg <- read.csv('Mall_Customers.csv')
# Select the required numeric variables with select()
customer_seg_var <- select(customer_seg, Age, Annual.Income..k.., Spending.Score..1.100.)
# Compact overview of the selected variables
glimpse(customer_seg_var)
```

- Observations: 200
- Variables: 3
- `$ Age`: 19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, 35…
- `$ Annual.Income..k..`: 15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, 19…
- `$ Spending.Score..1.100.`: 39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99, 1…
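
`cor()` only accepts numeric input, which is one reason to select the numeric columns before computing the matrix. As an alternative to naming columns one by one, the sketch below keeps every numeric column with base R's `Filter()`. It assumes the built-in `iris` dataset, whose `Species` column is a factor, as a stand-in for a dataset that mixes numeric and non-numeric columns.

```r
# iris mixes four numeric measurements with a factor column (Species)
df <- iris

# cor(df) would stop with an error because Species is not numeric,
# so keep only the columns for which is.numeric() is TRUE
num_df <- Filter(is.numeric, df)

cor_num <- cor(num_df)
print(round(cor_num, 2))
```

With dplyr 1.0 or later, `select(df, where(is.numeric))` does the same job. Note that numeric ID columns would also be kept, so they may still need to be dropped by name.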

We use the cor() function to create the correlation matrix.

Syntax: cor(x, method = "pearson")

where:

- x = a numeric data frame or matrix; each column is treated as a variable
- method = a character string naming the coefficient to compute: "pearson" (the default), "kendall" or "spearman"
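
A minimal sketch of the `method` argument, again assuming the built-in `mtcars` dataset in place of the mall data. It computes the same matrix with all three methods and also checks that Spearman's coefficient equals Pearson's coefficient applied to the ranks of the data.

```r
# Built-in dataset as a stand-in for the mall data
vars <- mtcars[, c("mpg", "wt", "hp")]

pearson  <- cor(vars)                       # default: method = "pearson"
spearman <- cor(vars, method = "spearman")  # rank-based, captures monotonic trends
kendall  <- cor(vars, method = "kendall")   # rank-based, counts concordant pairs

# Spearman's rho is Pearson's r computed on column-wise ranks
ranked <- cor(apply(vars, 2, rank))

# Compare one coefficient across the three methods
print(round(c(pearson  = pearson["mpg", "wt"],
              spearman = spearman["mpg", "wt"],
              kendall  = kendall["mpg", "wt"]), 3))
```

Pearson measures linear association, while the two rank-based methods are less sensitive to outliers and to non-linear but monotonic relationships.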

```
customer_seg_var.cor = cor(customer_seg_var)
customer_seg_var.cor
```

| | Age | Annual.Income..k.. | Spending.Score..1.100. |
|---|---|---|---|
| Age | 1.00000000 | -0.012398043 | -0.327226846 |
| Annual.Income..k.. | -0.01239804 | 1.000000000 | 0.009902848 |
| Spending.Score..1.100. | -0.32722685 | 0.009902848 | 1.000000000 |

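Note: The diagonal elements of the matrix are 1 because each one is the correlation of a variable with itself, and every coefficient lies between -1 and 1. In this output, Age and Spending Score show a weak-to-moderate negative correlation (about -0.33), while Annual Income is almost uncorrelated with both.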
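
The properties in the note, together with the symmetry of the matrix (cor(A, B) equals cor(B, A)), can be checked programmatically. This is a small sketch assuming the built-in `mtcars` dataset; the tolerance used for the range check is an arbitrary guard against floating-point rounding.

```r
m <- cor(mtcars)

# 1. Diagonal entries are 1 (each variable correlated with itself)
diag_ok <- isTRUE(all.equal(unname(diag(m)), rep(1, ncol(m))))

# 2. The matrix is symmetric: entry [i, j] equals entry [j, i]
sym_ok <- isSymmetric(m)

# 3. Every coefficient lies in [-1, 1], allowing a tiny rounding tolerance
tol <- 1e-12
range_ok <- all(m >= -1 - tol & m <= 1 + tol)

print(c(diagonal = diag_ok, symmetric = sym_ok, bounded = range_ok))
```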
