How to build a correlation matrix in R?

This recipe helps you build a correlation matrix in R


Recipe Objective

A correlation matrix is a square table of correlation coefficients for a set of variables: the entry in row i and column j is the correlation between variable i and variable j. It is mainly used to determine how the variables relate to one another.

There are three main applications of a correlation matrix:

  1. To explore patterns in a large dataset by summarising it in the form of a table.
  2. As an input for other analyses, such as exploratory data analysis, structural equation models and confirmatory factor analysis.
  3. As a diagnostic step when checking other analyses. For example, high correlation coefficients between predictor variables indicate multicollinearity, which makes the coefficient estimates of a linear regression unreliable.
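
As a rough illustration of the diagnostic use in point 3, the sketch below lists pairs of variables whose absolute correlation exceeds a cutoff. It uses R's built-in mtcars dataset as a stand-in for your own data; the helper name high_cor_pairs and the 0.8 cutoff are assumptions chosen for illustration, not a rule from this recipe.

```r
# Hypothetical helper: list variable pairs with |r| above a cutoff.
high_cor_pairs <- function(df, cutoff = 0.8) {
  cm <- cor(df)                         # Pearson correlation matrix
  cm[lower.tri(cm, diag = TRUE)] <- NA  # keep each pair once, drop the diagonal
  idx <- which(abs(cm) > cutoff, arr.ind = TRUE)
  data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    r    = cm[idx],
    stringsAsFactors = FALSE
  )
}

# mtcars ships with R; it stands in for your own numeric data frame.
flagged <- high_cor_pairs(mtcars[, c("mpg", "cyl", "disp", "hp", "wt")])
print(flagged)
```

Any pair that appears in the result is a candidate for closer inspection before both variables are used as predictors in the same regression.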

This recipe demonstrates how to build a correlation matrix.

STEP 1: Loading required library and dataset

Dataset description: The dataset contains basic data about customers visiting a supermarket mall. The variables we are interested in are Annual.Income (in thousands), Spending Score (on a scale of 1 to 100) and Age.

# Data manipulation packages
library(dplyr)
library(tidyverse)

# reading a dataset
customer_seg = read.csv('Mall_Customers.csv')

# selecting the required variables using the select() function
customer_seg_var = select(customer_seg, Age, Annual.Income..k.., Spending.Score..1.100.)

# summary of the selected variables
glimpse(customer_seg_var)

Observations: 200
Variables: 3
$ Age                     19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, 35…
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, 19…
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99, 1…
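
All three selected variables are numeric, which matters because cor() stops with an error when given a non-numeric column. If your own data mixes types, a small filter like the one sketched below can help. The helper name numeric_only is hypothetical, and the built-in iris dataset is used only as a stand-in because it contains a factor column.

```r
# Hypothetical helper: keep only numeric columns before calling cor().
numeric_only <- function(df) {
  df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
}

# iris ships with R; its Species column is a factor, so cor(iris) would fail.
iris_num <- numeric_only(iris)
names(iris_num)

# The filtered data frame can now be passed to cor().
iris_cor <- cor(iris_num)
dim(iris_cor)
```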

STEP 2: Building a correlation matrix

We use the cor() function to create a correlation matrix.

Syntax: cor(x, method = )

  1. x = a numeric data frame or matrix as input
  2. method = a character string naming the correlation coefficient to compute: "pearson" (the default), "kendall" or "spearman"

customer_seg_var.cor = cor(customer_seg_var)
customer_seg_var.cor
                               Age Annual.Income..k.. Spending.Score..1.100.
Age                     1.00000000       -0.012398043           -0.327226846
Annual.Income..k..     -0.01239804        1.000000000            0.009902848
Spending.Score..1.100. -0.32722685        0.009902848            1.000000000

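Note: The diagonal elements of the matrix are 1.0 because each one is the correlation of a variable with itself. All coefficients lie in the range -1 to 1, and the matrix is symmetric because the correlation of x with y equals the correlation of y with x.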
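
To try the method argument and check these properties yourself, here is a minimal sketch. It uses the built-in mtcars dataset as a stand-in for customer_seg_var (an assumption, in case Mall_Customers.csv is not at hand) and computes the matrix with each of the three methods that cor() supports.

```r
# mtcars ships with R; it stands in for customer_seg_var here.
vars <- mtcars[, c("mpg", "hp", "wt")]

# cor() accepts "pearson" (the default), "kendall" or "spearman".
pearson_cor  <- cor(vars)
spearman_cor <- cor(vars, method = "spearman")
kendall_cor  <- cor(vars, method = "kendall")

# Unit diagonal: every variable correlates perfectly with itself.
isTRUE(all.equal(unname(diag(pearson_cor)), rep(1, ncol(vars))))

# Every coefficient lies between -1 and 1.
all(abs(pearson_cor) <= 1)

# The matrix is symmetric.
isSymmetric(pearson_cor)

# Rounding gives a more readable printout.
round(spearman_cor, 2)
```

Pearson's coefficient measures linear association, while Spearman's and Kendall's are rank-based, so they can be a reasonable choice when the relationship is monotonic but not linear.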
