How to create a heatmap in R?

This recipe helps you create a heatmap in R


Recipe Objective

A correlation matrix is a square table of correlation coefficients for a set of variables. It is mainly used to determine the relationships between those variables.

There are three main applications of a correlation matrix:

  1. To explore patterns in a large dataset by summarising it in the form of a table.
  2. As an input to other analyses, such as exploratory factor analysis, structural equation models and confirmatory factor analysis.
  3. As a diagnostic step when checking other analyses. For example, high correlation coefficients between predictor variables indicate that the coefficient estimates of a linear regression may be unreliable.
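
As a rough illustration of the diagnostic use in point 3, the sketch below flags variable pairs whose absolute correlation exceeds a cutoff. It uses R's built-in mtcars data as a stand-in for a real dataset, and the 0.8 cutoff is an arbitrary assumption rather than a rule from this recipe.

```r
# Stand-in data: four numeric columns from the built-in mtcars dataset
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]
cor_mat <- cor(vars)

# Hypothetical cutoff for calling a pair "highly correlated"
cutoff <- 0.8

# Look only at the upper triangle so each pair is reported once
idx <- which(abs(cor_mat) > cutoff & upper.tri(cor_mat), arr.ind = TRUE)

high_pairs <- data.frame(
  var1 = rownames(cor_mat)[idx[, "row"]],
  var2 = colnames(cor_mat)[idx[, "col"]],
  r    = cor_mat[idx]
)
print(high_pairs)
```

Pairs listed in high_pairs are candidates for a closer look before both variables are used as predictors in the same regression.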

The most commonly used visualisation technique for a correlation matrix is a heatmap, which shows the magnitude of each coefficient as a shade of color.

This recipe demonstrates how to build a heatmap of a correlation matrix.

STEP 1: Loading required library and dataset

Dataset description: This is basic data about customers visiting a supermarket mall. The variables we are interested in are Annual Income (in 1000s), Spending Score and Age.

# Data manipulation packages
library(dplyr)
library(tidyverse)

# Read the dataset
customer_seg = read.csv('Mall_Customers.csv')

# Select the required variables using the select() function
customer_seg_var = select(customer_seg, Age, Annual.Income..k.., Spending.Score..1.100.)

# Summary of the selected variables
glimpse(customer_seg_var)

Observations: 200
Variables: 3
$ Age                     19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, 35…
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, 19…
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99, 1…
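
The dotted column names above are a side effect of how read.csv() cleans up headers. The sketch below reproduces this with a three-row stand-in for Mall_Customers.csv. The data values are taken from the output above, but the raw header text ("Annual Income (k$)", "Spending Score (1-100)") is an assumption about the original file.

```r
# Stand-in for Mall_Customers.csv; the raw header names are assumed
csv_text <- 'Age,Annual Income (k$),Spending Score (1-100)
19,15,39
21,15,81
20,16,6'

# Default behaviour: check.names = TRUE converts headers to syntactic
# R names, replacing spaces and symbols with dots
default_names <- names(read.csv(text = csv_text))
print(default_names)

# check.names = FALSE keeps the headers as written in the file;
# such columns then need backticks when accessed with $
raw <- read.csv(text = csv_text, check.names = FALSE)
print(names(raw))
print(raw$`Annual Income (k$)`)
```

Keeping the default names, as this recipe does, avoids the backticks at the cost of less readable labels.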

STEP 2: Building a correlation matrix

We use the cor() function to create a correlation matrix.

Syntax: cor(x, method = )

  1. x = data frame (or numeric matrix) as input
  2. method = a character string naming the correlation coefficient to compute: "pearson" (the default), "kendall" or "spearman"
customer_seg_var.cor = cor(customer_seg_var)
customer_seg_var.cor
	Age	Annual.Income..k..	Spending.Score..1.100.
Age	1.00000000 	-0.012398043	-0.327226846
Annual.Income..k..	-0.01239804 	1.000000000	0.009902848
Spending.Score..1.100.	-0.32722685 	0.009902848	1.000000000

Note: The diagonal elements of the matrix are 1.0 because each one is the correlation of a variable with itself. Correlation coefficients range from -1 to 1.
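
To show how the method argument changes the result, here is a minimal sketch that computes Pearson and Spearman matrices and checks the properties mentioned in the note. It uses the built-in mtcars data as a stand-in, since the mall dataset is not bundled with R. The choice of columns is arbitrary.

```r
# Stand-in data: three numeric columns from the built-in mtcars dataset
vars <- mtcars[, c("mpg", "hp", "wt")]

# Pearson is the default; Spearman is rank-based
pearson_cor  <- cor(vars)
spearman_cor <- cor(vars, method = "spearman")

print(round(pearson_cor, 2))
print(round(spearman_cor, 2))

# Checks: unit diagonal, symmetry, and values within [-1, 1]
print(isTRUE(all.equal(unname(diag(pearson_cor)), rep(1, ncol(vars)))))
print(isSymmetric(pearson_cor))
print(all(abs(pearson_cor) <= 1 + 1e-12))
```

Spearman's coefficient can be a reasonable choice when the relationship between variables is monotonic but not linear.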

STEP 3: Building a heatmap of correlation matrix

We use the heatmap() function in R to carry out this task.

Syntax: heatmap(x, col = , symm = )

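  1. x = numeric matrix of the values to be plotted (here, the correlation matrix)
  2. col = vector of colors used to show the magnitude of the correlation coefficients
  3. symm = logical; if TRUE, x is treated symmetrically, which is only possible when x is a square matrix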
# We have used the default colour scheme
heatmap(customer_seg_var.cor, symm = TRUE)
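
The col argument can also be set explicitly. The following sketch rebuilds the correlation matrix from the values printed in STEP 2 so that it runs without the CSV file, and draws it with a blue-white-red palette. The palette, the margins, the zlim range passed through to image(), and the choice to switch off row and column reordering are illustrative assumptions, not part of the original recipe.

```r
# Correlation matrix rebuilt from the values printed in STEP 2
vars <- c("Age", "Annual.Income..k..", "Spending.Score..1.100.")
cor_mat <- matrix(
  c( 1.00000000, -0.01239804,  -0.32722685,
    -0.01239804,  1.00000000,   0.009902848,
    -0.32722685,  0.009902848,  1.00000000),
  nrow = 3, byrow = TRUE, dimnames = list(vars, vars)
)

# Diverging palette with 20 shades: blue (low) to white to red (high)
palette_bwr <- colorRampPalette(c("blue", "white", "red"))(20)

# Rowv = NA and Colv = NA keep the original variable order (no dendrograms);
# zlim = c(-1, 1) is passed on to image() so that white corresponds to zero
hm <- heatmap(
  cor_mat,
  col = palette_bwr,
  symm = TRUE,
  scale = "none",
  Rowv = NA, Colv = NA,
  zlim = c(-1, 1),
  margins = c(10, 10)
)
```

Without zlim, the palette is spread over the observed range of values, so the middle color would not necessarily mark a correlation of zero.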
