How to build a correlation matrix in R?

This recipe shows how to build a correlation matrix in R.

Recipe Objective

A correlation matrix is a square table of correlation coefficients for a set of variables: the entry in row i and column j is the correlation between variable i and variable j. It is mainly used to examine the relationships between the variables.

There are three main applications of a correlation matrix:

  1. To explore patterns in a large dataset by summarising it in the form of a table.
  2. As an input to other analyses, such as exploratory factor analysis, structural equation models and confirmatory factor analysis.
  3. As a diagnostic step when checking other analyses. For example, high correlation coefficients between predictors (multicollinearity) indicate that the coefficient estimates of a linear regression are unreliable.
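The diagnostic use in point 3 can be sketched as follows. This is only an illustration: it uses R's built-in mtcars data rather than the recipe's dataset, and the 0.8 cutoff is an arbitrary assumed threshold, not a rule from this recipe.

```r
# Correlation matrix of a few numeric columns from the built-in mtcars data
predictors <- mtcars[, c("disp", "hp", "wt", "qsec")]
cor_mat <- cor(predictors)

# Keep each pair once by blanking the diagonal and the lower triangle
cor_mat[lower.tri(cor_mat, diag = TRUE)] <- NA

# Positions of pairs whose absolute correlation exceeds the assumed cutoff
high <- which(abs(cor_mat) > 0.8, arr.ind = TRUE)

# List the flagged pairs with their coefficients
high_pairs <- data.frame(
  var1 = rownames(cor_mat)[high[, "row"]],
  var2 = colnames(cor_mat)[high[, "col"]],
  r    = cor_mat[high]
)
high_pairs
```

Pairs listed in high_pairs are candidates for closer inspection before both variables are used as predictors in the same regression.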

This recipe demonstrates how to build a correlation matrix.

STEP 1: Loading required library and dataset

Dataset description: The dataset contains basic information about customers visiting a supermarket mall. The variables we are interested in are Annual Income (in thousands), Spending Score and Age.

# Data manipulation packages
library(dplyr)
library(tidyverse)

# Read the dataset
customer_seg = read.csv('Mall_Customers.csv')

# Select the required variables using the select() function
customer_seg_var = select(customer_seg, Age, Annual.Income..k.., Spending.Score..1.100.)

# Summary of the selected variables
glimpse(customer_seg_var)

Observations: 200
Variables: 3
$ Age                     19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, 35…
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, 19…
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99, 1…
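cor() accepts only numeric input, which is one reason to select columns before building the matrix. A minimal base-R sketch of that selection, using a small made-up data frame (the column names and values here are hypothetical and are not taken from Mall_Customers.csv):

```r
# Hypothetical toy data standing in for a customer table
toy <- data.frame(
  Segment = c("a", "b", "a", "b", "a"),  # non-numeric column
  Age     = c(19, 21, 20, 23, 31),
  Score   = c(39, 81, 6, 77, 40)
)

# Keep only the numeric columns; drop = FALSE keeps the result a data frame
toy_num <- toy[, sapply(toy, is.numeric), drop = FALSE]
str(toy_num)
```

Selecting columns by name, as the recipe does with select(), gives the same kind of result when the numeric columns are known in advance.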

STEP 2: Building a correlation matrix

We use the cor() function to create a correlation matrix.

Syntax: cor(x, method = )

where:

  1. x = a numeric data frame (or matrix) as input
  2. method = a character string naming the correlation coefficient to compute: "pearson", "kendall" or "spearman". The default is "pearson".
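The effect of the method argument can be sketched as below. The example assumes R's built-in mtcars data so that it runs without Mall_Customers.csv; the variable names are illustrative only.

```r
# Built-in data used so the sketch runs without Mall_Customers.csv
x <- mtcars[, c("mpg", "hp", "wt")]

pearson  <- cor(x)                       # default: method = "pearson"
spearman <- cor(x, method = "spearman")  # rank-based (Spearman's rho)
kendall  <- cor(x, method = "kendall")   # rank-based (Kendall's tau)

# Spearman's coefficient is Pearson's coefficient computed on the ranks
spearman_by_hand <- cor(sapply(x, rank))
all.equal(spearman, spearman_by_hand)

spearman
```

The rank-based methods are often preferred when the relationship is monotonic but not linear, or when outliers are a concern.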

customer_seg_var.cor = cor(customer_seg_var)
customer_seg_var.cor

                               Age Annual.Income..k.. Spending.Score..1.100.
Age                     1.00000000       -0.012398043           -0.327226846
Annual.Income..k..     -0.01239804        1.000000000            0.009902848
Spending.Score..1.100. -0.32722685        0.009902848            1.000000000

Note: The diagonal elements of the matrix are 1.0 because each one is the correlation of a variable with itself. The matrix is symmetric, and every coefficient lies in the range -1 to 1.
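These properties can be checked directly. The sketch below again assumes the built-in mtcars data in place of the recipe's dataset, and also recomputes one Pearson coefficient from its definition (covariance divided by the product of the standard deviations).

```r
# Built-in data used as a stand-in for the recipe's dataset
x <- mtcars[, c("mpg", "hp", "wt")]
m <- cor(x)

# Diagonal entries: correlation of each variable with itself
diag_is_one <- all(abs(diag(m) - 1) < 1e-12)

# cor(a, b) equals cor(b, a), so the matrix is symmetric
is_symmetric <- isSymmetric(m)

# Every coefficient lies between -1 and 1
in_range <- all(m >= -1 & m <= 1)

# Pearson's r from its definition: cov(a, b) / (sd(a) * sd(b))
r_manual <- cov(x$mpg, x$hp) / (sd(x$mpg) * sd(x$hp))
matches_cor <- isTRUE(all.equal(r_manual, m["mpg", "hp"]))

c(diag_is_one, is_symmetric, in_range, matches_cor)
```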
