How to create a marginal plot with ggplot2 in R

This recipe helps you create a marginal plot with ggplot2 in R

Recipe Objective

A marginal plot is an extension of a scatter plot. It shows the relationship between two variables and, in the margins, the individual distribution of each variable as a histogram, box plot or similar.

In this recipe we use ggplot2 together with the ggExtra package to draw the marginal plot. ggplot2 is based on the book The Grammar of Graphics by Leland Wilkinson. It offers flexible themes and plot specifications at a high level of abstraction, and builds plots mainly from aesthetic mappings and geometric objects. Common geometric objects include:

  1. geom_point() - for plotting points
  2. geom_bar() - for plotting bar graph
  3. geom_line() - for plotting line chart
  4. geom_histogram() - for plotting histogram

The basic syntax of a ggplot2 plot is:

ggplot(data, mapping = aes(x = , y = )) + geometric object

where:

  1. data : the data frame used to draw the chart
  2. mapping = aes() : the aesthetic mapping, which assigns variables to the axes (x and y name the variables to plot)
  3. geometric object : the geom function for the type of plot you want to draw, for example geom_point()
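
As a minimal sketch of this syntax, the following uses R's built-in mtcars dataset, which is an assumption made for illustration and not part of this recipe. It maps wt to the x axis and mpg to the y axis and draws the points with geom_point().

```r
library(ggplot2)

# data: the built-in mtcars data frame (illustrative choice)
# mapping: wt on the x axis, mpg on the y axis
# geometric object: geom_point() draws one point per row
basic_plot <- ggplot(mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point()

basic_plot
```

Swapping geom_point() for another geometric object, such as geom_line(), changes the plot type while the data and mapping stay the same.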

The ggExtra package extends ggplot2 with additional functions, including ggMarginal(), which adds marginal plots such as histograms and box plots to a scatter plot.

This recipe demonstrates how to make marginal plots using ggplot2 and ggExtra.

STEP 1: Loading required library and dataset

As an example, we generate normally distributed x and y values and show the histogram and box plot of each variable in the margins.

# ggplot2 for data visualisation
library(ggplot2)

# install ggExtra from CRAN (only needed once), then load it
install.packages("ggExtra")
library(ggExtra)

# Create a data frame of 1000 normally distributed points with mean = 25 and std. dev = 5
norm_dist = data.frame(x = rnorm(1000, 25, 5), y = rnorm(1000, 25, 5))
norm_dist
x	y
20.08595	27.34024
24.57561	33.94111
31.76342	33.02787
31.81128	15.18382
23.61155	31.49135
...	...
24.98564	27.50006
25.33502	32.35901
26.72924	29.06489
23.44376	22.31388
20.21346	19.30083
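
Because rnorm() draws random numbers, the values shown above will differ on every run. If reproducible output is wanted, one option is to fix the random seed before generating the data. The sketch below assumes an arbitrary seed value of 42 and adds a few quick checks on the sample; neither is part of the original recipe.

```r
# Fix the random seed so the same 1000 points are drawn on each run
# (42 is an arbitrary, illustrative value)
set.seed(42)

norm_dist <- data.frame(x = rnorm(1000, mean = 25, sd = 5),
                        y = rnorm(1000, mean = 25, sd = 5))

# Inspect the first rows and the sample statistics
head(norm_dist)
colMeans(norm_dist)      # both values should be close to 25
sapply(norm_dist, sd)    # both values should be close to 5
```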

STEP 2: Plotting a scatter plot using ggplot

We use the geom_point() geometric object to draw a scatter plot of x against y.

Note:

  1. The + operator in the syntax shown earlier adds components (layers, labels, themes) to the plot. Ending a line with + also lets the expression continue on the next line, which keeps longer plots readable.
  2. We also use the labs() function to give the graph a title.

scat_plot = ggplot(norm_dist, mapping = aes(x, y)) + geom_point() + labs(title = "X vs Y")
scat_plot
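
To illustrate the note about +, here is a sketch of the same scatter plot written across several lines. Each line ends with + so that R treats the next line as part of the same expression. The axis labels and theme_minimal() are illustrative additions rather than part of the recipe, and norm_dist is regenerated so the block runs on its own.

```r
library(ggplot2)

norm_dist <- data.frame(x = rnorm(1000, 25, 5), y = rnorm(1000, 25, 5))

# Each trailing + adds another component to the plot
scat_plot_multi <- ggplot(norm_dist, mapping = aes(x, y)) +
  geom_point() +
  labs(title = "X vs Y", x = "x", y = "y") +
  theme_minimal()

scat_plot_multi
```

Moving a + to the start of the following line would end the expression early, so the components after it would not be added to the plot.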

STEP 3: Marginal Plot - Histogram and Boxplot

We use the ggExtra::ggMarginal() function to draw a marginal plot.

syntax: ggExtra::ggMarginal(p, type = )

where:

  1. p = the ggplot2 scatter plot to which the marginal plots are added
  2. type = type of marginal plot ("density", "histogram", "boxplot", "violin" or "densigram")

# Marginal histogram
ggExtra::ggMarginal(scat_plot, type = "histogram")

# Marginal boxplot
ggExtra::ggMarginal(scat_plot, type = "boxplot")
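
Beyond histograms and box plots, ggMarginal() accepts other options. The sketch below shows two variations that are not part of the original recipe: density curves in both margins, and a histogram restricted to the x axis through the margins argument. The fill colour is an arbitrary choice passed on to the marginal histogram, and the data and scatter plot are rebuilt so the block runs on its own.

```r
library(ggplot2)
library(ggExtra)

norm_dist <- data.frame(x = rnorm(1000, 25, 5), y = rnorm(1000, 25, 5))
scat_plot <- ggplot(norm_dist, mapping = aes(x, y)) + geom_point()

# Density curves in both margins
dens_marg <- ggMarginal(scat_plot, type = "density")

# Histogram along the x axis only, with an illustrative fill colour
hist_x_marg <- ggMarginal(scat_plot, type = "histogram",
                          margins = "x", fill = "lightblue")

dens_marg
hist_x_marg
```

Storing the result in a variable, as done here, makes it possible to print the marginal plot again later without rebuilding it.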
