How to create a marginal plot with ggplot2?

This recipe helps you create a marginal plot with ggplot2

Recipe Objective

A marginal plot is an extension of a scatter plot. It shows not only the relationship between two variables but also the individual distribution of each variable, drawn in the margins as a histogram, box plot, or similar.

In this recipe we use the ggplot2 and ggExtra packages to draw the marginal plot. The ggplot2 package is based on the book The Grammar of Graphics by Wilkinson. It offers flexible themes and plot specifications at a high level of abstraction, and its plots are built mainly from aesthetic mappings and geometric objects. Geometric objects include:

  1. geom_point() - for plotting points
  2. geom_bar() - for plotting bar graph
  3. geom_line() - for plotting line chart
  4. geom_histogram() - for plotting histogram

The basic syntax of ggplot2 plots is:

ggplot(data, mapping = aes(x = , y = )) + geometric object

where:

  1. data : the data frame used to draw the chart
  2. mapping = aes() : the aesthetic mapping, which assigns variables to the axes (x and y name the variables to plot)
  3. geometric object : the function for the type of plot you want to draw
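
As a minimal sketch of this syntax, the example below fills in the three parts with the built-in mtcars dataset. The dataset and the columns wt and mpg are assumptions chosen only for illustration; they are not the data used in this recipe.

```r
library(ggplot2)

# mtcars ships with base R; it is used here only for illustration
# and is not the dataset used in this recipe.
p <- ggplot(mtcars, mapping = aes(x = wt, y = mpg)) +  # data + aesthetic mapping
  geom_point()                                         # geometric object: points

# Printing the object draws the plot
p
```

Swapping geom_point() for another geometric object, such as geom_line(), changes the plot type while the data and mapping stay the same.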

The ggExtra package extends ggplot2 with additional functions, including ggMarginal(), which adds marginal plots such as histograms and box plots to an existing scatter plot.

This recipe demonstrates how to make marginal plots using ggplot2 and ggExtra.

STEP 1: Loading required library and dataset

As an example, we generate normally distributed values for both the x and y axes and then show the histogram and box plot of each variable in the margins.

# ggplot2 for data visualisation
library(ggplot2)

# Install ggExtra from CRAN (needed only once), then load it for marginal plots
install.packages("ggExtra")
library(ggExtra)

# Create a data frame of 1000 normally distributed points with mean = 25 and std. dev. = 5
norm_dist = data.frame(x = rnorm(1000, 25, 5), y = rnorm(1000, 25, 5))
norm_dist

x	y
20.08595	27.34024
24.57561	33.94111
31.76342	33.02787
31.81128	15.18382
23.61155	31.49135
...	...
24.98564	27.50006
25.33502	32.35901
26.72924	29.06489
23.44376	22.31388
20.21346	19.30083
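
Because rnorm() draws random values, the numbers in the table above will differ on every run. A minimal sketch of a reproducible version follows; the seed value 42 is an arbitrary assumption, and the summary calls are added here only to check that the sample looks as intended.

```r
# Fix the random number generator so the same values are drawn each run.
# The seed 42 is an arbitrary choice for illustration.
set.seed(42)

norm_dist <- data.frame(
  x = rnorm(1000, mean = 25, sd = 5),
  y = rnorm(1000, mean = 25, sd = 5)
)

# Inspect the first rows instead of printing all 1000
head(norm_dist)

# The sample means and standard deviations should be close to 25 and 5
colMeans(norm_dist)
sapply(norm_dist, sd)
```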

STEP 2: Plotting a scatter plot using ggplot

We use the geom_point() geometric object to draw a scatter plot of x against y.

Note:

  1. The + sign in the syntax shown earlier adds each component (such as a geometric object or labels) to the plot. Ending a line with + also tells R that the expression continues on the next line.
  2. We also use the labs() function to give the graph a title.

scat_plot = ggplot(norm_dist, mapping = aes(x, y)) +
  geom_point() +
  labs(title = "X vs Y")
scat_plot
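
To make the role of + more concrete, the sketch below builds the same kind of plot one component at a time. The intermediate names base_plot and with_points, the seed, and the alpha value are assumptions introduced for illustration.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, for illustration only
norm_dist <- data.frame(x = rnorm(1000, 25, 5), y = rnorm(1000, 25, 5))

# Data and mapping only: the plot has no layers yet
base_plot <- ggplot(norm_dist, aes(x, y))

# Adding a geometric object creates the first layer;
# alpha makes overlapping points easier to see
with_points <- base_plot + geom_point(alpha = 0.4)

# labs() adds a title without adding a layer
scat_plot <- with_points + labs(title = "X vs Y")

length(base_plot$layers)  # number of layers before geom_point()
length(scat_plot$layers)  # number of layers in the finished plot
scat_plot
```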

STEP 3: Marginal Plot- Histogram and Boxplot

We use the ggExtra::ggMarginal() function to add marginal plots to the scatter plot.

syntax: ggExtra::ggMarginal(p, type = )

where:

  1. p = the ggplot2 scatter plot to which the marginal plots are added
  2. type = type of marginal plot ("density", "histogram", "boxplot", "violin" or "densigram")

# Marginal histogram
ggExtra::ggMarginal(scat_plot, type = "histogram")

# Marginal box plot
ggExtra::ggMarginal(scat_plot, type = "boxplot")
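
Beyond the two types shown above, ggMarginal() accepts a few other options. The sketch below is an illustration under stated assumptions rather than part of the original recipe: it rebuilds the scatter plot so that it runs on its own, and the seed and the fill colour "steelblue" are arbitrary choices.

```r
library(ggplot2)
library(ggExtra)

set.seed(1)  # arbitrary seed, for illustration only
norm_dist <- data.frame(x = rnorm(1000, 25, 5), y = rnorm(1000, 25, 5))
scat_plot <- ggplot(norm_dist, aes(x, y)) + geom_point()

# Density curves on both margins
dens <- ggMarginal(scat_plot, type = "density")

# Histogram on the x margin only, with a fill colour passed through to the histogram
hist_x <- ggMarginal(scat_plot, type = "histogram", margins = "x", fill = "steelblue")

# Printing each object draws the combined plot
dens
hist_x
```

The margins argument takes "both", "x" or "y", which is useful when only one variable's distribution is of interest.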
