How to plot boxplot in R?

How to plot boxplot in R?

How to plot boxplot in R?

This recipe helps you plot boxplot in R


Recipe Objective

Box-plots are also known as box-whisker plots. They are a special type of plot which showcases complex numerical data in a compact manner. It is more informative than a strip chart as it also displays the interquartile range, median and min-max range. It is mainly used to find outliers in the dataset. ​

In this recipe we are going to use ggplot2 package to plot the required box plot. ggplot2 package is based on the book Grammar of Graphics by Wilkinson. This package provides flexibility while incorporating different themes and plot specification with a high level of abstraction. The package mainly uses aesthetic mapping and geometric objects as arguments. Different types of geometric objects include: ​

  1. geom_point() - for plotting points
  2. geom_bar() - for plotting bar graph
  3. geom_line() - for plotting line chart
  4. geom_boxplot() - for plotting boxplot

The basic syntax of gggplot2 plots is: ​

ggplot(data, mapping = aes(x =, y=)) + geometric object ​

where: ​

  1. data : Dataframe that is used to plot the chart
  2. mapping = aes() : aesthetic mapping which deals with controlling axis (x and y indicates the different variables)
  3. geometric object : Indicates the code for typeof plot you need to visualise.

This recipe demonstrates how to make a violin plot using ggplot2.

STEP 1: Loading required library and dataset

Dataset description: It is the basic data about the customers going to the supermarket mall. The variable that we interested in is Annual.Income which is in 1000s

# Data manipulation package library(tidyverse) ​ # ggplot for data visualisation library(ggplot2) ​ # reading a dataset customer_seg = read.csv('Mall_Customers.csv') ​ glimpse(customer_seg)
Rows: 200
Columns: 5
$ CustomerID              1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...
$ Gender                  Male, Male, Female, Female, Female, Female, ...
$ Age                     19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, ...
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, ...
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99,...

STEP 2: Plotting a Box plot using ggplot

We use geometric object as geom_boxplot() to plot a boxplot of Annual Income variable based on the gender


  1. The + sign in the syntax earlier makes the code more readable and enables R to read further code without breaking it.
  2. fill arguement inside the geom_violin() o fill the violinplot based on a factor
  3. We also use labs() function to give a title to the graph
ggplot(customer_seg, aes(x = Gender, y = Annual.Income..k..)) + geom_boxplot(aes(fill = Gender)) + labs(title = "Annual Income Box Plot")

Relevant Projects

Loan Eligibility Prediction using Gradient Boosting Classifier
This data science in python project predicts if a loan should be given to an applicant or not. We predict if the customer is eligible for loan based on several factors like credit score and past history.

Predict Churn for a Telecom company using Logistic Regression
Machine Learning Project in R- Predict the customer churn of telecom sector and find out the key drivers that lead to churn. Learn how the logistic regression model using R can be used to identify the customer churn in telecom dataset.

Topic modelling using Kmeans clustering to group customer reviews
In this Kmeans clustering machine learning project, you will perform topic modelling in order to group customer reviews based on recurring patterns.

Machine Learning project for Retail Price Optimization
In this machine learning pricing project, we implement a retail price optimization algorithm using regression trees. This is one of the first steps to building a dynamic pricing model.

Build a Music Recommendation Algorithm using KKBox's Dataset
Music Recommendation Project using Machine Learning - Use the KKBox dataset to predict the chances of a user listening to a song again after their very first noticeable listening event.

Mercari Price Suggestion Challenge Data Science Project
Data Science Project in Python- Build a machine learning algorithm that automatically suggests the right product prices.

Build an Image Classifier for Plant Species Identification
In this machine learning project, we will use binary leaf images and extracted features, including shape, margin, and texture to accurately identify plant species using different benchmark classification techniques.

Data Science Project-TalkingData AdTracking Fraud Detection
Machine Learning Project in R-Detect fraudulent click traffic for mobile app ads using R data science programming language.

Machine Learning or Predictive Models in IoT - Energy Prediction Use Case
In this machine learning and IoT project, we are going to test out the experimental data using various predictive models and train the models and break the energy usage.

Credit Card Fraud Detection as a Classification Problem
In this data science project, we will predict the credit card fraud in the transactional dataset using some of the predictive models.