How to create a boxplot using plotly in R?

How to create a boxplot using plotly in R?

How to create a boxplot using plotly in R?

This recipe helps you create a boxplot using plotly in R


Recipe Objective

Box-plots are also known as box-whisker plots. They are a special type of plot which showcases complex numerical data in a compact manner. It is more informative than a strip chart as it also displays the interquartile range, median and min-max range. It is mainly used to find outliers in the dataset. ​

In this recipe we are going to use Plotly package to plot the required box plot. Plotly package provides an interface to the plotly javascript library allowing us to create interactive web-based graphics entrirely in R. Plots created by plotly works in multiple format such as: ​

  1. R Markdown Documents
  2. Shiny apps - deploying on the web
  3. Windows viewer

Plotly has been actively developed and supported by it's community. ​

This recipe demonstrates how to plot a box plot in R using plotly package. ​

STEP 1: Loading required library and dataset

Dataset description: It is the basic data about the customers going to the supermarket mall. The variables that we are interested in: Annual.Income (which is in 1000s) , Spending Score and Age

# Data manipulation package library(dplyr) library(tidyverse) # reading a dataset customer_seg = read.csv('R_131_Mall_Customers.csv') # selecting the required variables using the select() function customer_seg_var = select(customer_seg, Age, Annual.Income..k..,Spending.Score..1.100.) # summary of the selected variables glimpse(customer_seg_var)
Observations: 200
Variables: 3
$ Age                     19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, 35…
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, 19…
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99, 1…

STEP 2: Plotting a box plot using Plotly

We use the plot_ly() function to plot a box plot between of Annual Income based on Gender.

Syntax: plot_ly( data = , x = , y = , type = "box")


  1. x = variable to be plotted in x axis
  2. y = variable to be plotted in y axis
  3. data = dataframe to be used
  4. type = type of the chart


  1. The %>% sign in the syntax earlier makes the code more readable and enables R to read further code without breaking it.
  2. We also use layout() function to give a title to the graph
fig <- plot_ly(x = ~Gender, y = ~Annual.Income..k.., data = customer_seg, type = "box") %>% layout(title = 'Box Plot using Plotly') embed_notebook(fig)

Relevant Projects

Census Income Data Set Project - Predict Adult Census Income
Use the Adult Income dataset to predict whether income exceeds 50K yr based on census data.

Mercari Price Suggestion Challenge Data Science Project
Data Science Project in Python- Build a machine learning algorithm that automatically suggests the right product prices.

Loan Eligibility Prediction using Gradient Boosting Classifier
This data science in python project predicts if a loan should be given to an applicant or not. We predict if the customer is eligible for loan based on several factors like credit score and past history.

Expedia Hotel Recommendations Data Science Project
In this data science project, you will contextualize customer data and predict the likelihood a customer will stay at 100 different hotel groups.

Choosing the right Time Series Forecasting Methods
There are different time series forecasting methods to forecast stock price, demand etc. In this machine learning project, you will learn to determine which forecasting method to be used when and how to apply with time series forecasting example.

Build a Music Recommendation Algorithm using KKBox's Dataset
Music Recommendation Project using Machine Learning - Use the KKBox dataset to predict the chances of a user listening to a song again after their very first noticeable listening event.

Customer Churn Prediction Analysis using Ensemble Techniques
In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.

Topic modelling using Kmeans clustering to group customer reviews
In this Kmeans clustering machine learning project, you will perform topic modelling in order to group customer reviews based on recurring patterns.

Time Series Forecasting with LSTM Neural Network Python
Deep Learning Project- Learn to apply deep learning paradigm to forecast univariate time series data.

German Credit Dataset Analysis to Classify Loan Applications
In this data science project, you will work with German credit dataset using classification techniques like Decision Tree, Neural Networks etc to classify loan applications using R.