How to create a violin plot using plotly in R?

This recipe helps you create a violin plot using plotly in R

Recipe Objective

Violin plots are similar to boxplots which showcases the probability density along with interquartile, median and range at different values. They are more informative than boxplots which are used to showcase the full distribution of the data. They are also known to combine the features of histogram and boxplots. They are mainly used to compare the distribution of different variables/columns in the dataset. ​

In this recipe we are going to use Plotly package to plot the required violin plot. Plotly package provides an interface to the plotly javascript library allowing us to create interactive web-based graphics entrirely in R. Plots created by plotly works in multiple format such as: ​

  1. R Markdown Documents
  2. Shiny apps - deploying on the web
  3. Windows viewer

Plotly has been actively developed and supported by it's community. ​

This recipe demonstrates how to plot a violin plot in R using plotly package. ​

STEP 1: Loading required library and dataset

Dataset description: It is the basic data about the customers going to the supermarket mall. The variable that we are interested in: Annual.Income (which is in 1000s), Spending Score and age

# Data manipulation package library(dplyr) library(tidyverse) # reading a dataset customer_seg = read.csv('R_132_Mall_Customers.csv') # selecting the required variables using the select() function customer_seg_var = select(customer_seg, Age, Annual.Income..k..,Spending.Score..1.100.) # summary of the selected variables glimpse(customer_seg_var)
Observations: 200
Variables: 3
$ Age                     19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, 35…
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, 19…
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99, 1…

STEP 2: Plotting a Violin plot using Plotly

We use the plot_ly() function to plot a box plot between of Annual Income based on Gender.

Syntax: plot_ly( data = , x = , y = , type = "violin")

Where:

  1. x = variable to be plotted in x axis
  2. y = variable to be plotted in y axis
  3. data = dataframe to be used
  4. type = type of the chart

Note:

  1. The %>% sign in the syntax earlier makes the code more readable and enables R to read further code without breaking it.
  2. We also use layout() function to give a title to the graph
fig <- plot_ly(x = ~Gender, y = ~Annual.Income..k.., data = customer_seg, type = "violin", # to also plot box plot whiskers on top of the violin plot box = list(visible = T)) %>% layout(title = 'Violin Plot using Plotly') embed_notebook(fig)

What Users are saying..

profile image

Ed Godalle

Director Data Analytics at EY / EY Tech
linkedin profile url

I am the Director of Data Analytics with over 10+ years of IT experience. I have a background in SQL, Python, and Big Data working with Accenture, IBM, and Infosys. I am looking to enhance my skills... Read More

Relevant Projects

Word2Vec and FastText Word Embedding with Gensim in Python
In this NLP Project, you will learn how to use the popular topic modelling library Gensim for implementing two state-of-the-art word embedding methods Word2Vec and FastText models.

End-to-End ML Model Monitoring using Airflow and Docker
In this MLOps Project, you will learn to build an end to end pipeline to monitor any changes in the predictive power of model or degradation of data.

Recommender System Machine Learning Project for Beginners-4
Collaborative Filtering Recommender System Project - Comparison of different model based and memory based methods to build recommendation system using collaborative filtering.

Build Deep Autoencoders Model for Anomaly Detection in Python
In this deep learning project , you will build and deploy a deep autoencoders model using Flask.

Isolation Forest Model and LOF for Anomaly Detection in Python
Credit Card Fraud Detection Project - Build an Isolation Forest Model and Local Outlier Factor (LOF) in Python to identify fraudulent credit card transactions.

Machine Learning Project to Forecast Rossmann Store Sales
In this machine learning project you will work on creating a robust prediction model of Rossmann's daily sales using store, promotion, and competitor data.

Build Piecewise and Spline Regression Models in Python
In this Regression Project, you will learn how to build a piecewise and spline regression model from scratch in Python to predict the points scored by a sports team.

End-to-End Snowflake Healthcare Analytics Project on AWS-1
In this Snowflake Healthcare Analytics Project, you will leverage Snowflake on AWS to predict patient length of stay (LOS) in hospitals. The prediction of LOS can help in efficient resource allocation, lower the risk of staff/visitor infections, and improve overall hospital functioning.

PyTorch Project to Build a GAN Model on MNIST Dataset
In this deep learning project, you will learn how to build a GAN Model on MNIST Dataset for generating new images of handwritten digits.

MLOps Project to Deploy Resume Parser Model on Paperspace
In this MLOps project, you will learn how to deploy a Resume Parser Streamlit Application on Paperspace Private Cloud.