How to plot violin plot in R?

How to plot violin plot in R?

How to plot violin plot in R?

This recipe helps you plot violin plot in R


Recipe Objective

Violin plots are similar to boxplots which showcases the probability density along with interquartile, median and range at different values. They are more informative than boxplots which are used to showcase the full distribution of the data. They are also known to combine the features of histogram and boxplots. They are mainly used to compare the distribution of different variables/columns in the dataset. ​

In this recipe we are going to use ggplot2 package to plot the required violin plot. ggplot2 package is based on the book Grammar of Graphics by Wilkinson. This package provides flexibility while incorporating different themes and plot specification with a high level of abstraction. The package mainly uses aesthetic mapping and geometric objects as arguments. Different types of geometric objects include: ​

  1. geom_point() - for plotting points
  2. geom_bar() - for plotting bar graph
  3. geom_line() - for plotting line chart
  4. geom_violin() - for plotting violin plot

The basic syntax of gggplot2 plots is: ​

ggplot(data, mapping = aes(x =, y=)) + geometric object ​

where: ​

  1. data : Dataframe that is used to plot the chart
  2. mapping = aes() : aesthetic mapping which deals with controlling axis (x and y indicates the different variables)
  3. geometric object : Indicates the code for typeof plot you need to visualise.

This recipe demonstrates how to make a violin plot using ggplot2.

STEP 1: Loading required library and dataset

Dataset description: It is the basic data about the customers going to the supermarket mall. The variable that we interested in is Annual.Income which is in 1000s.

# Data manipulation package library(tidyverse) ​ # ggplot for data visualisation library(ggplot2) ​ # reading a dataset customer_seg = read.csv('Mall_Customers.csv') ​ glimpse(customer_seg)
Rows: 200
Columns: 5
$ CustomerID              1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...
$ Gender                  Male, Male, Female, Female, Female, Female, ...
$ Age                     19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, ...
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, ...
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99,...

STEP 2: Plotting a violin plot using ggplot

We use geometric object as geom_violin() to plot a violin plot of Annual Income variable based on the gender


  1. The + sign in the syntax earlier makes the code more readable and enables R to read further code without breaking it.
  2. fill arguement inside the geom_violin() o fill the violinplot based on a factor
  3. We also use labs() function to give a title to the graph
ggplot(customer_seg, aes(x = Gender, y = Annual.Income..k..)) + geom_violin(aes(fill = Gender)) + labs(title = "Annual Income Violin Plot")

Relevant Projects

Build a Collaborative Filtering Recommender System in Python
Use the Amazon Reviews/Ratings dataset of 2 Million records to build a recommender system using memory-based collaborative filtering in Python.

Time Series Forecasting with LSTM Neural Network Python
Deep Learning Project- Learn to apply deep learning paradigm to forecast univariate time series data.

Customer Churn Prediction Analysis using Ensemble Techniques
In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.

Predict Churn for a Telecom company using Logistic Regression
Machine Learning Project in R- Predict the customer churn of telecom sector and find out the key drivers that lead to churn. Learn how the logistic regression model using R can be used to identify the customer churn in telecom dataset.

Forecast Inventory demand using historical sales data in R
In this machine learning project, you will develop a machine learning model to accurately forecast inventory demand based on historical sales data.

Predict Employee Computer Access Needs in Python
Data Science Project in Python- Given his or her job role, predict employee access needs using amazon employee database.

Build a Similar Images Finder with Python, Keras, and Tensorflow
Build your own image similarity application using Python to search and find images of products that are similar to any given product. You will implement the K-Nearest Neighbor algorithm to find products with maximum similarity.

Census Income Data Set Project - Predict Adult Census Income
Use the Adult Income dataset to predict whether income exceeds 50K yr based on census data.

Walmart Sales Forecasting Data Science Project
Data Science Project in R-Predict the sales for each department using historical markdown data from the Walmart dataset containing data of 45 Walmart stores.

Build an Image Classifier for Plant Species Identification
In this machine learning project, we will use binary leaf images and extracted features, including shape, margin, and texture to accurately identify plant species using different benchmark classification techniques.