Violin plots are similar to boxplots, but in addition to the median, interquartile range and overall range, they show the probability density of the data at different values. This makes them more informative than boxplots, because they showcase the full distribution of the data. In effect, they combine the features of histograms and boxplots. They are mainly used to compare the distribution of different variables/columns in a dataset.
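To make these two ingredients concrete, here is a minimal sketch in base R of what a violin plot draws: the quartiles (the boxplot part) and a kernel density estimate (the outline of the violin). It uses R's built-in iris dataset instead of the mall data, purely for illustration, and the variable names are arbitrary.

```r
# Illustration only: built-in iris data, not the mall customers dataset
values <- iris$Sepal.Length

# Boxplot part: quartiles (25%, median, 75%) and interquartile range
quartiles <- quantile(values, probs = c(0.25, 0.5, 0.75))
iqr <- IQR(values)

# Violin part: kernel density estimate of the same values
dens <- density(values)

print(quartiles)
print(iqr)

# dens$x holds the evaluation points, dens$y the estimated density at each point
head(data.frame(value = dens$x, density = dens$y))
```

A violin plot mirrors the density curve around a central axis and, optionally, overlays the quartiles as a small box.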
Plotly has been actively developed and supported by its community.
This recipe demonstrates how to plot a violin plot in R using the plotly package.
Dataset description: The data contains basic information about customers visiting a supermarket mall. The variables we are interested in are Annual Income (in 1000s), Spending Score and Age.
# Data manipulation package
library(dplyr)
library(tidyverse)

# reading a dataset
customer_seg = read.csv('R_132_Mall_Customers.csv')

# selecting the required variables using the select() function
customer_seg_var = select(customer_seg, Age, Annual.Income..k.., Spending.Score..1.100.)

# summary of the selected variables
glimpse(customer_seg_var)
Observations: 200
Variables: 3
$ Age                    19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, 35…
$ Annual.Income..k..     15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, 19…
$ Spending.Score..1.100. 39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99, 1…
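The dotted column names in this output come from read.csv(), which by default passes column headers through make.names() to turn them into syntactically valid R names. The sketch below illustrates that behaviour. The raw header strings are an assumption about what the CSV file contains and are not taken from the recipe.

```r
# Assumed raw CSV headers (hypothetical; check the actual file)
raw_names <- c("Age", "Annual Income (k$)", "Spending Score (1-100)")

# make.names() replaces every character that is invalid in an R name with "."
clean_names <- make.names(raw_names)
print(clean_names)
```

If you prefer to keep the original headers, read.csv() accepts check.names = FALSE, in which case such columns must be referenced with backticks.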
We use the plot_ly() function to plot a violin plot of Annual Income grouped by Gender.
Syntax: plot_ly( data = , x = , y = , type = "violin")
# plotly provides plot_ly(), layout() and embed_notebook()
library(plotly)

fig <- plot_ly(x = ~Gender, y = ~Annual.Income..k.., data = customer_seg,
               type = "violin",
               # to also plot a box plot on top of the violin plot
               box = list(visible = TRUE)) %>%
  layout(title = 'Violin Plot using Plotly')

# embed the figure in a Jupyter notebook
embed_notebook(fig)
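If the CSV file is not at hand, the same pattern can be tried on a dataset that ships with R. The following is a minimal sketch, assuming the plotly package is installed. It swaps in the built-in iris data, and the name fig_iris is arbitrary. It also turns on the mean line, which is an optional extra and not part of the recipe above.

```r
library(plotly)

# One violin per species, showing the distribution of sepal length
fig_iris <- plot_ly(data = iris, x = ~Species, y = ~Sepal.Length,
                    type = "violin",
                    # overlay a box plot inside each violin
                    box = list(visible = TRUE),
                    # also draw a line at the mean of each group
                    meanline = list(visible = TRUE)) %>%
  layout(title = "Violin Plot of Sepal Length by Species")

# In an interactive session (e.g. RStudio), printing the object renders the plot
fig_iris
```

Placing the grouping variable on x produces one violin per category, which is what makes side-by-side comparison of distributions possible.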