Data Visualisation with R using GGVIS
What will you learn from this R tutorial on data visualisation with the GGVIS package?
This R tutorial will help you learn how to use GGVIS, a popular data visualisation package, so that you can turn the results of your data analysis into attractive, sophisticated graphics.
Attractions of the GGVIS Tutorial
- Learn to customize graphics using ggvis.
- Learn how to add interactive controls such as sliders, checkboxes and drop-down menus to your visualisations.
GGVIS: A Data Visualization package in R
The importance of data visualisation in data science is widely recognised. Every stage needs data visualisation tools, from a data analyst visualising raw data to the presentation of results to the consumers of analytics products.
In the past, people used Microsoft PowerPoint and data visualisation software such as QlikView and Tableau to build reports or stories. Now people have started using R for data visualisation as well (thanks to Shiny!), which is one of the powerful advantages of the R programming language over competitors such as Python and SAS.
After the development of ggplot2 and Shiny, there was a need to combine the best parts of Shiny, ggplot2, Vega and dplyr.
GGVIS, a recently launched R data visualisation package, does this. It takes the grammar of graphics from ggplot2, the reactive programming framework from Shiny and the web graphics capabilities from Vega, and it uses dplyr-style pipes to chain operations.
The GGVIS package thus combines the statistical power of R with the reach of the web browser.
Let’s get started on how to use GGVIS for Data Visualization with R
As with any other R package, GGVIS has to be installed and then loaded into the current R session before you can start implementing data visualisation techniques with it.
We will use RStudio for the demonstrations because GGVIS plots are web graphics, and RStudio's Viewer pane displays them much as a web browser would.
GGVIS is an open source package available on CRAN, so we can simply call install.packages(). It downloads the ggvis package from CRAN and installs it for us.
During installation, you will notice that several other R packages (dependencies) are installed as well. ggvis needs them to work properly.
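The installation and loading steps described above can be written as follows. This is a minimal sketch. The requireNamespace() guard, which skips the download when ggvis is already installed, is our own addition. packageDescription() is used only to show which packages ggvis declares as dependencies.

```r
# Install ggvis from CRAN only if it is not installed yet;
# install.packages() also installs its dependencies (e.g. shiny, dplyr)
if (!requireNamespace("ggvis", quietly = TRUE)) {
  install.packages("ggvis")
}

# Load ggvis into the current R session (this also provides %>%)
library(ggvis)

# Show the packages listed in the Imports field of ggvis's DESCRIPTION file
deps <- packageDescription("ggvis")$Imports
cat(deps, "\n")
```

Installation is needed only once per machine. The library(ggvis) call has to be repeated in every new R session.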
We will use the iris dataset that ships with R to illustrate how GGVIS can create wonderful visualisations. The dataset can be loaded into the R session by calling data(iris).
iris is a widely used dataset in R. It has 150 rows (50 flowers of each of three species) and the following 5 columns: Sepal.Length, Sepal.Width, Petal.Length and Petal.Width (all measured in centimetres), and Species (setosa, versicolor or virginica).
The following image shows a snapshot of the iris dataset:
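If the snapshot is not available, the same overview can be printed in the console with base R's str(), head() and table(). This is a small sketch; none of it depends on GGVIS.

```r
# Load the built-in iris data set into the session
data(iris)

# Structure: column names, types and the first few values
str(iris)

# First six rows of the data
head(iris)

# Number of flowers per species
table(iris$Species)
```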
Create Your First Plot with GGVIS, an Open Source Data Visualisation Package in R
There are three basic elements to a GGVIS plot:
- Dataset: first we pass the dataset (data frame) from which we want to build our plot.
- Variables: the variables or features we want to use, mapped to properties such as the x and y axes.
- Plotting tools: GGVIS offers multiple layers and options to plot and format your charts.
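As in other data visualisation packages, a call to ggvis() starts with the dataset and is followed by arguments that map variables to the visual properties of the plot. Variable names are prefixed with a tilde (~) so that GGVIS looks them up inside the dataset.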
first_plot <- ggvis(iris, x = ~Sepal.Length, y = ~Sepal.Width)
What? You ran the command and still it didn’t plot anything?
That's because we have only stored a plot specification in first_plot and haven't yet told GGVIS how to draw the data. Let's do that with the layer_points() function:
layer_points(first_plot)
With the last line of code, we told GGVIS to draw the ggvis object as a scatter plot. Apart from scatter plots, other layers can build different types of visualisation, and we will explore a few of them in this GGVIS tutorial.
Alternatively, both steps can be combined into a single step with the following R code:
layer_points(ggvis(iris, x = ~Sepal.Length, y = ~Sepal.Width))
To make life simpler, GGVIS supports the pipe operator (%>%) from the magrittr package, which lets you chain multiple calls, e.g.,
iris %>% ggvis( x = ~Sepal.Length, y = ~Sepal.Width) %>% layer_points()
Because the result of each step is passed along the pipe, you can also modify the data with dplyr verbs along the way, according to your requirements.
Let's say we want to change the scale of the x variable while plotting it. The mutate() function from the dplyr package does this easily:
library(dplyr)
iris %>% ggvis(x = ~Sepal.Length, y = ~Sepal.Width) %>% mutate(Sepal.Length = Sepal.Length * 10) %>% layer_points()
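dplyr verbs can also be applied to the data frame before it reaches ggvis(). The sketch below is our own variation on the example above. It keeps only the setosa flowers and rescales Sepal.Length first. Because the pipeline starts from an ordinary data frame, only standard dplyr functions are involved before ggvis() is called.

```r
library(ggvis)
library(dplyr)

# Transform the data frame first, then hand the result to ggvis()
setosa_plot <- iris %>%
  filter(Species == "setosa") %>%               # keep one species
  mutate(Sepal.Length = Sepal.Length * 10) %>%  # rescale the x variable
  ggvis(x = ~Sepal.Length, y = ~Sepal.Width) %>%
  layer_points()

# Type setosa_plot at the console to display the plot
```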
You can also change the axis titles by adding add_axis() on top of the previously built chart:
library(dplyr)
iris %>% ggvis(x = ~Sepal.Length, y = ~Sepal.Width) %>%
  mutate(Sepal.Length = Sepal.Length * 10) %>%
  layer_points() %>%
  add_axis("x", title = "Length of Sepal") %>%
  add_axis("y", title = "Width of Sepal")
GGVIS also lets you choose where each axis is placed with the orient argument:
library(dplyr)
iris %>% ggvis(x = ~Sepal.Length, y = ~Sepal.Width) %>%
  mutate(Sepal.Length = Sepal.Length * 10) %>%
  layer_points() %>%
  add_axis("x", title = "Length of Sepal", orient = "top") %>%
  add_axis("y", title = "Width of Sepal", orient = "right")
GGVIS has no dedicated function for adding a title to a plot, but you can add one with a workaround.
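One workaround that is widely shared among ggvis users is to add a second x axis at the top of the plot and hide its ticks, labels and axis line. Its title then serves as the plot title. The sketch below assumes this approach. add_axis() and axis_props() are real ggvis functions, but the hiding trick is a community workaround rather than an official feature, and the title text is arbitrary.

```r
library(ggvis)

titled_plot <- iris %>%
  ggvis(x = ~Sepal.Length, y = ~Sepal.Width) %>%
  layer_points() %>%
  add_axis("x", title = "Sepal Length") %>%
  add_axis("y", title = "Sepal Width") %>%
  # Extra x axis on top: no ticks, a white (invisible) axis line and
  # zero-size labels, so only its title remains visible
  add_axis("x", orient = "top", ticks = 0,
           title = "Sepal Width vs Sepal Length",
           properties = axis_props(
             axis = list(stroke = "white"),
             labels = list(fontSize = 0)
           ))
```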
We can also add other variables to the plot by mapping them to one of the visual properties e.g., stroke, fill, shape and size.
Let's try it out by mapping Petal.Length to the fill colour of the points:
iris %>% ggvis(x = ~Sepal.Length, y = ~Sepal.Width, fill = ~Petal.Length) %>% layer_points()
Petal.Length can be mapped to the size of the points in the same way:
iris %>% ggvis(x = ~Sepal.Length, y = ~Sepal.Width, size = ~Petal.Length) %>% layer_points()
It can also be mapped to their shape:
iris %>% ggvis(x = ~Sepal.Length, y = ~Sepal.Width, shape = ~factor(round(Petal.Length, 0))) %>% layer_points()
In the examples above, we used variables to format our plots. In the last example, we rounded Petal.Length to whole numbers and converted it to a factor, because shape can only be mapped to a categorical variable.
If we want to set these properties to a constant value rather than map them to a variable, we have to use := instead of =. The := operator passes the value through as is, without a scale, so "red" below is used as the actual colour:
iris %>% ggvis(x = ~Sepal.Length, y = ~Sepal.Width, fill := "red") %>% layer_points()
Let's change the opacity and size of the points now:
iris %>% ggvis(x = ~Sepal.Length, y = ~Sepal.Width, fill := "red", size := 300, opacity := 0.3) %>% layer_points()
In the graph above, we have set the size, opacity and colour to constant values together.
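The difference between = and := can be seen by running the two calls below side by side. With =, the constant "red" is treated as a data value and passed through a fill scale. GGVIS then picks a colour from its default palette and adds a legend entry labelled "red", so the points are not guaranteed to be red. With :=, "red" is used as the colour itself. This is a minimal sketch.

```r
library(ggvis)

# "red" is a data value here: it is mapped through a fill scale
mapped <- iris %>%
  ggvis(~Sepal.Length, ~Sepal.Width, fill = "red") %>%
  layer_points()

# "red" is the raw colour here: no scale and no legend
set_red <- iris %>%
  ggvis(~Sepal.Length, ~Sepal.Width, fill := "red") %>%
  layer_points()

# Type mapped or set_red at the console to display each plot
```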
So far we have used layer_points(), which builds a scatter plot. As discussed above, layer_points() accepts properties such as x, y, fill, stroke, size, shape and opacity to modify the chart.
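Properties can also be given directly inside layer_points() instead of in ggvis(). The sketch below maps Species to the fill colour and sets a constant shape, outline colour, outline width and size. The specific values, including the "diamond" shape, are arbitrary choices for illustration.

```r
library(ggvis)

styled_points <- iris %>%
  ggvis(x = ~Sepal.Length, y = ~Sepal.Width) %>%
  layer_points(
    fill = ~Species,       # map species to fill colour (with a legend)
    shape := "diamond",    # constant shape for every point
    stroke := "black",     # outline colour
    strokeWidth := 0.5,    # outline width in pixels
    size := 60             # constant point size
  )
```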
Lines, Paths & Ribbons using GGVIS
Apart from scatter plots, GGVIS can create a range of other visualisations, such as lines, paths, bars and histograms.
To build a line chart, use the layer_lines() function:
iris %>% ggvis(x = ~Sepal.Length, y = ~Petal.Length) %>% layer_lines()
Like layer_points(), layer_lines() has formatting options such as stroke and strokeWidth.
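For example, the line's colour, width and dash pattern can each be set to a constant. This is a sketch. strokeDash is a further line property that GGVIS passes on to Vega, and the values used here are arbitrary.

```r
library(ggvis)

styled_line <- iris %>%
  ggvis(x = ~Sepal.Length, y = ~Petal.Length) %>%
  layer_lines(
    stroke := "darkgreen",  # line colour
    strokeWidth := 2,       # line width in pixels
    strokeDash := 6         # dash length in pixels
  )
```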
If you need a path instead, use layer_paths():
dataset <- data.frame(v1 = runif(10), v2 = runif(10))
dataset %>% ggvis(x = ~v1, y = ~v2) %>% layer_paths()
layer_paths() differs from layer_lines(). layer_lines() sorts the data by the x variable before drawing the line, while layer_paths() connects the points in the order in which they appear in the data.
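The difference is easiest to see on data whose rows are not sorted by x. In this sketch, the zigzag data frame is made up for illustration. The red path follows the row order, while the blue line follows increasing x. The base R order() call reproduces the x-sorted order that the blue line follows.

```r
library(ggvis)

# Hypothetical data whose rows are deliberately not sorted by x
zigzag <- data.frame(x = c(3, 1, 4, 2, 5), y = c(1, 2, 3, 4, 5))

compare_plot <- zigzag %>%
  ggvis(x = ~x, y = ~y) %>%
  layer_paths(stroke := "red") %>%  # visits x = 3, 1, 4, 2, 5 (row order)
  layer_lines(stroke := "blue")     # visits x = 1, 2, 3, 4, 5 (sorted by x)

# y values in the order the blue line visits them
sorted_y <- zigzag[order(zigzag$x), "y"]
print(sorted_y)  # [1] 2 4 1 3 5
```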
You can also create thick ribbons with the layer_ribbons() layer. The y and y2 properties give the two edges of the band:
dataset <- data.frame(v1 = seq(0, 10, by = 0.1), v2 = seq(0, 10, by = 0.1))
dataset %>% ggvis(x = ~v1, y = ~v2 + 0.3, y2 = ~v2 - 0.3) %>% layer_ribbons()
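A common use of ribbons is to shade a band around a curve. In this sketch, the sine-wave data and the half-width of 0.3 are made up for illustration. The band is set inside layer_ribbons(), and a line layer draws the curve on top.

```r
library(ggvis)

# Hypothetical data: a sine curve sampled every 0.1 units
wave <- data.frame(x = seq(0, 10, by = 0.1))
wave$y <- sin(wave$x)

band_plot <- wave %>%
  ggvis(x = ~x, y = ~y) %>%
  # Shaded band between y - 0.3 and y + 0.3
  layer_ribbons(y = ~y - 0.3, y2 = ~y + 0.3, fillOpacity := 0.3) %>%
  # The curve itself, drawn on top of the band
  layer_lines(stroke := "black")
```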
Histograms and Bar Plots using GGVIS
layer_histograms() creates a histogram of your data. By default, it chooses the bin width automatically:
iris %>% ggvis(~Sepal.Length) %>% layer_histograms()
You can also set the bin width manually, as shown below:
iris %>% ggvis(~Sepal.Length) %>% layer_histograms(width = 0.5, center = 0)
Often you will want a bar plot to explore your data. GGVIS provides layer_bars() for this:
dataset <- data.frame(category = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"), values = c(7, 2, 8, 4, 3.2, 3, 1, 2, 5, 7))
dataset %>% ggvis(x = ~category, y = ~values) %>% layer_bars(fill := "blue")
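When no y variable is supplied, layer_bars() counts the rows for each x value. With iris, this gives one bar per species, each of height 50. This is a sketch, and the "steelblue" fill is an arbitrary choice.

```r
library(ggvis)

# No y variable: layer_bars() counts the rows for each species
count_bars <- iris %>%
  ggvis(x = ~Species) %>%
  layer_bars(fill := "steelblue")

# The counts the bars should show
print(table(iris$Species))
```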
Plotting Trends In Your Data Using Visualisations
One of the most important tasks in data science is identifying trends in noisy data. GGVIS provides layer_smooths() to help you do that:
iris %>% ggvis( x = ~Sepal.Length, y = ~Petal.Length) %>% layer_smooths()
Let's look at a scatter plot of the same data, which will help you understand what layer_smooths() adds:
iris %>% ggvis(x = ~Sepal.Length, y = ~Petal.Length) %>% layer_points()
Can't we have multiple layers?
GGVIS also lets you add more than one layer to the same plot. Let's combine the last two charts into a single view:
iris %>% ggvis(x = ~Sepal.Length, y = ~Petal.Length) %>% layer_points() %>% layer_smooths()
You can also change the wiggliness of the smoothed line with the span argument when calling the layer. Smaller values follow the data more closely:
iris %>% ggvis(x = ~Sepal.Length, y = ~Petal.Length) %>% layer_points() %>% layer_smooths() %>% layer_smooths(span = 0.3, stroke := "red")
We can also overlay the fit of a linear model with layer_model_predictions():
iris %>% ggvis(x = ~Sepal.Length, y = ~Petal.Length) %>% layer_points() %>% layer_smooths() %>% layer_smooths(span = 0.3, stroke := "red") %>% layer_model_predictions(model = "lm", stroke := "green")
In the code above, no formula is provided to layer_model_predictions(). When no formula is given, GGVIS guesses one from the properties of the visualisation. Here it fits the model y ~ x, i.e. Petal.Length ~ Sepal.Length. You can also pass a formula explicitly.
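This sketch passes the formula explicitly. Petal.Length ~ Sepal.Length is the same model GGVIS guesses in this case, so the green line should match the earlier one. Fitting the same model directly with lm() lets you inspect its intercept and slope.

```r
library(ggvis)

lm_plot <- iris %>%
  ggvis(x = ~Sepal.Length, y = ~Petal.Length) %>%
  layer_points() %>%
  # Spell out the model formula instead of letting GGVIS guess it
  layer_model_predictions(model = "lm",
                          formula = Petal.Length ~ Sepal.Length,
                          stroke := "green")

# The same linear model fitted directly, to inspect its coefficients
fit <- lm(Petal.Length ~ Sepal.Length, data = iris)
print(coef(fit))
```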
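The following arguments can be passed to layer_model_predictions():
- model: the name of the modelling function to use, e.g. "lm" or "loess"
- formula: the model formula, guessed from the x and y properties if not supplied
- se: whether to show a standard error band; the default is FALSE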
Let's see what happens if we set se to TRUE:
iris %>% ggvis(x = ~Sepal.Length, y = ~Petal.Length) %>% layer_points() %>% layer_model_predictions(model = "lm", stroke := "green", se = TRUE)
GGVIS now draws a standard error band around the predicted line.
Creating Box Plot Data Visualisations with GGVIS in R
GGVIS also builds box plots with layer_boxplots(), which can be applied to different groups within your data:
iris %>% ggvis(~Species, ~Sepal.Length) %>% layer_boxplots()
Building Interactive Plots using GGVIS Package in R
GGVIS can also make interactive plots, which can be embedded in your Shiny apps. The same interactivity is available when you run the code in RStudio. GGVIS starts a temporary Shiny app in the background and shows the plot together with its controls.
You can build interactive GGVIS plots without understanding Shiny. Learning Shiny is still recommended if you want to go beyond the limited set of built-in controls.
Earlier in this tutorial, we plotted a simple scatter plot with fixed size and opacity values:
iris %>% ggvis(x = ~Sepal.Length, y = ~Sepal.Width, fill := "red", size := 300, opacity := 0.3) %>% layer_points()
What if you want to give users the freedom to change the size or opacity dynamically?
GGVIS's input_slider() function makes this possible:
iris %>% ggvis(x = ~Sepal.Length, y = ~Sepal.Width, fill := "red", size := input_slider(100, 500), opacity := 0.3) %>% layer_points()
When you run this code, a small slider appears below the chart. The value you select on it is passed to the size property.
The following two snapshots show the same scatter plot with two different size settings:
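Inputs are not limited to properties such as size. They can also drive layer parameters such as the span of layer_smooths(). In the sketch below, a slider lets the user tune the smoothing. The range 0.2 to 1, the starting value and the step size are arbitrary choices.

```r
library(ggvis)

smooth_plot <- iris %>%
  ggvis(x = ~Sepal.Length, y = ~Petal.Length) %>%
  layer_points() %>%
  # Slider from 0.2 to 1 that starts at 0.75 and moves in steps of 0.05
  layer_smooths(span = input_slider(0.2, 1, value = 0.75, step = 0.05,
                                    label = "span"))

# Printing smooth_plot in RStudio shows the plot with its slider
```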
Apart from input_slider(), which provides a slider bounded by a minimum and a maximum, there are other input controls as well:
iris %>% ggvis(x = ~Sepal.Length, y = ~Sepal.Width, fill := input_select(c("red", "black", "blue"), label = "Select Color"), size := input_slider(100, 500, label = "Select Size"), opacity := 0.3) %>% layer_points()
You might have noticed two changes from the last example:
- We have added input_select() for the colour, which gives the user a drop-down list to choose from.
- We have also given each input a label, which is shown above the input control.
The following two plots show two different size selections.
Input functions also accept an important argument, map, which transforms the input value with the function you provide before it is used:
input_slider(100, 500, map = function(x) x * 0.5)
This line multiplies the selected value by 0.5 before passing it to the plot property. Apart from the interactive controls discussed so far, you can choose from several other input functions to suit your data analysis task:
- input_checkbox(): a single checkbox that returns TRUE when ticked and FALSE otherwise
- input_checkboxgroup(): a group of checkboxes from which several values can be selected
- input_numeric(): a spin box for numeric values
- input_radiobuttons(): radio buttons to select one of several values
- input_text(): a text box for arbitrary text input
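The following sketch combines three of these controls in one plot. The colours, labels and the 0.3/1 opacity values are arbitrary choices for illustration. The map argument converts the checkbox's TRUE/FALSE value into an opacity.

```r
library(ggvis)

input_plot <- iris %>%
  ggvis(x = ~Sepal.Length, y = ~Sepal.Width) %>%
  layer_points(
    # One colour chosen with radio buttons
    fill := input_radiobuttons(c("red", "black", "blue"), label = "Colour"),
    # Checkbox: faded points when ticked, solid points otherwise
    opacity := input_checkbox(label = "Faded points",
                              map = function(val) if (val) 0.3 else 1),
    # Numeric spin box for the point size, starting at 100
    size := input_numeric(100, label = "Point size")
  )
```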
Limitations of interactive controls using GGVIS Package
The GGVIS package was built to make it easier to visualise data and create better plots for effective storytelling. At present, its interactive controls are limited to changing the values of visualisation parameters such as size, span and fill. They don't allow complex interactions such as turning certain layers on or off or switching between datasets.
Used together, GGVIS and Shiny help data professionals make the best use of data visualisation controls to convey their analysis results. Embed GGVIS plots in a Shiny app and you gain full control over the data visualisations you create.
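Below is a minimal sketch of embedding a GGVIS plot in a Shiny app with ggvis's ggvisOutput() and bind_shiny() functions. A Shiny selectInput() chooses the y variable, and the plot is rebuilt inside reactive() whenever the choice changes. The input name yvar and the output id iris_plot are our own hypothetical choices.

```r
library(shiny)
library(ggvis)

ui <- fluidPage(
  # Shiny control: which iris column goes on the y axis
  selectInput("yvar", "Y variable",
              choices = c("Sepal.Width", "Petal.Length", "Petal.Width")),
  # Placeholder where the ggvis plot is rendered
  ggvisOutput("iris_plot")
)

server <- function(input, output, session) {
  # Rebuild the ggvis specification whenever the selection changes
  vis <- reactive({
    yvar <- prop("y", as.symbol(input$yvar))
    iris %>%
      ggvis(x = ~Sepal.Length, y = yvar) %>%
      layer_points(fill = ~Species) %>%
      add_axis("y", title = input$yvar)
  })
  # Send the reactive plot to the "iris_plot" output
  vis %>% bind_shiny("iris_plot")
}

# Build the app object; start it with runApp(app)
app <- shinyApp(ui, server)
```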
To Do Exercise
We hope this GGVIS tutorial has helped you get started with data visualisation in R. Here is a simple exercise for you: try creating interactive scatter plots of US demographics from the census dataset using GGVIS.
You can download the census dataset to practise on from the link below: https://www.census.gov/popest/data/datasets.html
Identify any interesting correlations and outliers in the scatter plots you create with GGVIS.