__Data Visualization in R__

**Ggplot2**

Ggplot2 is a plotting system for R based on the **Grammar of Graphics**. It is built for making professional-looking plots quickly with minimal code. It takes care of many of the complicated details that make plotting difficult (like drawing legends) and provides a powerful model of graphics that makes it easy to produce complex multi-layered graphics.

__How to install ggplot2 package__:

Ggplot2 can be easily installed by typing:

install.packages("ggplot2")

Make sure that you are using the latest version of R to get the most recent version of ggplot2.

__Application of ggplot2__:

The grammar implemented in *ggplot2* provides an infrastructure for composing a graphic from multiple elements. The main components of ggplot2 are:

- Aesthetics, which refer to visual attributes that affect how data are displayed in a graphic, e.g., color, point size, or line type.
- Geometric objects for visual representation of observations such as points, lines, polygons, box plots, error bars, etc.
- Faceting, which applies the same type of graph to each defined subset of the data, usually indicated by the unique values of a categorical variable or factor.
- Annotation, which allows you to add text and/or external graphics to a ggplot.
- Positional adjustments, to reduce overplotting of points.
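As a rough illustration of how the components listed above combine, the sketch below builds a single plot from the built-in mtcars data. The choice of variables, the annotation text, and its coordinates are assumptions made for illustration only.

```r
library(ggplot2)

# Aesthetics: map weight to x, mpg to y, and transmission type to color
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(am))) +
  # Geometric object with a positional adjustment (jitter) to reduce overplotting
  geom_point(position = position_jitter(width = 0.05, height = 0)) +
  # Faceting: one panel per number of cylinders
  facet_wrap(~ cyl) +
  # Annotation: free text placed at data coordinates (values chosen for illustration)
  annotate("text", x = 4.5, y = 32, label = "mtcars") +
  labs(colour = "am")

print(p)
```

Each `+` adds one component, so any of them can be removed or swapped without touching the rest of the plot definition.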

__Examples of qplot__:

The **qplot()** function (short for "quick plot") can create the most common graph types with a single call. Recent ggplot2 releases (3.4.0 onward) mark qplot() as deprecated in favor of ggplot().

The format is:

qplot(x, y, data=, color=, shape=, size=, alpha=, geom=, method=, formula=, facets=, xlim=, ylim=, xlab=, ylab=, main=, sub=)

Some of the examples are:

- qplot examples

library(ggplot2)

- create factors with value labels

```
mtcars$gear <- factor(mtcars$gear,levels=c(3,4,5),
labels=c("3gears","4gears","5gears"))
mtcars$am <- factor(mtcars$am,levels=c(0,1),
labels=c("Automatic","Manual"))
mtcars$cyl <- factor(mtcars$cyl,levels=c(4,6,8),
labels=c("4cyl","6cyl","8cyl"))
```

- Kernel density plots for mpg

Grouped by number of gears (indicated by color)

```
qplot(mpg, data=mtcars, geom="density", fill=gear, alpha=I(.5),
main="Distribution of Gas Mileage", xlab="Miles Per Gallon",
ylab="Density")
```

- Scatterplot of mpg vs. hp for each combination of gears and cylinders

In each facet, transmission type is represented by shape and color

```
qplot(hp, mpg, data=mtcars, shape=am, color=am,
facets=gear~cyl, size=I(3),
xlab="Horsepower", ylab="Miles per Gallon")
```

- Separate regressions of mpg on weight for each number of cylinders

```
qplot(wt, mpg, data=mtcars, geom=c("point", "smooth"),
method="lm", formula=y~x, color=cyl,
main="Regression of MPG on Weight",
xlab="Weight", ylab="Miles per Gallon")
```

- Boxplots of mpg by number of gears

Observations (points) are overlaid and jittered

```
qplot(gear, mpg, data=mtcars, geom=c("boxplot", "jitter"),
fill=gear, main="Mileage by Gear Number",
xlab="", ylab="Miles per Gallon")
```

**Adding Aesthetics (Shape, Color and Size) and Faceting in the qplot function**:

**aes** creates a list of unevaluated expressions. This function also performs partial name matching, converts color to colour, and converts old-style R names to ggplot names (e.g., pch to shape, cex to size). The first difference when using qplot instead of plot comes when you want to assign different colors, sizes, or shapes to the points on your plot. With plot, you have to convert a categorical variable in your data into something that plot knows how to use, such as a vector of color names. Qplot does this automatically and provides a legend that maps the displayed attributes to the data values.
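The renaming that aes performs can be checked directly. The sketch below, which assumes ggplot2 is installed, passes base-R style argument names and prints the names stored in the resulting mapping. It then maps a categorical variable to color so that ggplot2 picks the colors and draws the legend.

```r
library(ggplot2)

# Old-style base R names (pch, cex) and the American spelling (color)
m <- aes(wt, mpg, color = cyl, pch = am, cex = qsec)

# The mapping stores the expressions unevaluated under standardized names
print(names(m))

# Mapping a categorical variable: no manual conversion to color names is needed
p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) + geom_point()
print(p)
```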

**Color:**

- Bar chart example

c <- ggplot(mtcars, aes(factor(cyl)))

- Default plotting

c + geom_bar()

- To change the interior coloring use fill aesthetic

c + geom_bar(fill = "red")

- Compare with the color aesthetic which changes just the bar outline

c + geom_bar(colour = "red")

- Combining both, you can see the changes more clearly

c + geom_bar(fill = "white", colour = "red")

**Size:**

Size should be specified with a numerical value (in millimetres), or from a variable source.

```
p <- ggplot(mtcars, aes(wt, mpg))
p + geom_point(size = 4)
p + geom_point(aes(size = qsec))
p + geom_point(size = 2.5) + geom_hline(yintercept = 25, size = 3.5)
```

**Shape:**

The examples below show the shape aesthetic in use.

Shape takes four types of values: an integer in [0, 25]; a single character, which uses that character as the plotting symbol; a "." to draw the smallest rectangle that is visible (i.e., about one pixel); or an NA to draw nothing.

```
p + geom_point()
p + geom_point(shape = 5)
p + geom_point(shape = "k", size = 3)
p + geom_point(shape = ".")
p + geom_point(shape = NA)
```

- Shape can also be mapped from a variable

p + geom_point(aes(shape = factor(cyl)))

**Faceting:**

In some circumstances we want to plot relationships between a set of variables in multiple subsets of the data, with the results appearing as panels in a larger figure. This is known as a **facet plot**, and it is a very useful feature of **ggplot2**. The faceting is defined by a categorical variable or variables.

**Facet Grid:**

The data can be split up by one or two variables that vary on the horizontal and/or vertical direction.

Usage:

```
facet_grid(facets, margins = FALSE, scales = "fixed", space = "fixed", shrink = TRUE,
labeller = "label_value", as.table = TRUE, drop = TRUE)
```

For example:

p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()

- With one variable

p + facet_grid(. ~ cyl)
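The two-variable case mentioned above can be sketched in the same way. Using gear for the rows and cyl for the columns is an illustrative choice, as is the free y scale in the second variant.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()

# Rows split by gear, columns split by cyl
g <- p + facet_grid(gear ~ cyl)
print(g)

# Same layout, but each row of panels gets its own y-axis range
g_free <- p + facet_grid(gear ~ cyl, scales = "free_y")
print(g_free)
```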

**Facet Wrap:**

Here, a single categorical variable defines subsets of the data. The panels are laid out in a one-dimensional ribbon that can be wrapped into multiple rows.

Usage:

facet_wrap(facets, nrow = NULL, ncol = NULL, scales = "fixed", shrink = TRUE, as.table = TRUE, drop = TRUE)

For example:

```
d <- ggplot(diamonds, aes(carat, price, fill = ..density..)) +
xlim(0, 2) + stat_binhex(na.rm = TRUE) + theme(aspect.ratio = 1)
d + facet_wrap(~ color)
```

**Geom**:

Geometric objects (geoms) are the visual representations of (subsets of) observations. Ggplot2 provides many geoms, among them geom_point, geom_jitter, geom_text, and geom_segment. Geom_point serves as the example here.

Geom_point:

It is a geom that draws a point at each pair of x and y coordinates.

Consider a scatterplot in a rather common configuration that uses some extra aesthetic parameters, such as size, shape, and color. The plot uses two aesthetic properties to represent the same aspect of the data: the `gender` column is mapped to a shape *and* to a color. The plot maps the continuous `speed` column onto the `size` aesthetic. To ensure that even observations with a "low" speed are still mapped to rather large points, the plot uses `scale_size_continuous` to define the range of point sizes to use.
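The data behind this scatterplot is not included in the text, so the sketch below uses a small made-up data frame. Only the `gender` and `speed` column names and the use of `scale_size_continuous` come from the description. The `x` and `y` columns, the values, and the size range are assumptions.

```r
library(ggplot2)

# Hypothetical data: only the gender and speed column names come from the text
set.seed(1)
runners <- data.frame(
  x      = runif(40),
  y      = runif(40),
  gender = rep(c("female", "male"), each = 20),
  speed  = runif(40, min = 5, max = 20)
)

p <- ggplot(runners, aes(x, y)) +
  # gender drives both shape and color; speed drives point size
  geom_point(aes(shape = gender, colour = gender, size = speed)) +
  # Raise the lower bound of the size range so slow observations stay visible
  scale_size_continuous(range = c(3, 8))

print(p)
```

Because shape and color are mapped to the same variable, ggplot2 merges them into a single legend.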

**Lattice:**

The lattice add-on package is an implementation of Trellis graphics for R. It is a powerful and elegant high-level data visualization system with an emphasis on multivariate data. It is designed to meet most typical graphics needs with minimal tuning, but can also be easily extended to handle most nonstandard requirements.

**How to install Lattice package:**

The lattice package ships with R as a recommended package, so it normally needs no separate installation. Load it by typing:

library(package = "lattice")

The most recent version of Lattice is available from **CRAN**. The latest development snapshot is available from __R-Forge__.

**Application of Lattice package:**

The following is the list of high level functions in the lattice package which are used in data visualization:

**Univariate:**

- barchart: Bar plots.
- bwplot: Box-and-whisker plots.
- densityplot: Kernel density estimates.
- dotplot: Cleveland dot plots.
- histogram: Histograms.
- qqmath: Theoretical quantile plots.
- stripplot: One-dimensional scatterplots.

**Bivariate:**

- qq: Quantile plots for comparing two distributions.
- xyplot: Scatterplots and time-series plots (and potentially a lot more).

**Trivariate:**

- levelplot: Level plots (similar to image plots).
- contourplot: Contour plots.
- cloud: Three-dimensional scatter plots.
- wireframe: Three-dimensional surface plots (similar to persp plots).

**Hypervariate:**

- splom: Scatterplot matrices.
- parallelplot: Parallel coordinate plots (named parallel in older lattice releases).

**Miscellaneous:**

- rfs: Residual and fitted value plots (also see oneway).
- tmd: Tukey Mean-Difference plots.

Lattice also provides a collection of convenience functions that correspond to the primitives lines, points, etc. These are implemented using Grid graphics. They have names like **llines** or **panel.lines** and are often useful when writing nontrivial panel functions.
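A minimal sketch of such a panel function is shown below. It draws the default scatterplot and then overlays a per-panel least-squares line with panel.lines. The function name `fit_panel`, the choice of variables, and the red line color are illustrative assumptions.

```r
library(lattice)

# Custom panel function: draw the points, then overlay a least-squares line
fit_panel <- function(x, y, ...) {
  panel.xyplot(x, y, ...)              # default scatterplot rendering
  fit <- lm(y ~ x)                     # linear fit using this panel's data only
  ord <- order(x)                      # sort by x so the line is drawn left to right
  panel.lines(x[ord], fitted(fit)[ord], col = "red")
}

tp <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
             panel = fit_panel, layout = c(3, 1),
             xlab = "Car Weight", ylab = "Miles per Gallon")
print(tp)
```

Lattice calls the panel function once per panel with only that panel's data, which is why the fit is computed inside the function.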

**Examples of Lattice package**:

Here are some examples of the Lattice package that use car data such as mileage, number of cylinders, and number of gears from the __mtcars__ data frame.

- Lattice Examples

library(lattice)

attach(mtcars)

- Create factors with value labels

gear.f <- factor(gear, levels=c(3,4,5), labels=c("3gears","4gears","5gears"))

cyl.f <- factor(cyl, levels=c(4,6,8), labels=c("4cyl","6cyl","8cyl"))

- Kernel density plot

densityplot(~mpg, main="Density Plot", xlab="Miles per Gallon")

- Kernel density plots by factor level

densityplot(~mpg|cyl.f, main="Density Plot by Number of Cylinders", xlab="Miles per Gallon")

- Kernel density plots by factor level (alternate layout)

densityplot(~mpg|cyl.f, main="Density Plot by Number of Cylinders", xlab="Miles per Gallon", layout=c(1,3))

- Boxplots for each combination of two factors

bwplot(cyl.f~mpg|gear.f, ylab="Cylinders", xlab="Miles per Gallon", main="Mileage by Cylinders and Gears", layout=c(1,3))

- Scatterplots for each combination of two factors

xyplot(mpg~wt|cyl.f*gear.f, main="Scatterplots by Cylinders and Gears", ylab="Miles per Gallon", xlab="Car Weight")

- 3-D scatterplot by factor level

cloud(mpg~wt*qsec|cyl.f, main="3D Scatterplot by Cylinders")

- Dotplot for each combination of two factors

dotplot(cyl.f~mpg|gear.f, main="Dotplot by Number of Gears and Cylinders", xlab="Miles Per Gallon")

- Scatterplot matrix

splom(mtcars[c(1,3,4,5,6)], main="MTCARS Data")

**Ggvis**:

Ggvis is a data visualization package for R that lets us describe data graphics with a syntax similar to **ggplot2**. It lets us view and interact with the graphics on our local computer.

**How to install ggvis**:

Ggvis can be installed directly from **GitHub**. Make sure you have the latest version of devtools (at least 1.4) and run the following:

devtools::install_github(c("hadley/testthat", "rstudio/shiny", "rstudio/ggvis"))

**Application of Ggvis**:

Interactive plots can be generated using the ggvis package:

- It is still being developed.
- It can be used in __Shiny__ applications.
- It is similar to ggplot2 but is designed for dynamic web graphics.
- It uses the pipe operator %>% to chain multiple layers.
- Ggvis can do a limited range of things, but what it does do requires less effort.

**Examples of Ggvis:**

```
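library(ggvis)

# Density plot with interactive controls for line and fill color
mtcars %>% ggvis(x = ~wt) %>%
layer_densities (
stroke := input_radiobuttons(c("Purple","Orange","steelblue"), label="Line color"),
fill := input_select(c("Purple","Orange","steelblue"), label="Fill color")
)

# Density plot with interactive bandwidth and kernel selection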
mtcars %>% ggvis(x = ~wt) %>%
layer_densities (
adjust = input_slider(.1, 2, value = 1, step = .1, label = "Bandwidth adjustment"),
kernel = input_select (
c("Gaussian" = "gaussian",
"Epanechnikov" = "epanechnikov",
"Rectangular" = "rectangular",
"Triangular" = "triangular",
"Biweight" = "biweight",
"Cosine" = "cosine",
"Optcosine" = "optcosine"),
label = "Kernel")
)
```