# Cookbook: ggplot

Note: See the corresponding lecture notes about ggplot. This page has cookbook recipes.

## Cheatsheet

Ou Zheng found this amazing ggplot cheatsheet produced by RStudio.

## Common plots

### Points

Given the built-in faithful dataset, described as follows (from ?faithful):

Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA. eruptions is the eruption time in minutes; waiting is the waiting time to the next eruption (in minutes).

Produce,

Technique:
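A minimal sketch of a scatterplot of this data, assuming eruption time on the x-axis and waiting time on the y-axis (the axis labels are illustrative):

```r
library(ggplot2)

# One point per eruption: duration (x) vs. wait until the next eruption (y)
p <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point() +
  labs(x = "Eruption time (min)", y = "Waiting time to next eruption (min)")
print(p)
```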

### Jittered points

Sometimes points lie on top of each other, or so close together that they're hard to tell apart. Occasionally (though rarely), it's a good idea to “jitter” them: add a small random offset to each point's position so the points separate.

For example, here is a plot of engine size (displacement, in liters) vs. highway miles per gallon, from the mpg dataset:

This code made that plot:
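A minimal sketch of code that produces this kind of plot, using the mpg data that ships with ggplot2:

```r
library(ggplot2)

# Engine displacement (liters) vs. highway mpg, with no jitter
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()
print(p)

# Many (displ, hwy) pairs repeat, so those points are drawn on top of each other
anyDuplicated(mpg[, c("displ", "hwy")]) > 0
```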

You can't tell from the plot, but many of the points sit exactly on top of each other. Adding jitter reveals them:

This code made that plot:

You can use the width and height arguments of position_jitter() to set how much jitter is allowed horizontally and vertically. Sometimes you want only horizontal jittering: setting height = 0 disables vertical jittering, and width = 0 does the reverse.
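A minimal sketch of the jittered version, again on the mpg data. The jitter amounts (0.1 and 0.5) and the seed are arbitrary choices:

```r
library(ggplot2)

# Each point is moved by a random offset of up to +/- width horizontally
# and +/- height vertically (in data units). height = 0 keeps the hwy
# values exact and jitters horizontally only; seed makes it reproducible.
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(position = position_jitter(width = 0.1, height = 0, seed = 1))
print(p)

# geom_jitter() is a shortcut for geom_point() with jittered positions
p2 <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_jitter(width = 0.1, height = 0.5)
print(p2)
```

Keeping height = 0 is usually the safer choice when the y-values are meaningful measurements you don't want to distort.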

### Lines

Given this dataset,

Produce,

Technique:
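A minimal sketch, using the economics data that ships with ggplot2 as an example dataset (an assumption; any data with an ordered x column works the same way):

```r
library(ggplot2)

# economics: monthly US economic time series.
# geom_line() connects the points in order of x (here, date).
p <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  labs(x = "Date", y = "Unemployed (thousands)")
print(p)
```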

### Histograms

Given this dataset,

Produce,

Technique:
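A minimal sketch, assuming the waiting times from the faithful data as the example variable; binwidth = 3 is an arbitrary choice:

```r
library(ggplot2)

# Histogram of waiting times. binwidth is the width of each bar in data
# units (minutes here): too wide hides structure, too narrow looks noisy.
p <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(binwidth = 3, color = "white")
print(p)
```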

### Density

Given this dataset,

Produce,

Technique:

• facet_grid(. ~ spray, labeller=label_both) lays out one panel per value of the spray column, side by side (horizontal faceting). There is no vertical faceting because . was used in the vert ~ horiz formula (see the faceting notes below). The facet labels show both the variable name and its value because the labeller is set to label_both.
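A sketch along these lines, assuming the built-in InsectSprays data (insect counts for sprays A through F), which has a spray column matching the facet_grid() call above:

```r
library(ggplot2)

# Density of insect counts, one panel per spray type.
# ". ~ spray": no vertical faceting, horizontal faceting by spray.
p <- ggplot(InsectSprays, aes(x = count)) +
  geom_density(fill = "gray80") +
  facet_grid(. ~ spray, labeller = label_both)
print(p)
```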

### Box-and-whisker plots

Box-and-whisker plots show:

• the whiskers, which reach the smallest and largest values that are not outliers, so they are not always the true min/max (see below)
• the 25% quartile, also called the first quartile (bottom of box)
• the 75% quartile, also called the third quartile (top of box)
• the median (middle line in box)

Quartiles are the values that split the sorted data into four equal-sized parts. So the 25% quartile is the value 25% of the way through the sorted data, and the median is the 50% quartile.

Sometimes, extra dots are shown beyond the whiskers. These mark outliers: values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR, where Q1 and Q3 are the 25% and 75% quartiles and IQR is the interquartile range, Q3 - Q1 (the 75% quartile value minus the 25% quartile value). The whiskers stop at the most extreme values that fall inside this range.
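To see the numbers a box plot is built from, you can compute them directly. A minimal sketch, using the faithful waiting times as example data; ggplot2's geom_boxplot() also computes its quartiles with quantile(), so these fences should match its outlier dots:

```r
x <- faithful$waiting

# Quartiles: 25%, 50% (median), 75%
q <- quantile(x, probs = c(0.25, 0.5, 0.75))

# Interquartile range: same as q[["75%"]] - q[["25%"]]
iqr <- IQR(x)

# Outlier fences: 1.5 * IQR beyond the edges of the box
lower_fence <- q[["25%"]] - 1.5 * iqr
upper_fence <- q[["75%"]] + 1.5 * iqr
outliers <- x[x < lower_fence | x > upper_fence]
```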

They are hard to interpret for many people, so I discourage their use.

Given the dataset,

Produce,

Technique:
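A minimal sketch, assuming the mpg data (highway mpg by vehicle class) as the example dataset:

```r
library(ggplot2)

# One box per vehicle class; dots beyond the whiskers are outliers
p <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot() +
  labs(x = "Vehicle class", y = "Highway mpg")
print(p)
```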

### Bar charts

Given this dataset,

Produce,

Technique:

• stat="identity" means use the values from the data (the expense values) for the bar heights

• position="dodge" means to place the bars next to each other instead of on top of each other

• labs(fill="Expense Type") means to rename the legend label for the fill colors
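A minimal sketch of these three settings together. As a stand-in for the expense data, it assumes R's built-in USPersonalExpenditure matrix (US spending by category, in billions of dollars, for 1940 to 1960):

```r
library(ggplot2)

# Reshape the matrix (rows = expense type, columns = year)
# into one row per (type, year) pair
expenses <- as.data.frame(as.table(USPersonalExpenditure))
names(expenses) <- c("type", "year", "expense")

p <- ggplot(expenses, aes(x = year, y = expense, fill = type)) +
  geom_bar(stat = "identity",     # bar heights = expense values
           position = "dodge") +  # bars side by side, not stacked
  labs(fill = "Expense Type")     # rename the fill legend
print(p)
```

Newer ggplot2 code often writes geom_col() instead of geom_bar(stat = "identity"); the two are equivalent.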

### Text

Given this iris (flower) data,

Produce,

Technique:

• geom_text() needs an x, y, and label
• guides(color="none") means don't produce a legend for the color aesthetic (older code wrote guides(color=FALSE), which newer ggplot2 versions deprecate)
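A minimal sketch, assuming petal length and width as the x and y columns and the species name as the label:

```r
library(ggplot2)

# Each flower is drawn as its species name instead of a dot.
# geom_text() needs x, y, and label aesthetics.
p <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width,
                      label = Species, color = Species)) +
  geom_text(size = 3) +
  guides(color = "none")  # the text already names the species
print(p)
```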

### Summary plots

Given the movies dataset,

Produce,

Technique:
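The movies data has moved out of ggplot2 into the separate ggplot2movies package, so this sketch swaps in the built-in ChickWeight data to stay self-contained. It shows one way to plot a summary, stat_summary(), which computes the mean of y at each x for you:

```r
library(ggplot2)

# Mean chick weight at each time point, computed by ggplot2 itself:
# fun = mean is applied to the y values sharing each x value.
p <- ggplot(ChickWeight, aes(x = Time, y = weight)) +
  stat_summary(fun = mean, geom = "line") +
  labs(x = "Days since birth", y = "Mean weight (g)")
print(p)
```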

Note: you can achieve the same result by computing the summary yourself with aggregate() and plotting it with a normal geom_line():
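A sketch of the aggregate() route, using the built-in ChickWeight data as a stand-in for movies:

```r
library(ggplot2)

# Compute the means first: one row per Time, with the mean weight
avg <- aggregate(weight ~ Time, data = ChickWeight, FUN = mean)

# Then draw them as an ordinary line
p <- ggplot(avg, aes(x = Time, y = weight)) +
  geom_line()
print(p)
```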

## Axes

See the Cookbook for R.

## Legends

See the Cookbook for R.

## Facets

Facets are horizontal/vertical grids of subplots. You can create facets based on one or two columns. Each subplot shows only the rows whose values in the faceted column(s) match that panel's value(s).

Facets are created with facet_grid(vert ~ horiz), added to a plot with +.

vert and horiz should be column names. Use a . in the vert or horiz position to turn off vertical or horizontal faceting, respectively.
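A few facet_grid() layouts, sketched on the mpg data (drv is the drive train, cyl the number of cylinders):

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# Rows by drive train, columns by number of cylinders
p_grid <- base + facet_grid(drv ~ cyl)

# Horizontal faceting only: "." means no vertical split
p_cols <- base + facet_grid(. ~ cyl)

# Vertical faceting only
p_rows <- base + facet_grid(drv ~ .)

print(p_grid)
```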

Many examples may be found in the Cookbook for R.

### Facet labels

By default, the facet labels only show the variables’ values. If you want to show the variable name as well, use the label_both labeller:
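A minimal sketch on the mpg data, faceting by cyl:

```r
library(ggplot2)

# Panel strips read "cyl: 4", "cyl: 5", ... instead of just "4", "5", ...
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(. ~ cyl, labeller = label_both)
print(p)
```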

## 3D Scatterplots

While not actually ggplot, there is a library for 3D scatterplots. Read its PDF documentation. This library was demonstrated by Christian Micklisch.

Interactive 3D scatter plots can be done as follows. Contributed by Marisa Gomez.
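As one possible approach (not necessarily the library used in that contribution), here is a sketch with plotly, the same package used for the 3D surface under Interactive graphs, assuming it is installed:

```r
library(plotly)

# Interactive 3D scatter: drag to rotate, scroll to zoom
p <- plot_ly(iris,
             x = ~Sepal.Length, y = ~Sepal.Width, z = ~Petal.Length,
             color = ~Species,
             type = "scatter3d", mode = "markers")
p  # shows in the RStudio viewer or a web browser
```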

## Map plots

Using the ggmap library, you can plot on maps! For example, we can plot quake data on a map of Fiji. The built-in dataset quakes contains lat/long coordinates and quake magnitude:

Using ggmap, we can download a map of Fiji,

And then make a plot with the points on top of the map. We’ll make the points partially transparent, and their size relative to the quake magnitude:
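Downloading map tiles with ggmap may require registering an API key with the tile provider, so the download step depends on your ggmap version. The point layer itself is ordinary ggplot; a minimal sketch of it on plain longitude/latitude axes, with no basemap:

```r
library(ggplot2)

# quakes: lat, long, depth, mag, stations for earthquakes near Fiji
p <- ggplot(quakes, aes(x = long, y = lat, size = mag)) +
  geom_point(alpha = 0.3) +  # partial transparency shows overlapping quakes
  coord_quickmap() +         # approximate map aspect ratio for lon/lat
  labs(x = "Longitude", y = "Latitude", size = "Magnitude")
print(p)
```

On top of a downloaded map, the same geom_point() layer goes after ggmap(your_map) (your_map being a hypothetical name for the downloaded map), with data = quakes passed to geom_point() because the base plot's data is the map.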

Or Houston crime data:

A 2d density plot can show you which areas have the most crime.
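The 2D density itself is a regular ggplot layer. A sketch using geom_density_2d_filled() (available in ggplot2 3.3.0 and later), shown on the quakes coordinates to avoid depending on ggmap; any data with longitude/latitude columns works the same way:

```r
library(ggplot2)

# Filled density contours; the fill color encodes the density level
# (see the legend), so the busiest areas stand out.
p <- ggplot(quakes, aes(x = long, y = lat)) +
  geom_density_2d_filled(alpha = 0.7) +
  geom_point(size = 0.3) +
  coord_quickmap()
print(p)
```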

Use the googleVis package to get interactive visualizations. Contributed by Marisa Gomez.

See this article from the R Journal for details. Marisa’s example went as follows:
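As a stand-in for that example, a minimal sketch of a googleVis geo chart, assuming the Exports demo data that ships with googleVis:

```r
library(googleVis)

# Color each country by its Profit value
G <- gvisGeoChart(Exports, locationvar = "Country", colorvar = "Profit")

# plot() opens the chart in a web browser
if (interactive()) plot(G)
```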

## Animated plots

Contributed by Katie Porterfield.

Use the caTools library to save plots as animated GIFs. Use the animate library to create the animated plots. This blog entry has some good examples.

## Multidimensional scaling

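Plot relative distances in 2D between a bunch of high-dimensional points: multidimensional scaling finds 2D coordinates whose pairwise distances approximate the original ones.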

First, compute the all-pairs distances:

Then compute x,y values for each row, keeping the relative distances:

Finally, plot it:

Here is an example on the iris dataset, which has 4-dimensional data:
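A minimal sketch of the three steps (dist, cmdscale, plot) on iris's four numeric measurement columns:

```r
library(ggplot2)

# Step 1: all-pairs Euclidean distances between the 150 flowers
d <- dist(iris[, 1:4])

# Step 2: classical MDS; k = 2 asks for 2D coordinates whose
# pairwise distances approximate d
xy <- cmdscale(d, k = 2)
mds <- data.frame(x = xy[, 1], y = xy[, 2], Species = iris$Species)

# Step 3: plot the 2D layout
p <- ggplot(mds, aes(x = x, y = y, color = Species)) +
  geom_point()
print(p)
```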

Here is a better demo. Consider the distances between several US cities.

Read this into R:

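You get this graph (after flipping the x-axis). Notice that the cities' relative locations are correct, since multidimensional scaling tries to respect these distances while arranging the points. The layout is only determined up to rotation and reflection, which is why an axis may need flipping.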
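R also ships a similar built-in dataset, eurodist (road distances in km between 21 European cities). A sketch of the same idea on it, labeling each point with its city name; flipping the y-axis follows R's own ?cmdscale example and may differ on your machine:

```r
library(ggplot2)

# eurodist is already a "dist" object, so it goes straight into cmdscale()
xy <- cmdscale(eurodist, k = 2)
cities <- data.frame(x = xy[, 1], y = xy[, 2], city = labels(eurodist))

# MDS coordinates are only determined up to rotation/reflection,
# so flip an axis (here y) to resemble a normal map
p <- ggplot(cities, aes(x = x, y = -y, label = city)) +
  geom_text(size = 3) +
  coord_fixed()  # equal axis scales so distances aren't distorted
print(p)
```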

## Interactive graphs

Contributed by Malak Patel.

```r
library(ggplot2)
library(ggiraph)

# Hover effect: show the car model as a tooltip
# Part 1: build an interactive ggplot
g <- ggplot(mpg, aes(x = displ, y = cty, color = hwy)) + theme_minimal()
my_gg <- g + geom_point_interactive(aes(tooltip = model), size = 2)

# Part 2: render it (girafe() replaces the older ggiraph() function)
girafe(ggobj = my_gg, options = list(opts_sizing(width = .7)))

# Hover red effect (data_id identifies which points get highlighted)
my_gg <- g + geom_point_interactive(aes(tooltip = model, data_id = model), size = 2)
girafe(ggobj = my_gg, options = list(
  opts_sizing(width = .7),
  opts_hover(css = "cursor:pointer;fill:red;stroke:red;")
))

# Clickable graph
# Part 1: one row per state, with lower-case state names
crimes <- data.frame(state = tolower(rownames(USArrests)), USArrests)

# Part 2: JavaScript run on click, opening the state's Wikipedia page
crimes$onclick <- sprintf("window.open(\"%s%s\")",
                          "http://en.wikipedia.org/wiki/", as.character(crimes$state))

gg_crime <- ggplot(crimes, aes(x = Murder, y = Assault, color = UrbanPop)) +
  geom_point_interactive(aes(data_id = state, tooltip = state, onclick = onclick), size = 3) +
  scale_colour_gradient(low = "#999999", high = "#FF3333") +
  theme_minimal()

# Part 3: render with a hover style
girafe(ggobj = gg_crime, options = list(
  opts_hover(css = "fill-opacity:.3;cursor:pointer;")
))

# Zoom effect (allow zooming in up to 5x)
girafe(ggobj = gg_crime + theme_linedraw(), options = list(opts_zoom(max = 5)))

# Another visual: https://rstudio.github.io/dygraphs/index.html
library(dygraphs)
lungDeaths <- cbind(mdeaths, fdeaths)  # monthly UK lung-disease deaths, male and female
dygraph(lungDeaths)

# Even more detail: add a range selector below the plot
dygraph(lungDeaths) %>% dyRangeSelector()

# 3D graph: volcano is a built-in elevation matrix
library(plotly)
plot_ly(z = ~volcano, type = "surface")
```


CINF 401 material by Joshua Eckroth is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Source code for this website available at GitHub.