Note: See the corresponding cookbook about ggplot for more examples. This page contains lecture notes.
The “gg” of ggplot stands for “grammar of graphics.” The ggplot library (the ggplot2 package in R) provides a set of functions that can be added together to produce a plot. The resulting plot can be shown on the screen or saved to a file, such as a PDF or PNG.
Every plot starts with the ggplot() function. Then, you “add” graphics to it (using +), such as geom_point(). The examples below and in the cookbook illustrate this point.
As part of this “grammar of graphics” concept, the visual features of a plot and its layers are defined by “aesthetics,” specified with aes(). The aesthetics include the values along the x-axis and y-axis, line colors, line types, fill colors, point shapes and sizes, and so on. See the examples below and in the cookbook.
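As an illustration (not from the original notes), R's built-in iris data set can be used to map several aesthetics at once. Each aesthetic inside aes() takes its values from exactly one column of the data frame.

```r
library(ggplot2)

# iris ships with R. Each aesthetic below is mapped to a single column:
#   x     <- Sepal.Length
#   y     <- Petal.Length
#   color <- Species
#   shape <- Species
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length,
                      color = Species, shape = Species)) +
  geom_point()

p  # printing the object draws the plot
```

Because Species is already a factor, ggplot2 assigns one color and one shape to each of its three levels.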
Basic pattern of use
Typically, I use ggplot like this:
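A minimal sketch of that typical pattern, assuming the ggplot2 package is installed: build the plot, keep it in a variable, and save it with ggsave(). The data frame df and the file name myplot.pdf are hypothetical placeholders, not taken from the original notes.

```r
library(ggplot2)

# Hypothetical example data
df <- data.frame(x = 1:10, y = (1:10)^2)

# Build the plot by "adding" a layer to ggplot() with `+`, and keep it in p
p <- ggplot(df, aes(x = x, y = y)) +
  geom_point()

# Write the plot to a file; width and height are in inches by default
ggsave("myplot.pdf", plot = p, width = 6, height = 4)
```

ggsave() chooses the output format from the file extension, so naming the file myplot.png would write a PNG instead.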
If I want to be quick and don't need to save the result, I can just run ggplot() plus whatever geom_* I want, and the plot will show on the screen:
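A quick version of the same idea, again with a hypothetical df. At the interactive R prompt, a ggplot object is drawn when it is printed, so evaluating the expression without assigning it is enough to display the plot. Inside a function or a loop there is no automatic printing, so you would need to call print() explicitly.

```r
library(ggplot2)

# Hypothetical example data
df <- data.frame(x = 1:10, y = (1:10)^2)

# Not assigned to a variable: at the top level R prints it, which draws the plot
ggplot(df, aes(x = x, y = y)) + geom_point()
```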
Important: The values for any single aesthetic (x values, y values, fill color, line color, etc.) must come from a single column in the data frame.
If your values come from different columns (or the values you want to plot are the column names themselves), you’ll need to melt (and possibly cast) the data frame first, that is, reshape it from wide to long format.
Sometimes you want to set a color, shape, or facets based on some column. If that column is not already a “factor”, you may need to convert it to one first by writing factor(col) in place of just col.
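A small sketch using R's built-in mtcars data set (an illustration, not from the original notes), whose cyl column is stored as a number (4, 6 or 8). Mapped directly, cyl gets a continuous color gradient; wrapped in factor(), it is treated as categorical and gets one distinct color per value.

```r
library(ggplot2)

# cyl as a number: ggplot2 uses a continuous color scale (a gradient)
p_numeric <- ggplot(mtcars, aes(x = wt, y = mpg, color = cyl)) +
  geom_point()

# cyl as a factor: ggplot2 uses a discrete color scale, one color per level
p_factor <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

p_factor
```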
Consider this data frame:
Suppose I want this graph:
That’s not (easily) possible with ggplot because the x-axis values may only come from a single column. Thus, we need to melt the data frame first:
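The original data frame is not reproduced here, so the sketch below uses a hypothetical wide data frame d with a name column and one column per year, and assumes melt() comes from the reshape2 package. After melting, the year column names become values of a single variable column, and the numbers become a single value column.

```r
library(reshape2)

# Hypothetical wide data frame: one row per item, one column per year.
# check.names = FALSE keeps the names "2010", ... instead of "X2010", ...
d <- data.frame(
  name   = c("a", "b", "c"),
  `2010` = c(1, 4, 2),
  `2011` = c(2, 3, 3),
  `2012` = c(4, 1, 5),
  check.names = FALSE
)

# Long format: one row per (name, year) pair, with columns name, variable, value
dmelt <- melt(d, id.vars = "name")
head(dmelt)
```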
Now, let’s try to create this plot:
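A sketch of that first attempt, with the same assumptions as before (hypothetical d, melt() from reshape2): put the melted variable column on the x-axis and draw one colored line per name.

```r
library(ggplot2)
library(reshape2)

# Hypothetical wide data frame (see the assumptions above)
d <- data.frame(
  name   = c("a", "b", "c"),
  `2010` = c(1, 4, 2),
  `2011` = c(2, 3, 3),
  `2012` = c(4, 1, 5),
  check.names = FALSE
)
dmelt <- melt(d, id.vars = "name")

# Attempt: years on the x-axis, one line per name
p <- ggplot(dmelt, aes(x = variable, y = value, color = name)) +
  geom_line()
p
```

Because variable is a factor, ggplot2 groups the rows by each discrete x value as well as by name, so every group holds a single point. No lines are drawn, and ggplot2 typically warns that each group consists of only one observation.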
It turns out our dmelt$variable column is a factor, not numeric. So ggplot treats the x-axis as categorical rather than numeric, and it won’t draw a line across it.
We first verify this is the problem:
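One way to check, again assuming the hypothetical d and reshape2's melt():

```r
library(reshape2)

# Hypothetical wide data frame (see the assumptions above)
d <- data.frame(
  name   = c("a", "b", "c"),
  `2010` = c(1, 4, 2),
  `2011` = c(2, 3, 3),
  `2012` = c(4, 1, 5),
  check.names = FALSE
)
dmelt <- melt(d, id.vars = "name")

class(dmelt$variable)   # → "factor"
levels(dmelt$variable)  # → "2010" "2011" "2012"
```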
Indeed, it is a factor. Let’s convert it to numeric. We need to use as.numeric(as.character(...)) because we want to convert the character label of each value into a number. If we call as.numeric() on the factor directly, we get the factor’s internal level codes (1 through n, where n is the number of levels) instead of the labels.
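A base-R illustration of the pitfall, using a small hypothetical factor of year labels. Levels are sorted, so "2010", "2011" and "2012" get the internal codes 1, 2 and 3.

```r
f <- factor(c("2010", "2012", "2011", "2012"))

as.numeric(f)                # → 1 3 2 3            (level codes, not the years)
as.numeric(as.character(f))  # → 2010 2012 2011 2012 (the labels as numbers)
```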
Check our work:
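A sketch of the conversion followed by the check, with the same assumptions as before (hypothetical d, melt() from reshape2):

```r
library(reshape2)

# Hypothetical wide data frame (see the assumptions above)
d <- data.frame(
  name   = c("a", "b", "c"),
  `2010` = c(1, 4, 2),
  `2011` = c(2, 3, 3),
  `2012` = c(4, 1, 5),
  check.names = FALSE
)
dmelt <- melt(d, id.vars = "name")

# Convert factor labels to numbers (not the level codes)
dmelt$variable <- as.numeric(as.character(dmelt$variable))

class(dmelt$variable)   # → "numeric"
unique(dmelt$variable)  # → 2010 2011 2012
```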
Now this command gives us our plot:
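A sketch of the final command under the same assumptions (hypothetical d, melt() from reshape2). With variable now numeric, the x-axis is continuous, the rows are grouped only by name, and each name gets one connected line.

```r
library(ggplot2)
library(reshape2)

# Hypothetical wide data frame (see the assumptions above)
d <- data.frame(
  name   = c("a", "b", "c"),
  `2010` = c(1, 4, 2),
  `2011` = c(2, 3, 3),
  `2012` = c(4, 1, 5),
  check.names = FALSE
)
dmelt <- melt(d, id.vars = "name")
dmelt$variable <- as.numeric(as.character(dmelt$variable))

# One line per name across the (now numeric) years
p <- ggplot(dmelt, aes(x = variable, y = value, color = name)) +
  geom_line() +
  labs(x = "year")
p
```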