Home

Project 3

Using Google’s NYC Taxi data in BigQuery, produce the following plots. Match the styling of each plot (axes, legends, titles, etc.). Check out Google’s sample queries, but note that page links to the taxi data split by year rather than the combined data.

Plot 1

Taxi frequency

Plot 2

Outliers were omitted.

Yearly passenger trends by month

Plot 3

Outliers were omitted and the horizontal scale was truncated.

Average distance per trip

Plot 4

Total fares by year

Plots 5 & 6

Show begin/end of each trip in the morning/evening commute hours across days Mon-Fri. Limit the latitude/longitude values to just 3 digits after the decimal point, and compute the number of trips from the same pickup/dropoff lat/lon. Show a circle at the end of the trip (dropoff lat/lon) and sized according to the number of trips with that same pickup/dropoff. Drop records with counts less than 50. Use an alpha=0.05 for the lines and alpha=0.5 for the circles. Limit the plot to the “Midtown Manhattan” map at zoom level 14. See the ggplot cookbook for help plotting on maps.

Morning commutes

Evening commutes

CINF 401 material by Joshua Eckroth is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Source code for this website available at GitHub.