Write all of your code in an RMarkdown file. Show every step of your process.
Grab the first table of Foreign Terrorist Organizations (the one called “Designated Foreign Terrorist Organizations”) entirely using R (do not copy-paste the content or download ahead of time). Consider using the readHTMLTable() function in the XML library in R. Note that readHTMLTable() cannot open https: connections, so you’ll need to do some googling to figure this out.
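As a rough sketch of one way around the https: limitation, the page text can be fetched by some other means and then handed to the XML package as an in-memory string. The example below assumes the XML package is installed and uses a tiny made-up HTML string (the column headers and organization names are placeholders, not the real table); how the real page text is fetched is left open.

```r
library(XML)

# Stand-in for page text that would, in practice, be fetched from within R
# by a tool that supports https. The table contents here are made up.
html <- paste0(
  "<html><body><table>",
  "<tr><th>Date Designated</th><th>Name</th></tr>",
  "<tr><td>10/8/1997</td><td>Example Group B</td></tr>",
  "<tr><td>5/15/2014</td><td>Example Group A</td></tr>",
  "</table></body></html>"
)

# Parse the string (not a URL), then extract every table on the page.
doc <- htmlParse(html, asText = TRUE)
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)

# The first table on the page is the first element of the list.
first <- tables[[1]]
print(first)
```

Because readHTMLTable() returns a list with one element per table, picking the first table is just a matter of list indexing.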
Produce a data frame sorted by organization name, i.e., the following:
Produce another data frame that shows the number of terrorist organizations that began in each year. Use the same column names shown below, and ensure it is ordered by year.
Note: For full credit, you must not manually modify any of the data. Use only R functions/features to manipulate the data. You should never type “2007”, or “al-Nusrah Front”, for example. You are allowed to rewrite the column names using colnames(). And as stated above, for full credit you must download the HTML page from R itself, and not save any intermediate CSV files.
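The sorting and per-year counting steps described above can be sketched on toy data. Everything in this example is an assumption for illustration: the column names (date, name, Year, Count), the month/day/year date format, and the placeholder organization names. Note that the year is derived from the date column rather than typed by hand.

```r
# Toy stand-in for the scraped table; values are placeholders.
orgs <- data.frame(
  date = c("10/8/1997", "5/15/2014", "10/8/1997"),
  name = c("Example Group C", "Example Group A", "Example Group B"),
  stringsAsFactors = FALSE
)

# Data frame sorted by organization name.
sorted <- orgs[order(orgs$name), ]

# Derive the year from the date string (assumed format: month/day/year).
orgs$year <- as.integer(format(as.Date(orgs$date, format = "%m/%d/%Y"), "%Y"))

# Count organizations per year, rename the columns, and order by year.
counts <- as.data.frame(table(orgs$year), stringsAsFactors = FALSE)
colnames(counts) <- c("Year", "Count")
counts$Year <- as.integer(counts$Year)
counts <- counts[order(counts$Year), ]

print(sorted)
print(counts)
```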
- Final release data
- US Institutions (should be about 7500 of them)
- Variables: Graduation rate with Bachelor degree within 4 years, total (all students), all years
You’ll get a CSV file with several columns (institution fields, plus a column for each year). Cast/melt/merge/munge until you have a data frame that looks like this, with the exact column names shown (the rows may appear in a different order, that’s no problem):
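One possible way to go from one-column-per-year to one-row-per-school-per-year is base R's reshape(); the melt function from a reshaping package would be another. The sketch below uses a made-up two-school data frame, and the column names (UnitID, Institution, X2010, X2011, Year, Rate) are assumptions, not the names in the real CSV or the required output.

```r
# Toy wide-format data: one row per school, one column per year.
wide <- data.frame(
  UnitID = c(1, 2),
  Institution = c("College A", "College B"),
  X2010 = c(50, 60),
  X2011 = c(55, NA),
  stringsAsFactors = FALSE
)

# Find the year columns by pattern instead of typing each one.
year_cols <- grep("^X[0-9]{4}$", names(wide), value = TRUE)

# Stack the year columns into a single Rate column with a Year column.
long <- reshape(
  wide,
  direction = "long",
  varying = year_cols,
  v.names = "Rate",
  timevar = "Year",
  times = as.integer(sub("^X", "", year_cols)),
  idvar = "UnitID"
)
rownames(long) <- NULL

print(long)
```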
Calling summary() on the data frame will summarize each column (mean, max, min, etc.). Do this so we can be sure you’ve combined all the data (see how all the years are represented):
Now, make a data frame with the average 4-year graduation rate, across all years, for each school:
Next show Stetson’s individual year rates:
Finally, show Stetson’s average rate. It should be 54.27 :(
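The per-school average and the single-school lookups can be sketched with aggregate() and ordinary row subsetting. The data and column names below (Institution, Year, Rate) are made up for illustration. With the formula interface, aggregate() drops rows with missing rates by default, which may or may not be the behavior you want.

```r
# Toy long-format data: one row per school per year.
long <- data.frame(
  Institution = c("College A", "College A", "College B", "College B"),
  Year = c(2010L, 2011L, 2010L, 2011L),
  Rate = c(50, 55, 60, NA),
  stringsAsFactors = FALSE
)

# Average rate across all years for each school (NA rows are omitted).
avg <- aggregate(Rate ~ Institution, data = long, FUN = mean)

# Individual year rates for one school.
one_school <- long[long$Institution == "College A", c("Year", "Rate")]

# That school's average, rounded to two decimals.
one_avg <- round(avg$Rate[avg$Institution == "College A"], 2)

print(avg)
print(one_school)
print(one_avg)
```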
Formulate an interesting question. State this question and why it’s interesting (2-3 sentences altogether). (Note: each person should have a different question and probably use different data.)
Find your own two data sets, from different sources, that may help you answer this question, and merge them with merge(). Then create a summary data frame (using aggregate()). This summary should lead to an answer to your question.
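The merge-then-summarize pattern might look like the following minimal sketch. Both data frames, their columns (state, region, rate), and their values are invented for illustration; your own data sets will need a shared key column of their own.

```r
# Two toy data sets from "different sources" that share a key column.
regions <- data.frame(
  state = c("FL", "GA", "OH"),
  region = c("South", "South", "Midwest"),
  stringsAsFactors = FALSE
)
schools <- data.frame(
  state = c("FL", "FL", "GA", "OH"),
  rate = c(50, 60, 40, 70),
  stringsAsFactors = FALSE
)

# Join the two data sets on the shared key.
merged <- merge(schools, regions, by = "state")

# Summarize: mean rate per region.
summary_df <- aggregate(rate ~ region, data = merged, FUN = mean)

print(merged)
print(summary_df)
```

By default merge() keeps only rows whose key appears in both data frames, so it is worth checking the row count of the merged result before trusting the summary.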
Convince me you have answered your question. If you cannot answer your question, formulate a new question and possibly find new data. Try to determine whether you’ll be able to answer it before you dive too deeply into useless data munging.