CINF 401 - 01, Spring 2017 — Big Data Mining and Analytics

MWF 11:00-11:50am, Elizabeth Hall 205; prerequisite: CSCI 221

Work hard in this class and you should be able to:

About me


This course has no required textbook. All required material can be found on this site or the greater web.


Class demonstrations (2): 5% each, 10% overall
Assignments (5): 5% each, 25% overall
Projects 1-4: 10% each, 40% overall
Project 5: 25%

Class demonstrations

You will be required to demonstrate a data mining, analysis, or programming technique in front of the class (for 5 minutes) on two separate occasions. You must also submit notes in Markdown format (via email or git) to be added to a cookbook on this site. You will not receive credit if your demonstration duplicates one given earlier this semester or one already documented in the various cookbooks.

The purpose of these demonstrations is to ensure you are engaged with the material but also to show others a wide variety of techniques for handling data. We will use a wide range of tools in this class, and will not have time to learn most of them in depth. Thus, it will be useful for everyone to learn from each other the various tricks each of us discovers as we munge, analyze, and visualize our datasets.

Demonstrations will take place most weeks, with roughly 2-5 students presenting each week. We’ll establish a schedule early in the semester.

Grading rubric

Assignments and projects are graded according to the following rubric:

Category: Points
Completeness: 4 pts
Clarity of writing & code: 1 pt

Thus, the maximum score you can receive on an assignment or project is 5/5.


Assignments are individual activities, done outside of class or during work days. You will turn in your materials via git and Bitbucket. See the RStudio workflow and Hadoop workflow notes for details.

All assignments are due by 11:59pm on the stated due date.


A project can involve 1 or 2 people. These projects are more complex than assignments. You will turn in your materials via git and Bitbucket. Only one member of the group needs to submit the code to Bitbucket.

Members of the same group may receive different grades, if I have evidence or a strong belief that not all group members contributed equally.

All projects are due by 11:59pm on the stated due date, except for the final project, which you will present during our “final exam” time (May 6, 5-7pm). There is no final exam on that day, only these presentations.

Late work

Due to the complexity of these assignments and the timing of group work, I will accept late work only up to three days past the due date. Late work is penalized 20% for each day it is late. After three days, no credit will be given.


Assignment and project due dates:

Honor code

You may use a small amount of code from websites, provided the code is open source. You must indicate where you got the code by citing the source in comments in your code. More than 50% of your work, or of your group’s combined work, must be original.

I am strongly in agreement with the Stetson University Honor Code. Any form of cheating is not acceptable, will not be tolerated, and could lead to dismissal from the University.

Academic success center

If a student anticipates barriers related to the format or requirements of a course, she or he should meet with the course instructor to discuss ways to ensure full participation. If disability-related accommodations are necessary, please register with the Academic Success Center (822-7127; www.stetson.edu/asc) and notify the course instructor of your eligibility for reasonable accommodations. The student, course instructor, and the Academic Success Center will plan how best to coordinate accommodations. The Academic Success Center is located at 209 E Bert Fish Drive, and can be contacted using the email address asc@stetson.edu.

CINF 401 material by Joshua Eckroth is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Source code for this website available at GitHub.