# Demo: MapReduce

## Word counts

Run the word count job on the delenn virtual cluster with the following command (input path first, then the output directory):

yarn jar wc.jar /datasets/westburylab-usenet/WestburyLab.NonRedundant.UsenetCorpus.txt \
/users/jeckroth/wordcount/output-westburylab-usenet-1
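
For readers new to the model, here is a minimal in-memory sketch of what a word count job computes. It is plain Java with no Hadoop dependency, and it is not the code inside wc.jar: the class name `WordCountSketch`, the method names, and the choices to split on whitespace and lowercase each token are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: simulates map -> shuffle -> reduce in memory.
// This is NOT the code inside wc.jar; all names here are assumptions.
class WordCountSketch {

    // Map step: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token.toLowerCase(), 1));
            }
        }
        return pairs;
    }

    // Reduce step: sum all the counts collected for one word.
    static int reduce(String word, List<Integer> counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    // Runs the whole pipeline; grouping by key stands in for the shuffle.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {cat=1, dog=2, ran=1, saw=1, the=3}
        System.out.println(run(List.of("the cat saw the dog", "The dog ran")));
    }
}
```

On a real cluster the grouping in `run` is done by the framework between the map and reduce stages, and the map calls run in parallel across the input splits.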


## Baseball Friends

Consider this input file:

Aaden, Red Sox, Alannah, Alayna, Alex, Alondra, Amelia, Amir, Anika, ...
Aaliyah, Cardinals, Adley, Aliyah, Amirah, Ana, Anya, ...
...


Each row has a person’s name (e.g., Aaden), then their favorite baseball team (only Red Sox or Cardinals), then that person’s list of friends. A friend may or may not like the same team; to find out which team a friend likes, we would have to examine that friend’s own row. Friendship is symmetric: if X is a friend of Y, then Y will also be a friend of X.

Given that input, produce an output file like:

Aaden,Red Sox,47,18
Aaliyah,Cardinals,30,43
Aarav,Cardinals,27,55
Aaron,Red Sox,55,24
Abbie,Cardinals,32,38
Abbigail,Red Sox,48,27
Abby,Red Sox,48,27
Abdiel,Cardinals,33,47
Abdullah,Red Sox,52,20
Abel,Red Sox,51,23
Abigail,Cardinals,33,49
Abraham,Red Sox,48,22
Abram,Red Sox,54,33
Abrielle,Red Sox,72,30
...


Each row in the output lists a person’s name and team, followed by two counts: how many like the Red Sox (first number) and how many like the Cardinals (second number). Each count covers the person’s friends, and the person is also included in the count for their own team. Since Aaden likes the Red Sox, the 47 includes Aaden himself along with 46 of his friends who also like the Red Sox. The second number shows that Aaden has 18 friends who like the Cardinals (this time not including himself, since he likes the Red Sox and not the Cardinals).

### Strategy

Each call to the mapper receives one line of the input file. From that line, we need to report the person’s team preference as well as the person’s friends. We cannot learn the friends’ team preferences from a single line, so we have to be clever: instead of asking which team each friend likes, the mapper tells each friend which team this person likes. The mapper outputs pairs of the form key = “person name”, value = “a friend of that person, plus the team that friend likes”. So, for the input line:

Aaden, Red Sox, Alannah, Alayna, Alex, Alondra, Amelia, Amir, Anika, ...


…the mapper will output the following for that single line of input:

key = Aaden, value = Aaden, Red Sox   (account for the person's own team preference)
key = Alannah, value = Aaden, Red Sox (account for Alannah being Aaden's friend and Aaden liking the Red Sox)
key = Alayna, value = Aaden, Red Sox  (etc. same for rest)
...
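
The map step described above could be sketched in plain Java as follows. This is a sketch under stated assumptions, not the contents of BaseballFriends.java: it uses no Hadoop types, the class name `BaseballMapSketch` is hypothetical, and it assumes every line is well formed (a name, a team, then zero or more friends, separated by commas).

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the map step in plain Java (no Hadoop types).
// Class and method names are assumptions, not taken from BaseballFriends.java.
class BaseballMapSketch {

    // Returns {key, value} pairs for one input line of the form
    // "name, team, friend1, friend2, ...". Assumes a well-formed line.
    static List<String[]> map(String line) {
        String[] fields = line.split(",");
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].trim();
        }
        String person = fields[0];
        String team = fields[1];
        String value = person + ", " + team;

        List<String[]> pairs = new ArrayList<>();
        // Account for the person's own team preference.
        pairs.add(new String[] {person, value});
        // Tell each friend that this person is their friend and likes `team`.
        for (int i = 2; i < fields.length; i++) {
            pairs.add(new String[] {fields[i], value});
        }
        return pairs;
    }

    public static void main(String[] args) {
        // A shortened version of the example line, keeping only three friends.
        for (String[] pair : map("Aaden, Red Sox, Alannah, Alayna, Alex")) {
            System.out.println("key = " + pair[0] + ", value = " + pair[1]);
        }
    }
}
```

Every pair emitted for a line carries the same value; only the key changes. This works because friendship is symmetric: each friend who receives this value also lists the person in their own row.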


Once the framework groups the mapper output by key, the reducer receives a key together with the list of all values emitted for it, like:

key = Aaden, value = [Aaden, Red Sox; Aaliyah, Cardinals; ...]


…and simply counts how many friends of Aaden (including Aaden himself) like each team.
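
A matching sketch of the reduce step is shown below, again in plain Java without Hadoop types. The class name `BaseballReduceSketch` is hypothetical, and the sketch assumes that names are unique and that each value is a “name, team” string. The person’s own team is recovered from the value whose name equals the key. The friends’ teams in `main` are invented for the example.

```java
import java.util.List;

// Illustrative sketch of the reduce step in plain Java (no Hadoop types).
// Names are assumptions; each value is a "friendName, team" string.
class BaseballReduceSketch {

    // Returns one output row: name,team,redSoxCount,cardinalsCount.
    static String reduce(String person, List<String> values) {
        String ownTeam = "";
        int redSox = 0;
        int cardinals = 0;
        for (String value : values) {
            String[] parts = value.split(",", 2);
            String name = parts[0].trim();
            String team = parts[1].trim();
            // The value whose name matches the key carries the person's own team.
            if (name.equals(person)) {
                ownTeam = team;
            }
            if (team.equals("Red Sox")) {
                redSox++;
            } else if (team.equals("Cardinals")) {
                cardinals++;
            }
        }
        return person + "," + ownTeam + "," + redSox + "," + cardinals;
    }

    public static void main(String[] args) {
        // Made-up teams for two friends, for illustration only.
        List<String> values =
            List.of("Aaden, Red Sox", "Alannah, Cardinals", "Alex, Red Sox");
        // Prints Aaden,Red Sox,2,1
        System.out.println(reduce("Aaden", values));
    }
}
```

Because the person’s own value is counted like any other, the count for their own team includes the person, which matches the output format described earlier.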

### Implementation

You’ll need the helper classes in CSVOutputFormat.java and TextArrayWritable.java. The job itself is in BaseballFriends.java, and the input file is baseball_friends.csv.

## Reverse index

gutenberg-small.txt