Cookbook: Unix tools

A virtual machine with all these tools

Contributed by Antonio Ruiz. Download this VirtualBox image to have access to a wide range of command line tools for data science.

Regular expressions

Regular expressions are not unique to Unix, but many Unix tools, such as grep, use them.
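
As a minimal sketch of a regular expression used with grep, the following searches a small sample file. The file name fruit.txt and its contents are made up for illustration.

```shell
# Create a small sample file in a temporary directory (hypothetical data).
tmpdir=$(mktemp -d)
printf 'apple 12\nbanana\ncherry 7\n' > "$tmpdir/fruit.txt"

# -E enables extended regular expressions.
# [0-9]+$ matches lines that end in one or more digits.
matches=$(grep -E '[0-9]+$' "$tmpdir/fruit.txt")
echo "$matches"

# -c prints the number of matching lines instead of the lines themselves.
# ^b matches lines that start with the letter b.
count=$(grep -c -E '^b' "$tmpdir/fruit.txt")
echo "$count"
```

The pattern is in single quotes so that the shell passes it to grep unchanged.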

Menagerie of little tools

cat, head, tail

Concatenate (combine) two or more files:

cat a.txt b.txt

You probably want to save the output in another file:

cat a.txt b.txt > ab.txt

View just the top 10 lines of a file (first command), or top 5 lines (second command):

head a.txt
head -n 5 a.txt

View just the last 10 lines of a file, or variations:

tail a.txt         # last 10 lines
tail -n 5 a.txt    # last 5 lines
tail -n +5 a.txt   # everything from line 5 onward

tail is also useful to watch a file as it changes, as you would a log file:

tail -f a.txt
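
head and tail can also be chained with a pipe to pull out a range of lines. The sketch below assumes a sample file generated with seq; the file and variable names are only for illustration.

```shell
# Sample file: the numbers 1 to 10, one per line (hypothetical data).
tmpdir=$(mktemp -d)
seq 1 10 > "$tmpdir/a.txt"

# Lines 3 through 5: take the first 5 lines, then the last 3 of those.
middle=$(head -n 5 "$tmpdir/a.txt" | tail -n 3)
echo "$middle"

# Everything except the first line, e.g. to skip a header row.
# With a leading +, tail starts output at that line number.
body=$(tail -n +2 "$tmpdir/a.txt")
echo "$body"
```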

grep

Bash

Loops

for VAR in SOME STUFF SEPARATED BY SPACES OR NEWLINES OR TABS; do \
  EXECUTE SOMETHING WITH $VAR; \
  PERHAPS SOMETHING ELSE; \
done

E.g.,

for x in my list of stuff separated by spaces; do echo $x; done

The backtick ` is often useful to execute a command and use its (space- or newline-separated) output to feed into the for loop:

for x in `ls foo*`; do mkdir dir_$x; mv $x dir_$x; done
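
When the list is a set of file names, one alternative is to let the shell expand the pattern itself and to quote the variable, which also copes with names that contain spaces. This is a minimal sketch run in a throwaway directory; the file names are made up.

```shell
# Work in a temporary directory with two hypothetical files,
# one of which has a space in its name.
tmpdir=$(mktemp -d)
cd "$tmpdir"
touch foo1 "foo 2"

# The shell expands foo* before the loop starts; quoting "$x"
# keeps each file name intact as a single word.
for x in foo*; do
  mkdir "dir_$x"
  mv "$x" "dir_$x"
done

ls
```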

csvkit

Convert Excel to CSV:

in2csv foo.xlsx > foo.csv

Display column names:

csvcut -n foo.csv

wget

wget is for downloading web pages or FTP directories.

wget http://example.com  # download a web page

Use --no-check-certificate if wget complains about an SSL/TLS certificate. This skips certificate verification, so use it only with hosts you trust:

wget --no-check-certificate https://example.com

cron

The system cron tool (which is always running) allows you to specify commands to run at certain times in the day/week/month. Contributed by Nathan Hilliard.

Start by running crontab -e, then add one entry per line, each giving a schedule followed by the command to run. See this quick reference for details about that file.
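
As an illustration of what such entries might look like, here is a small crontab fragment. The script paths are hypothetical placeholders; the five leading fields are minute, hour, day of month, month, and day of week, and * means "every".

```
# minute hour day-of-month month day-of-week command

# Run a (hypothetical) backup script every day at 02:30.
30 2 * * * /home/user/backup.sh

# Run a (hypothetical) report script every Monday at 09:00.
0 9 * * 1 /home/user/weekly_report.sh
```

Running crontab -l afterwards lists the entries currently installed for your user.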

CINF 401 material by Joshua Eckroth is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Source code for this website available at GitHub.