Cookbook: Regular expressions

External resources

  • RegViz – visualize your regular expressions to see which text matches and which does not. It also shows the results of capture groups () in variables like $1, $2, etc.

  • Java Regex tester – also see the cookbook link at the top of its page

  • Great Regex cheatsheet

Regex in Java

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Foo {

    // this pattern matches the Name field in text like:
    // blah blah blah Name="Jane Doe" blah blah blah
    // (matches() must match the entire string, hence the .* at both ends)
    private static final Pattern namePattern = Pattern.compile(".*Name=\"(.*?)\".*");

    void myfunc(String str) {
        Matcher m = namePattern.matcher(str);
        if (m.matches()) {
            String name = m.group(1);  // get the submatch captured by the parentheses
            // ...
        }
    }
}
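Because matches() must match the entire input, the pattern above needs .* at both ends, and since . does not match line terminators by default, it fails on any input that contains a line break. A minimal sketch of an alternative, assuming you only need the first Name field: Matcher.find() searches for the pattern anywhere in the input, so the surrounding .* can be dropped. The class NameExtractor and method findName are hypothetical names chosen for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration
class NameExtractor {
    // No leading/trailing .* needed: find() looks for the pattern anywhere in the input
    private static final Pattern NAME = Pattern.compile("Name=\"(.*?)\"");

    // Returns the first Name value, or Optional.empty() if there is none
    static Optional<String> findName(String text) {
        Matcher m = NAME.matcher(text);
        if (m.find()) {
            return Optional.of(m.group(1)); // text captured by the first (...) group
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // prints "Optional[Jane Doe]"
        System.out.println(findName("blah blah blah Name=\"Jane Doe\" blah blah blah"));
    }
}
```

Unlike the matches() version, this also works when the text spans several lines, because find() never has to consume the line breaks.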

Find matching portions of lines

Using Perl, print only those portions of lines that match a certain regex; for example, find links that start with /sites/ and end with .xls:

# script name: get-links.pl
use strict;
use warnings;

while (<>) {                                  # for each line of input
    my $line = $_;                            # use a better variable name
    while ($line =~ m!(/sites/.*?\.xls)!g) {  # for each URL pattern on this line (g means 'all')
        print "$1\n";                         # print the part that matched in the regex parens
    }
}

Run it like this:

perl get-links.pl < my-file.txt
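For comparison, here is a sketch of the same script in Java, assuming Java 11 or later so it can be run directly from source with java GetLinks.java < my-file.txt. Repeated calls to Matcher.find() play the role of Perl's while (... =~ m!...!g) loop: each call resumes searching after the previous match. The names GetLinks and findLinks are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java port of get-links.pl
class GetLinks {
    // Same regex as the Perl script; .*? is non-greedy, so each match stops at the first ".xls"
    private static final Pattern LINK = Pattern.compile("/sites/.*?\\.xls");

    // Return every match on one line, in order
    static List<String> findLinks(String line) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(line);
        while (m.find()) {          // each call resumes after the previous match
            links.add(m.group());   // group() with no argument is the whole match
        }
        return links;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {   // for each line of input
            for (String link : findLinks(line)) {
                System.out.println(link);
            }
        }
    }
}
```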

Here is the same thing using just grep (no Perl):

grep -oP '/sites/.*?\.xls' my-file.txt

The grep -o option prints only the part of each line that matched the regex (each match on its own line), rather than the whole matching line, which is the default. The -P option enables Perl-style regular expressions, which are needed for the non-greedy match-anything syntax .*?.
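The difference between greedy .* and non-greedy .*? is easiest to see on a line that contains two links. The following small Java sketch (java.util.regex supports the non-greedy form natively) collects every match on one line, roughly what grep -o does, using the hypothetical names GreedyDemo and allMatches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Collect every non-overlapping match of regex in text
    static List<String> allMatches(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "/sites/a.xls and /sites/b.xls";
        // Greedy: .* runs as far as possible, so one match spans both links
        // prints "[/sites/a.xls and /sites/b.xls]"
        System.out.println(allMatches("/sites/.*\\.xls", line));
        // Non-greedy: .*? stops at the first ".xls", giving two separate matches
        // prints "[/sites/a.xls, /sites/b.xls]"
        System.out.println(allMatches("/sites/.*?\\.xls", line));
    }
}
```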

CINF 401 material by Joshua Eckroth is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Source code for this website is available on GitHub.