Regular Expressions

From DevSummit

In his spare time, Seth uses regular expressions to solve crossword puzzles and perform hostage rescue.

Regular expressions are used for pattern recognition in text: a pattern describes a set of strings, and a regex engine finds the text that fits it.

Usage

applications:

TextMate, Notepad++, SquirrelMail, vi, Emacs

programming languages:

PHP, JS, Perl, Python

command line tools:

grep, egrep, sed, awk, bash

there are different versions ("flavors") of regular expressions in each of these languages and tools, so the same pattern can behave slightly differently from one to the next

toolkits and apis

Beautiful Soup (for scraping stuff out of web pages) – useful for getting temperature or data


+ AND MORE!

coming soon to a computer near you

Ways of using regular expressions

  • MATCH
  • SEARCH
  • REPLACE/SUBSTITUTE
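
As a rough illustration of the three modes, here is a minimal sketch using Python's built-in re module (the sample sentence is made up for this example). In Python, match only looks at the start of the string, while search looks anywhere.

```python
import re

text = "the cat sat on the mat"

# MATCH: the pattern has to fit starting at the very beginning of the string.
match_at_start = re.match(r"cat", text)   # None, because the text starts with "the"

# SEARCH: the pattern may fit anywhere in the string.
found = re.search(r"cat", text)

# REPLACE/SUBSTITUTE: every match is swapped for the replacement text.
replaced = re.sub(r"cat", "dog", text)

print(match_at_start)                 # None
print(found.group(), found.start())   # cat 4
print(replaced)                       # the dog sat on the mat
```

Other languages and tools name these operations differently, so treat the function names here as Python-specific.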

.

matches any single character, anything at all (in most flavors, anything except a newline)

   cat, mat, bat, hat

all 3 letter words that end with "at"

    .at

will match non-words too, like 4at.
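
A quick way to check this, sketched in Python (the word list is invented for the example). fullmatch is used so that the whole string has to fit the pattern, not just part of it.

```python
import re

words = ["cat", "mat", "bat", "hat", "4at", "at", "chat"]

# ".at" needs exactly one character (any character) followed by "at".
three_chars = [w for w in words if re.fullmatch(r".at", w)]

print(three_chars)  # ['cat', 'mat', 'bat', 'hat', '4at']
```

"at" is rejected because the dot must consume one character, and "chat" is rejected because it has one character too many.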

?

means the preceding character is optional (it may appear zero or one times)

   colou?r

will match both color and colour but not colouur.

.?.at would match

   chat, spat, that
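
The same two patterns can be tried out in Python; this is only a sketch, and the candidate words are made up for the example.

```python
import re

spellings = ["color", "colour", "colouur"]
accepted = [s for s in spellings if re.fullmatch(r"colou?r", s)]

candidates = ["cat", "chat", "spat", "that", "splat"]
# ".?.at" is one optional character, one required character, then "at".
short_words = [w for w in candidates if re.fullmatch(r".?.at", w)]

print(accepted)     # ['color', 'colour']
print(short_words)  # ['cat', 'chat', 'spat', 'that']
```

Note that three-letter words such as "cat" still fit, because the first dot is optional.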

+

the preceding item, one or more times

*

any number of times, including 0

   a*

would match zero or more a's (including the empty string)

   a.*n

would match "aspiration" but also "an"
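
To see the difference between + and *, here is a small Python sketch (sample strings are invented). The only difference is whether zero repetitions are allowed.

```python
import re

samples = ["", "a", "aaa", "b"]

star = [s for s in samples if re.fullmatch(r"a*", s)]  # zero or more a's
plus = [s for s in samples if re.fullmatch(r"a+", s)]  # one or more a's

# "a.*n": an "a", then any number of any characters, then an "n".
a_to_n = [w for w in ["aspiration", "an", "a", "ant"] if re.fullmatch(r"a.*n", w)]

print(star)    # ['', 'a', 'aaa']
print(plus)    # ['a', 'aaa']
print(a_to_n)  # ['aspiration', 'an']
```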

()

parentheses group operations together, like in arithmetic

   (1+3) x 4 = 16
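
In a pattern, grouping decides what a following operator such as + applies to. A minimal Python sketch, with sample strings chosen just for illustration:

```python
import re

# Without parentheses, + applies only to the "n" right before it.
ungrouped_anan = re.fullmatch(r"an+", "anan")   # None
ungrouped_annn = re.fullmatch(r"an+", "annn")   # matches

# With parentheses, + applies to the whole group "an".
grouped_anan = re.fullmatch(r"(an)+", "anan")   # matches

print(ungrouped_anan is None)        # True
print(ungrouped_annn is not None)    # True
print(grouped_anan is not None)      # True
```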

[]

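any single character from the set listed inside the brackets

  • [abc] matches a, b, or c
  • [0123456789]
  • [0-9] matches any single digit, 0 through 9 (the same as the line above)
  • [a-z] matches any lowercase letter a THROUGH z
  • [aeiou0-9] matches any lowercase vowel or digit
  • [A-Za-z] matches any letter, upper or lower case

   [Aa]sian

will match the word asian with or without a capital A (Asian or asian), but not other capitalizations such as ASIAN.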
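
A short Python sketch of character classes in action (the sample strings are made up). Each bracket expression consumes exactly one character, so findall returns single characters.

```python
import re

digits = re.findall(r"[0-9]", "Room 42, floor 7")
vowels_or_digits = re.findall(r"[aeiou0-9]", "Route 66")
letters = re.findall(r"[A-Za-z]", "a1-B2")

# [Aa]sian accepts exactly two spellings: "Asian" and "asian".
asian = [w for w in ["Asian", "asian", "ASIAN"] if re.fullmatch(r"[Aa]sian", w)]

print(digits)            # ['4', '2', '7']
print(vowels_or_digits)  # ['o', 'u', 'e', '6', '6']
print(letters)           # ['a', 'B']
print(asian)             # ['Asian', 'asian']
```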

\

the escape character cancels the special meaning of the character that follows it, so we can find a literal period, for example

   \.

matches .

   \\

matches \

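these are called escape sequences. (back references are something different: \1, \2, etc. refer back to the text captured by a group in parentheses)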
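
Here is a small Python sketch of escaping (sample strings invented for the example). Raw strings (the r prefix) keep Python itself from interpreting the backslashes, and re.escape is a standard-library helper that escapes special characters for you.

```python
import re

text = "3.14 and 3x14"

unescaped = re.findall(r"3.14", text)    # the dot matches any character
escaped = re.findall(r"3\.14", text)     # the dot matches only a literal period

# "\\" in the pattern matches one literal backslash.
backslashes = re.findall(r"\\", r"C:\temp\file")

# re.escape adds the backslashes needed to match a string literally.
escaped_pattern = re.escape("a.b")

print(unescaped)         # ['3.14', '3x14']
print(escaped)           # ['3.14']
print(len(backslashes))  # 2
print(escaped_pattern)   # a\.b
```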

^

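negation, when it is the first character inside square brackets: anything except the characters that follow. (outside of brackets, ^ instead anchors the match to the start of the line)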

   [^aeiou]

matches any character besides a, e, i, o, u
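
A quick Python check of the negated class (the sample string is made up). Notice that "anything except a vowel" includes spaces and digits, not only consonants.

```python
import re

non_vowels = re.findall(r"[^aeiou]", "banana 42")

print(non_vowels)  # ['b', 'n', 'n', ' ', '4', '2']
```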

{}

a certain number of times

to find 4 vowels in a row:

   [aeiou][aeiou][aeiou][aeiou]

can more succinctly be expressed as

   [aeiou]{4}

,

add a comma (,) to create a range

   [^aeiou]{4,6}

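matches a run of between 4 and 6 characters in a row that are not lowercase vowels. in ordinary words these are consonants, but digits, spaces, and punctuation count too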
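
The counting syntax can be tried out with a short Python sketch (the sample words are chosen for the example). Repetition is greedy, so {4,6} grabs as many characters as it can, up to 6.

```python
import re

# Exactly four vowels in a row: search returns the first such run.
four_vowels = re.search(r"[aeiou]{4}", "queueing")

# Between four and six non-vowels in a row.
run_in_word = re.findall(r"[^aeiou]{4,6}", "catchphrase")
run_capped = re.findall(r"[^aeiou]{4,6}", "rhythms")   # 7 non-vowels, capped at 6
run_digits = re.findall(r"[^aeiou]{4,6}", "2024")      # digits are non-vowels too

print(four_vowels.group())  # ueue
print(run_in_word)          # ['tchphr']
print(run_capped)           # ['rhythm']
print(run_digits)           # ['2024']
```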

|

the pipe (|) character is the logical OR operator

   cat|dog

matches either cat or dog

can also be understood as an "OR gate"


it's also great for combining expressions!
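
A minimal Python sketch of the pipe, both on its own and combined with a group (the sample text and the gray/grey example are made up for illustration):

```python
import re

# Either alternative may match, anywhere in the text.
pets = re.findall(r"cat|dog", "cat, dog, bird, hotdog")

# Inside parentheses, the choice is limited to that one spot in the pattern.
gray = [w for w in ["gray", "grey", "groy"] if re.fullmatch(r"gr(a|e)y", w)]

print(pets)  # ['cat', 'dog', 'dog']
print(gray)  # ['gray', 'grey']
```

Without the parentheses, gra|ey would mean "gra" or "ey", because the pipe splits the whole pattern.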

More Examples/Use cases

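Let's identify misspellings of the word banana that repeat "an" too many times, e.g. bananana

   b(an)*a

matches ba, bana, banana, bananana, and so on. or, to require at least two repetitions of "an", we can also use

   ban(an)+a

neither of these matches bananna, because the doubled n breaks the "an" repetition; b(an+)+a allows for that.

   [0-9]{4}

matches a 4 digit number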
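
It is worth testing patterns like these against a list of candidates to see exactly what they accept. A Python sketch, with a candidate list invented for the example:

```python
import re

candidates = ["ba", "bana", "banana", "bananana", "bananna"]

# (an)* allows any number of "an" repeats, including none.
with_star = [w for w in candidates if re.fullmatch(r"b(an)*a", w)]

# ban(an)+a needs "an" at least twice in total.
with_plus = [w for w in candidates if re.fullmatch(r"ban(an)+a", w)]

# Exactly four digits, no more and no fewer.
four_digits = [s for s in ["2024", "202", "20245"] if re.fullmatch(r"[0-9]{4}", s)]

print(with_star)    # ['ba', 'bana', 'banana', 'bananana']
print(with_plus)    # ['banana', 'bananana']
print(four_digits)  # ['2024']
```

Neither banana pattern accepts "bananna", since the doubled n does not fit a whole number of "an" repeats.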

to match a credit card number:

   ([0-9]{4}){4}

matches 4 groups of 4 digits. but what if we want spaces or hyphens between the groups?

   ([0-9]{4}[- ]?){3}[0-9]{4}

though this allows us to mix both spaces and hyphens.
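
A Python sketch that checks the credit card patterns, taking the separator version to be ([0-9]{4}[- ]?){3}[0-9]{4} (three groups of four digits, each optionally followed by a hyphen or space, then a final group of four). The third pattern is one possible way to forbid mixing, using the pipe to combine a hyphen-only and a space-only expression; it is an illustration, not something from the session. The card numbers are dummy values.

```python
import re

plain = re.compile(r"([0-9]{4}){4}")
separated = re.compile(r"([0-9]{4}[- ]?){3}[0-9]{4}")
# Hypothetical stricter variant: all hyphens OR all spaces.
consistent = re.compile(r"([0-9]{4}-){3}[0-9]{4}|([0-9]{4} ){3}[0-9]{4}")

samples = [
    "1234567812345678",
    "1234 5678 1234 5678",
    "1234-5678-1234-5678",
    "1234-5678 1234 5678",   # mixed separators
]

plain_ok = [s for s in samples if plain.fullmatch(s)]
separated_ok = [s for s in samples if separated.fullmatch(s)]
consistent_ok = [s for s in samples if consistent.fullmatch(s)]

print(plain_ok)       # only the 16 digits with no separators
print(separated_ok)   # all four samples, including the mixed one
print(consistent_ok)  # only the all-space and all-hyphen samples
```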

Learning tools

http://rubular.com