Regular Expressions

From DevSummit
Revision as of 17:57, 5 May 2015 by Vivian (talk | contribs) (1 revision imported)

In his spare time, Seth uses regular expressions to solve crossword puzzles and perform hostage rescue.

Regular expressions are used to recognize patterns in text.

Usage

applications:

TextMate, Notepad++, SquirrelMail, vi, Emacs

programming languages:

PHP, JS, Perl, Python

command line tools:

grep, egrep, sed, awk, bash

each of these tools and languages has its own flavor of regular expressions, so the syntax can differ slightly from one to the next

toolkits and apis

Beautiful Soup (for scraping stuff out of web pages), useful for pulling out things like temperature readings or other data


+ AND MORE!

coming soon to a computer near you

Ways of using regular expressions

  • MATCH
  • SEARCH
  • REPLACE/SUBSTITUTE
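
A rough sketch of the three modes, assuming Python's built-in re module (the notes don't tie these modes to one language, and the sample sentence is made up):

```python
import re

text = "the cat sat on the mat"

# MATCH: succeeds only if the pattern fits at the very start of the string
match_at_start = re.match(r"cat", text)  # None here, the text starts with "the"

# SEARCH: finds the first occurrence anywhere in the string
found = re.search(r"cat", text)

# REPLACE/SUBSTITUTE: swap every occurrence for something else
replaced = re.sub(r"cat", "dog", text)

print(match_at_start, found.group(), replaced)
```

Other tools spell these differently (sed uses s/cat/dog/, for example), but the idea is the same.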

.

means any single character, anything at all (in most flavors, anything except a newline)

   cat, mat, bat, hat

all 3 letter words that end with "at"

    .at

will match non-words too, like 4at.
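
To try the dot out, here is a small sketch assuming Python's re module. The word list is made up, and re.fullmatch is used so the whole string has to fit the pattern:

```python
import re

words = ["cat", "mat", "4at", "at", "chat"]

# ".at" needs exactly one character (any character) followed by "at"
three_chars = [w for w in words if re.fullmatch(r".at", w)]

print(three_chars)
```

"at" is rejected because the dot must consume a character, and "chat" is rejected because it is one character too long.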

?

means optional

   colou?r

will match both color and colour but not colouur.

.?.at would match

   chat, spat, that
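
A quick check of the optional marker, sketched with Python's re module (the test words are invented for illustration):

```python
import re

spellings = ["color", "colour", "colouur", "colr"]

# "u?" allows zero or one "u", never two
accepted = [s for s in spellings if re.fullmatch(r"colou?r", s)]

# ".?.at" is three or four characters ending in "at"
candidates = ["chat", "cat", "at", "splat"]
at_words = [w for w in candidates if re.fullmatch(r".?.at", w)]

print(accepted, at_words)
```

Note that .?.at also accepts three-letter words like cat, since the first character is optional.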

+

one or more times

*

any number of times, including 0

   a*

would match zero or more a's in a row (including nothing at all)

   a.*n

would match "aspiration" but also "an"
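
The difference between + and * is easiest to see side by side. A minimal sketch, assuming Python's re module and made-up sample strings:

```python
import re

samples = ["", "a", "aaa", "b"]

# "*" accepts zero or more repetitions, so the empty string fits
star = [s for s in samples if re.fullmatch(r"a*", s)]

# "+" needs at least one repetition
plus = [s for s in samples if re.fullmatch(r"a+", s)]

# "a.*n": an "a", then anything (or nothing), then an "n"
words = ["an", "aspiration", "a", "nap"]
a_to_n = [w for w in words if re.fullmatch(r"a.*n", w)]

print(star, plus, a_to_n)
```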

()

parentheses group operations together, like in arithmetic

   (1+3) x 4 = 16
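
In a regular expression the grouping decides what a following +, * or ? applies to. A small sketch, assuming Python's re module (the (ab)+ and ab+ patterns are examples chosen here, not from the session):

```python
import re

samples = ["ab", "abab", "abbb"]

# with parentheses, "+" repeats the whole group "ab"
grouped = [s for s in samples if re.fullmatch(r"(ab)+", s)]

# without them, "+" only repeats the "b"
ungrouped = [s for s in samples if re.fullmatch(r"ab+", s)]

print(grouped, ungrouped)
```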

[]

any single character from the set inside the brackets

  • [abc] matches a, b, or c
  • [0123456789] matches any single digit
  • [0-9] is shorthand for the same thing: any single digit, 0 through 9
  • [a-z] matches any lowercase letter, a THROUGH z
  • [aeiou0-9] matches any vowel or digit
  • [A-Za-z] matches any letter, upper or lower case

   [Aa]sian

will match both Asian and asian.
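
A few of these sets in action, sketched with Python's re module. The sample sentence is invented, and the last pattern is written here as [Aa]sian (one bracketed letter followed by "sian"):

```python
import re

text = "Room 42, Asian Studies, asian cuisine, ASIAN"

# each match of a bracket set is exactly one character
digits = re.findall(r"[0-9]", text)

# add "+" to grab whole runs of letters
words = re.findall(r"[A-Za-z]+", "Room 42")

# only the first letter may vary in case
asian = re.findall(r"[Aa]sian", text)

print(digits, words, asian)
```

ASIAN in all caps is not matched, because only the first letter sits inside brackets.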

\

escape character cancels the special meaning, so we can find a period, for example

   \.

matches .

   \\

matches \

these are called escape sequences (not to be confused with back references such as \1, which refer back to an earlier group)
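
To see why escaping matters, here is a sketch assuming Python's re module. Raw strings (the r"..." prefix) keep Python from interpreting the backslashes itself; the sample text is made up:

```python
import re

text = r"file.txt filextxt C:\temp"

# unescaped, the dot matches any character, so "filextxt" sneaks in
unescaped = re.findall(r"file.txt", text)

# escaped, it only matches a literal period
escaped = re.findall(r"file\.txt", text)

# two backslashes in the pattern match one literal backslash
backslashes = re.findall(r"\\", text)

print(unescaped, escaped, backslashes)
```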

^

negation, when it is the first character inside square brackets. anything except the characters that follow:

   [^aeiou]

matches any character besides a, e, i, o, u
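
A short sketch of a negated set, assuming Python's re module and a made-up input string:

```python
import re

# every single character that is not a lowercase vowel
non_vowels = re.findall(r"[^aeiou]", "banana 4")

print(non_vowels)
```

The space and the digit show up in the result too: "not a vowel" is broader than "consonant".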

{}

a certain number of times

to find 4 vowels in a row:

   [aeiou][aeiou][aeiou][aeiou]

can more succinctly be expressed as

   [aeiou]{4}

,

add a comma (,) to create a range

   [^aeiou]{4,6}

matches between 4 and 6 non-vowel characters in a row, such as a run of consonants inside a word (note that [^aeiou] also matches digits, spaces, and punctuation)
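
The counted forms can be tried like this. A sketch assuming Python's re module; the word list is invented, and re.search is used because the run may sit anywhere inside the word:

```python
import re

words = ["queue", "strengths", "rhythms", "cat"]

# exactly four vowels in a row somewhere in the word
four_vowels = [w for w in words if re.search(r"[aeiou]{4}", w)]

# a run of 4 to 6 non-vowel characters somewhere in the word
consonant_run = [w for w in words if re.search(r"[^aeiou]{4,6}", w)]

# the negated set is not limited to letters
not_only_letters = re.search(r"[^aeiou]{4,6}", "a 1234 e").group()

print(four_vowels, consonant_run, repr(not_only_letters))
```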

|

the pipe (|) character is the logical operator OR

   cat|dog

matches either cat or dog

can also be understood as an "OR gate"


it's also great for combining expressions!
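
A minimal sketch of OR, assuming Python's re module. The cat|dog and gr(a|e)y patterns and the sample text are illustrations chosen here:

```python
import re

# either alternative may match at each position
pets = re.findall(r"cat|dog", "cat, dog, bird, catalog")

# inside parentheses, the OR applies to just that part of the pattern
spellings = ["gray", "grey", "griy"]
accepted = [s for s in spellings if re.fullmatch(r"gr(a|e)y", s)]

print(pets, accepted)
```

The third "cat" comes from inside "catalog", since nothing in the pattern asks for a whole word.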

More Examples/Use cases

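Let's identify misspellings of the word banana, e.g. bananna, bananana

   b(an)*a

matches banana and bananana, but also ba and bana. or, we can also use

   ban(an)+a

which requires at least banana. neither pattern matches bananna, because of the doubled n.

   [0-9]{4}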

matches a 4 digit number
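
The banana patterns and the four-digit pattern can be checked like this, assuming Python's re module. The third pattern, b(an+)+a, is not from the session; it is a hypothetical variant added here to show one way of allowing a doubled n:

```python
import re

spellings = ["banana", "bananana", "bananna", "bana", "ba"]

# zero or more "an" between "b" and the final "a"
star = [s for s in spellings if re.fullmatch(r"b(an)*a", s)]

# "ban", then at least one more "an", then "a"
plus = [s for s in spellings if re.fullmatch(r"ban(an)+a", s)]

# hypothetical variant: each "an" may carry extra n's
double_n = [s for s in spellings if re.fullmatch(r"b(an+)+a", s)]

# four digits in a row; "42" is too short
four_digits = re.findall(r"[0-9]{4}", "founded 2015, room 42")

print(star, plus, double_n, four_digits)
```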

to match a credit card number:

   ([0-9]{4}){4} 

matches 4 groups of 4 digits, but what if we want spaces or hyphens between the groups?

   ([0-9]{4}[- ]?){3}[0-9]{4}

though this allows us to mix both spaces and hyphens.
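
A sketch of the card-number idea, assuming Python's re module and reading the pattern as three groups of "four digits plus an optional space or hyphen" followed by four more digits. The numbers are fake. The stricter version is an addition made here, not something from the session: it captures the first separator and uses the back reference \1 to insist the same one is reused.

```python
import re

# four groups of four digits, each optionally followed by "-" or " "
loose = re.compile(r"([0-9]{4}[- ]?){3}[0-9]{4}")

# hypothetical stricter variant: \1 must repeat whatever separator came first
strict = re.compile(r"[0-9]{4}([- ]?)[0-9]{4}\1[0-9]{4}\1[0-9]{4}")

samples = [
    "1234567812345678",
    "1234 5678 1234 5678",
    "1234-5678-1234-5678",
    "1234-5678 1234 5678",  # mixed separators
    "1234 5678 1234",       # too short
]

loose_ok = [s for s in samples if loose.fullmatch(s)]
strict_ok = [s for s in samples if strict.fullmatch(s)]

print(loose_ok)
print(strict_ok)
```

Neither pattern says anything about whether the number is a real card; they only check the shape.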

Learning tools

http://rubular.com