Regular Expressions
In his spare time, Seth uses regular expressions to solve crossword puzzles and perform hostage rescue.
Regular expressions are used for pattern matching in text
Usage
applications:
TextMate, Notepad++, SquirrelMail, vi, Emacs
programming languages:
PHP, JS, Perl, Python
command line tools:
grep, egrep, sed, awk, bash
each of these tools and languages has its own slightly different version (flavor) of regular expressions
toolkits and apis
beautiful soup (for scraping stuff out of web pages) – useful for getting temperature or data
+ AND MORE!
coming soon to a computer near you
Ways of using regular expressions
- MATCH
- SEARCH
- REPLACE/SUBSTITUTE
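A minimal sketch of the three ways above, assuming Python's built-in re module (the notes don't pick a language, and the sample sentence is made up for illustration). Other tools spell these operations differently.

```python
import re

text = "cat sat on the mat"

# MATCH: does the pattern match right at the start of the text?
match_at_start = re.match(r".at", text)   # matches "cat"
no_match = re.match(r"mat", text)         # None, because "mat" is not at the start

# SEARCH: find the first place the pattern matches, anywhere in the text
first_hit = re.search(r"mat", text)

# REPLACE/SUBSTITUTE: swap every match for something else
replaced = re.sub(r".at", "dog", text)

print(match_at_start.group())  # cat
print(no_match)                # None
print(first_hit.start())       # 15
print(replaced)                # dog dog on the dog
```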
.
means any single character, anything at all (though usually not a newline)
cat, mat, bat, hat
all 3 letter words that end with "at"
.at
will match non-words too, like 4at.
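To try the .at pattern out, here is a small sketch assuming Python's re module. The candidate list is made up, and fullmatch is used so the whole word has to fit the pattern.

```python
import re

candidates = ["cat", "mat", "bat", "hat", "4at", "at", "chat"]
pattern = re.compile(r".at")

# fullmatch: the entire string must be exactly one character followed by "at"
hits = [w for w in candidates if pattern.fullmatch(w)]

print(hits)  # ['cat', 'mat', 'bat', 'hat', '4at']
```

"at" is left out because the dot needs a character to match, and "chat" is left out because it is one character too long.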
?
means the character before it is optional (zero or one times)
colou?r
will match both color and colour but not colouur.
.?.at would match
chat, spat, that (and, since the first character is optional, three-letter words like cat)
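A quick check of both ? examples, sketched with Python's re module (the word lists are made up for illustration):

```python
import re

# colou?r: the "u" may appear zero or one times
colour = re.compile(r"colou?r")
spellings = ["color", "colour", "colouur"]
colour_hits = [w for w in spellings if colour.fullmatch(w)]

# .?.at: one optional character, one required character, then "at"
optional_first = re.compile(r".?.at")
words = ["cat", "chat", "spat", "that", "splat"]
at_hits = [w for w in words if optional_first.fullmatch(w)]

print(colour_hits)  # ['color', 'colour']
print(at_hits)      # ['cat', 'chat', 'spat', 'that']
```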
+
one or more times
*
any number of times, including 0
a*
would match zero or more a's, including no a at all (a+ would require at least one)
a.*n
would match "aspiration" but also "an"
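The difference between + and * is easiest to see side by side. A small sketch, assuming Python's re module and made-up test strings:

```python
import re

samples = ["", "a", "aaa", "b"]

# a* : zero or more a's, so the empty string counts
star_hits = [s for s in samples if re.fullmatch(r"a*", s)]

# a+ : one or more a's, so the empty string does not count
plus_hits = [s for s in samples if re.fullmatch(r"a+", s)]

# a.*n : an "a", then any number of any characters, then an "n"
words = ["aspiration", "an", "a", "banana"]
a_to_n = [w for w in words if re.fullmatch(r"a.*n", w)]

print(star_hits)  # ['', 'a', 'aaa']
print(plus_hits)  # ['a', 'aaa']
print(a_to_n)     # ['aspiration', 'an']
```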
()
parentheses group operations together, like in arithmetic
(1+3) x 4 = 16
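In a regular expression, grouping decides what a following +, * or ? applies to. A sketch assuming Python's re module; the "ha" strings are a made-up example, not from the notes.

```python
import re

# without parentheses, + applies only to the "a": h, then one or more a's
ungrouped_ok = re.fullmatch(r"ha+", "haaa") is not None
ungrouped_fail = re.fullmatch(r"ha+", "hahaha") is None

# with parentheses, + applies to the whole group "ha"
grouped_ok = re.fullmatch(r"(ha)+", "hahaha") is not None

print(ungrouped_ok, ungrouped_fail, grouped_ok)  # True True True
```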
[]
any single character from the set inside the brackets
- [abc]
- [0123456789]
- [0-9] matches any single digit, 0 through 9 (a hyphen inside brackets makes a range)
- [a-z] matches any single lowercase letter, a THROUGH z
- [aeiou0-9] matches any lowercase vowel or digit
- [A-Za-z] matches any single letter, upper or lower case
[Aa]sian
will match both Asian and asian (only the case of the first letter can vary).
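A sketch of bracket sets in action, assuming Python's re module. The sample strings are made up. Note that a set always stands for exactly one character, and that the last pattern is [Aa]sian, with a single a after the set.

```python
import re

# [0-9] is one digit, so "42" as a whole does not fit
one_digit = re.fullmatch(r"[0-9]", "7") is not None
two_digits = re.fullmatch(r"[0-9]", "42") is not None

# findall lists every single character that belongs to the set
vowels_or_digits = re.findall(r"[aeiou0-9]", "route 66")
letters = re.findall(r"[A-Za-z]", "R2-D2")

# [Aa]sian: either case for the first letter only
variants = ["Asian", "asian", "ASIAN"]
asian_hits = [w for w in variants if re.fullmatch(r"[Aa]sian", w)]

print(one_digit, two_digits)  # True False
print(vowels_or_digits)       # ['o', 'u', 'e', '6', '6']
print(letters)                # ['R', 'D']
print(asian_hits)             # ['Asian', 'asian']
```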
\
escape character cancels the special meaning, so we can find a period, for example
\.
matches .
\\
matches \
these are called escape sequences (not to be confused with back references like \1, which refer back to what a group in parentheses matched)
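Escaping, sketched with Python's re module (sample strings made up). Raw strings (the r"..." prefix) keep Python itself from interpreting the backslashes before the regex engine sees them.

```python
import re

# an unescaped dot matches every character
any_char = re.findall(r".", "a.b")

# an escaped dot matches only a literal period
only_dot = re.findall(r"\.", "a.b")

# an escaped backslash matches a literal backslash
backslashes = re.findall(r"\\", r"C:\temp\file")

# re.escape adds the backslashes for you
escaped = re.escape("3.14")

print(any_char)          # ['a', '.', 'b']
print(only_dot)          # ['.']
print(len(backslashes))  # 2
print(escaped)           # 3\.14
```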
^
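negation, when it is the first character inside []. anything except what follows (outside brackets, ^ instead means the start of the line):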
[^aeiou]
matches any character besides a, e, i, o, u
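A sketch of both meanings of ^, assuming Python's re module and made-up strings. Inside brackets it negates the set; outside brackets it pins the match to the start.

```python
import re

# [^aeiou]: every character that is NOT a lowercase vowel,
# which includes the space and the digits
non_vowels = re.findall(r"[^aeiou]", "banana 42")

# ^a outside brackets: an "a" only at the very start
anchored = re.findall(r"^a", "aha")
unanchored = re.findall(r"a", "aha")

print(non_vowels)  # ['b', 'n', 'n', ' ', '4', '2']
print(anchored)    # ['a']
print(unanchored)  # ['a', 'a']
```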
{}
repeats whatever comes before it a certain number of times
to find 4 vowels in a row:
[aeiou][aeiou][aeiou][aeiou]
can more succinctly be expressed as
[aeiou]{4}
,
add a comma (,) inside the braces to create a range: {min,max}
[^aeiou]{4,6}
matches a run of 4 to 6 characters in a row that are not lowercase vowels (consonants, but also digits, spaces and punctuation)
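Counting with braces, sketched with Python's re module. The test words are made up for illustration.

```python
import re

# {4}: exactly four vowels in a row
four_vowels = re.findall(r"[aeiou]{4}", "queueing")

# the long-hand version finds the same thing
long_hand = re.findall(r"[aeiou][aeiou][aeiou][aeiou]", "queueing")

# {4,6}: between four and six non-vowels in a row, as many as possible
consonant_run = re.findall(r"[^aeiou]{4,6}", "catchphrase")

# non-vowels are not only consonants: spaces and digits count too
not_just_letters = re.findall(r"[^aeiou]{4,6}", "a 1234 e")

print(four_vowels)       # ['ueue']
print(long_hand)         # ['ueue']
print(consonant_run)     # ['tchphr']
print(not_just_letters)  # [' 1234 ']
```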
|
the pipe (|) character is the "logical operator" for the concept of OR
can also be understood as an "OR gate": the expression on either side of it may match
it's also great for combining expressions!
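A sketch of OR with Python's re module. The cat/dog and gray/grey strings are made-up examples. Without parentheses the pipe splits the whole expression in two, so grouping matters here.

```python
import re

# either alternative may match
pets = re.findall(r"cat|dog", "cat, dog, bird")

# parentheses limit the OR to a single letter
grey_ok = re.fullmatch(r"gr(a|e)y", "grey") is not None
gray_ok = re.fullmatch(r"gr(a|e)y", "gray") is not None

# without them, the pattern means "gra" OR "ey", so "grey" does not fit
ungrouped = re.fullmatch(r"gra|ey", "grey") is not None

print(pets)              # ['cat', 'dog']
print(grey_ok, gray_ok)  # True True
print(ungrouped)         # False
```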
More Examples/Use cases
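Let's identify misspellings of the word banana, e.g. bananana
b(an)*a
matches b, then any number of "an" (including none), then a: ba, bana, banana, bananana
or, to require at least two "an", we can also use
ban(an)+a
neither one matches bananna: allowing the doubled n needs something like b(an+)+a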
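Checking the banana patterns against a few spellings, sketched with Python's re module. The third pattern, b(an+)+a, is an assumed variant that also tolerates a doubled n; it is one possible fix, not the only one.

```python
import re

spellings = ["ba", "bana", "banana", "bananana", "bananna"]

def hits(pattern):
    """Return the spellings that the pattern matches in full."""
    return [w for w in spellings if re.fullmatch(pattern, w)]

star = hits(r"b(an)*a")     # zero or more "an"
plus = hits(r"ban(an)+a")   # "ban", then one or more "an"
doubled = hits(r"b(an+)+a")  # each "an" may carry extra n's (assumed variant)

print(star)     # ['ba', 'bana', 'banana', 'bananana']
print(plus)     # ['banana', 'bananana']
print(doubled)  # ['bana', 'banana', 'bananana', 'bananna']
```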
[0-9]{4}
matches a 4 digit number
to match a credit card number:
([0-9]{4}){4}
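matches 4 groups of 4 digits with nothing between them, but what if we want spaces or hyphens?
([0-9]{4}[- ]?){3}[0-9]{4}
allows an optional space or hyphen after each of the first 3 groups, though this allows us to mix both spaces and hyphens.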
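A sketch of the credit card patterns with Python's re module. The card numbers are dummy values, and the flexible pattern is written here as ([0-9]{4}[- ]?){3}[0-9]{4}. The last pattern is one possible way (an assumption, not from the notes) to stop separators from being mixed: it captures the first separator in a group and reuses it with the back reference \1.

```python
import re

numbers = [
    "1234567812345678",
    "1234 5678 1234 5678",
    "1234-5678-1234-5678",
    "1234 5678-1234 5678",   # mixed separators
    "1234 5678 1234",        # too short
]

plain = re.compile(r"([0-9]{4}){4}")
flexible = re.compile(r"([0-9]{4}[- ]?){3}[0-9]{4}")
# assumed tightening: whatever separator comes first must be repeated
consistent = re.compile(r"[0-9]{4}([- ]?)[0-9]{4}\1[0-9]{4}\1[0-9]{4}")

plain_hits = [n for n in numbers if plain.fullmatch(n)]
flexible_hits = [n for n in numbers if flexible.fullmatch(n)]
consistent_hits = [n for n in numbers if consistent.fullmatch(n)]

print(len(plain_hits))       # 1
print(len(flexible_hits))    # 4
print(len(consistent_hits))  # 3
```

None of these check that a number is a real card number; they only check its shape.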