PCRE = Perl-Compatible Regular Expression
Basic wildcard matching
. (period) = match any char (except a newline/return)
+ = one or more (ba+ matches ba or baaaaaaaa)
* = zero or more (h.*i = hi or hawai’i)
? = zero or one (a.? = am or a or a#)
{3,5} = between 3 and 5 occurrences, inclusive (bi{3,5}t matches "biiit" or "biiiit" or "biiiiit")
\w\s\w\s.\d\w\W matches "C U L8r!"
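A quick way to check the wildcard examples above is to run them. This is a minimal sketch that assumes Python's built-in re module as a stand-in for a PCRE engine (the two agree on these basic constructs); the variable names are only for illustration.

```python
import re

# Each pair is (pattern, text). fullmatch succeeds only if the WHOLE text matches.
examples = [
    (r"ba+", "ba"),
    (r"ba+", "baaaaaaaa"),
    (r"h.*i", "hi"),
    (r"h.*i", "hawai'i"),
    (r"a.?", "a"),
    (r"a.?", "a#"),
    (r"bi{3,5}t", "biiiiit"),
    (r"\w\s\w\s.\d\w\W", "C U L8r!"),
]
results = [(p, t, re.fullmatch(p, t) is not None) for p, t in examples]
for pattern, text, ok in results:
    print(f"{pattern!r} vs {text!r}: {ok}")

# Only two "i" characters is below the {3,5} minimum, so this one fails.
too_few = re.fullmatch(r"bi{3,5}t", "biit") is not None
```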
Greedy vs. minimal matching
- Pattern matching is "greedy," so it will try to match AS MANY CHARACTERS AS POSSIBLE. Matching H.*o to the line "Hello, how are you doing today?" won't just match "Hello"; it will match "Hello, how are you doing to", i.e., everything from the first "H" to the last "o". So BE VERY CAREFUL, particularly when replacing text: it's easy to erase more text than you intend.
- Matching M.*i to Mississippi matches the entire word
- Add extra "?" to minimize matching:
M.*?i matches just Mi
Mi.+?i matches Missi
- While Mis? matches Mis, Mis?? matches just Mi (a minimal match of either zero or one "s", so it doesn't match "s" at all.)
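The greedy and minimal examples above can be reproduced side by side. A small sketch, again assuming Python's re module in place of a PCRE engine (the greedy/minimal behavior shown here is the same); the variable names are illustrative.

```python
import re

line = "Hello, how are you doing today?"
greedy = re.search(r"H.*o", line).group()    # runs to the LAST "o" on the line
minimal = re.search(r"H.*?o", line).group()  # stops at the FIRST "o"

word = "Mississippi"
m_greedy = re.search(r"M.*i", word).group()    # whole word
m_minimal = re.search(r"M.*?i", word).group()  # shortest stretch ending in "i"
m_plus = re.search(r"Mi.+?i", word).group()    # needs at least one char before the next "i"
s_greedy = re.search(r"Mis?", word).group()    # takes the optional "s"
s_minimal = re.search(r"Mis??", word).group()  # skips the optional "s"

print(greedy, minimal, m_greedy, m_minimal, m_plus, s_greedy, s_minimal, sep=" | ")
```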
Brackets []: single character from a set
[bpk]it matches bit or pit or kit
2[a-d] matches 2a or 2b or 2c or 2d
[a-fA-F0-9]+ matches a hexadecimal number (i.e., a digit 0-9 or letter A-F)
^ negates a set: [^qz] matches any character EXCEPT q or z
[^\d\*M-Z] matches any character EXCEPT a digit, an asterisk, or a letter from M to Z
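The bracket sets above can be tried out on short test strings. A sketch assuming Python's re module as a stand-in for PCRE; the sample strings are made up for illustration.

```python
import re

# A set matches exactly one character drawn from the listed options.
kits = re.findall(r"[bpk]it", "bit pit kit sit")

# A set followed by + matches a run of such characters.
hex_ok = re.fullmatch(r"[a-fA-F0-9]+", "1aF9") is not None
hex_bad = re.fullmatch(r"[a-fA-F0-9]+", "12G4") is not None  # "G" is outside A-F

# A leading ^ inside the brackets negates the set.
not_qz = re.findall(r"[^qz]", "quiz")
negated = re.findall(r"[^\d\*M-Z]", "a1*MZb")  # skips the digit, the asterisk, M and Z

print(kits, hex_ok, hex_bad, not_qz, negated)
```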
Basic single-character matching
\d = digit 0-9; \D = any NON-digit char
\s = whitespace (space, tab, return, etc.); \S = non-whitespace
\w = word char (letter, digit, _); \W = non-word char
\t = tab; \n = newline (line feed); \r = carriage return
\. matches a literal period; \+ matches a literal plus sign
Put a backslash in front of any of + . \ | ( ) [ { ^ $ * ? to match the actual char
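Forgetting a backslash is a common slip, so here is a short sketch of escaped vs. unescaped characters. It assumes Python's re module as a stand-in for PCRE; re.escape is a Python helper (not part of PCRE syntax) that adds the backslashes for you, and the sample text is invented.

```python
import re

price = "Total: 3.50 + tax (est.)"

literal_dot = re.findall(r"\d\.\d", price)   # \. matches only a real period
unescaped = re.findall(r"\d.\d", "3x5 3.5")  # a bare . matches any char, so 3x5 slips in
plus = re.search(r"\+", price).group()       # \+ matches the plus sign itself

# re.escape backslash-escapes the special characters in a literal string.
paren = re.search(re.escape("(est.)"), price).group()

digits = re.findall(r"\d", "L8r 2day")       # digit chars only
non_word = re.findall(r"\W", "C U L8r!")     # spaces and punctuation

print(literal_dot, unescaped, plus, paren, digits, non_word)
```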
Matching one of several options
- Use | (the OR sign, or pipe) if you want to match one of several possible patterns in one spot
(Jamie|Frank) Ciocco matches either Jamie Ciocco or Frank Ciocco
M[sr]\. (Jamie|Ann) Ciocco matches Mr. Jamie Ciocco or Ms. Ann Ciocco
(F[oi]g|Leaf) matches Fig, Fog, or Leaf
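The alternation examples above, run as a sketch in Python's re module (assumed here as a stand-in for PCRE). One Python-specific detail: findall returns only the group contents when a pattern has capturing parentheses, so the sketch uses (?:...), a non-capturing group, where the whole match is wanted.

```python
import re

names = ["Jamie Ciocco", "Frank Ciocco", "Ann Ciocco"]
matched = [n for n in names if re.fullmatch(r"(Jamie|Frank) Ciocco", n)]

# (?:...) groups the options without saving them, so findall returns whole matches.
titles = re.findall(r"M[sr]\. (?:Jamie|Ann) Ciocco",
                    "Mr. Jamie Ciocco and Ms. Ann Ciocco")

# Alternation combines freely with bracket sets.
words = re.findall(r"F[oi]g|Leaf", "Fig Fog Fug Leaf")

print(matched, titles, words)
```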
But how do I handle find/replace?
- Problem: you have a list of names on separate lines, but all names are last name first (e.g., Ciocco, Jamie)
.*, .* will match Ciocco, Jamie on one line and Clark, George on the next
- But what do you put in the replace box to change them to Jamie Ciocco & George Clark, respectively?
Backreferences: the Missing Link!
- Use parentheses to "save" pattern-matched text to use in find & replace
- Example:
(.*), (.*) matches Ciocco, Jamie and "saves" Ciocco as \1 and Jamie as \2
- Find/replace: replacing (.*), (.*) with \2 \1 changes Ciocco, Jamie to Jamie Ciocco
- Backreferences count LEFT parens \1, \2, \3, ... from left to right: in ((foot)ball) \1 is football and \2 is foot
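The name-swapping find/replace above can be sketched in code. This assumes Python's re module as a stand-in for a PCRE-based editor; re.sub plays the role of the find & replace dialog, and it accepts the same \1 and \2 in the replacement string.

```python
import re

names = "Ciocco, Jamie\nClark, George"

# "." never crosses a line break, so each line is matched and swapped on its own.
swapped = re.sub(r"(.*), (.*)", r"\2 \1", names)
print(swapped)

# Groups are numbered by their LEFT paren, counting from left to right.
m = re.fullmatch(r"((foot)ball)", "football")
outer, inner = m.group(1), m.group(2)
print(outer, inner)
```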
BBEdit/TextWrangler Text Factories
- String together multiple Find/replace tasks, along with other commands for modifying & cleaning up a text document
- Save multiple complex find/replace patterns for later use
- Corresponding program feature on Windows?
Multi-file search & replace
- Use this feature to apply find/replace tasks to many documents at once; can auto-save
- Can I completely destroy a whole site’s worth of files with one poorly written pattern?
- Short answer: yes.
- Long answer: please don't do that.
- PCREs are powerful tools: use with caution, test with samples, check results, and back up early and often.
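One way to follow that advice in a script is to preview first and back up before writing. The sketch below is not how BBEdit or TextWrangler does it; it assumes Python's re module, and replace_in_files, the dry_run flag, and the .bak suffix are all hypothetical names chosen for illustration.

```python
import re
import shutil
import tempfile
from pathlib import Path

def replace_in_files(paths, pattern, replacement, dry_run=True):
    """Return {path: substitution count}; write changes only when dry_run is False."""
    regex = re.compile(pattern)
    report = {}
    for path in paths:
        old = path.read_text(encoding="utf-8")
        new, count = regex.subn(replacement, old)
        if count == 0:
            continue  # nothing matched, leave the file alone
        report[path] = count
        if not dry_run:
            # Back up first, so a bad pattern can be undone.
            shutil.copy2(path, path.with_name(path.name + ".bak"))
            path.write_text(new, encoding="utf-8")
    return report

# Demo on a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "names.txt"
    sample.write_text("Ciocco, Jamie\nClark, George\n", encoding="utf-8")

    preview = replace_in_files([sample], r"(.*), (.*)", r"\2 \1")  # dry run
    untouched = sample.read_text(encoding="utf-8")

    replace_in_files([sample], r"(.*), (.*)", r"\2 \1", dry_run=False)
    rewritten = sample.read_text(encoding="utf-8")
    backup = (Path(tmp) / "names.txt.bak").read_text(encoding="utf-8")

print(list(preview.values()), repr(rewritten))
```

Running the dry run first shows how many substitutions each file would get, which is a cheap check that the pattern is not matching more than intended.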