PCRE = Perl-Compatible Regular Expression
Basic wildcard matching
  • . (period) = match any char (except return)
  • + = one or more (ba+ matches ba or baaaaaaaa)
  • * = zero or more (h.*i = hi or hawai’i)
  • ? = zero or one (a.? = am or a or a#)
  • {3,5} = between 3 and 5 occurrences, inclusive (bi{3,5}t matches "biiit" or "biiiit" or "biiiiit")
  • \w\s\w\s.\d\w\W matches "C U L8r!"
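The wildcard examples above can be checked outside a text editor. This is a minimal sketch that assumes Python's built-in `re` module as a stand-in for PCRE (it is not PCRE itself, but it treats these simple patterns the same way); `checks` and `whole_match` are names made up for this illustration.

```python
import re

# (pattern, text, should the pattern match the whole text?)
checks = [
    (r"ba+", "ba", True),
    (r"ba+", "baaaaaaaa", True),
    (r"ba+", "b", False),           # + needs at least one "a"
    (r"h.*i", "hi", True),
    (r"h.*i", "hawai’i", True),
    (r"a.?", "a", True),
    (r"a.?", "a#", True),
    (r"bi{3,5}t", "biiit", True),
    (r"bi{3,5}t", "biit", False),   # only two "i"s, so too few
    (r"\w\s\w\s.\d\w\W", "C U L8r!", True),
]


def whole_match(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


for pattern, text, expected in checks:
    status = "ok" if whole_match(pattern, text) == expected else "SURPRISE"
    print(f"{status}: {pattern!r} against {text!r}")
```

`re.fullmatch` is used so that each pattern has to account for the whole string, which makes the "too few" cases fail visibly.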
Greedy vs. minimal matching
  • Pattern matching is "greedy," so it will try to match AS MANY CHARACTERS AS POSSIBLE. So matching H.*o to the line "Hello, how are you doing today?" won't just match "Hello", it will match "Hello, how are you doing to" -- i.e., everything from the first "H" to the last "o". Thus you must BE VERY CAREFUL, particularly when replacing text, as it's easy to erase more text than you intend to.
  • matching M.*i to Mississippi matches entire word
  • Add extra "?" to minimize matching: M.*?i matches just Mi
  • Mi.+?i matches Missi
  • While Mis? matches Mis, Mis?? matches just Mi (a minimal match of either zero or one "s", so it doesn't match "s" at all.)
Brackets []: Single character from a set
  • [bpk]it matches bit or pit or kit
  • 2[a-d] matches 2a or 2b or 2c or 2d
  • [a-fA-F0-9]+ matches a hexadecimal number (i.e., a digit 0-9 or letter A-F)
  • ^ negates a set: [^qz] matches any character EXCEPT q or z
  • [^\d\*M-Z] matches any character EXCEPT a digit, an asterisk, or a letter from M to Z
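To see greedy and minimal matching side by side, along with the bracket sets, here is a short sketch. It assumes Python's `re` module rather than a real PCRE engine; for these patterns the two behave alike. The variable names are made up for the example.

```python
import re

line = "Hello, how are you doing today?"
greedy = re.search(r"H.*o", line).group()     # runs to the LAST "o"
minimal = re.search(r"H.*?o", line).group()   # stops at the FIRST "o"

# The Mississippi examples: pattern -> text actually matched
word = "Mississippi"
patterns = [r"M.*i", r"M.*?i", r"Mi.+?i", r"Mis?", r"Mis??"]
matched = {p: re.search(p, word).group() for p in patterns}

# Bracket sets: one character chosen from (or excluded from) a set
kit_words = re.findall(r"[bpk]it", "bit pit kit sit")
is_hex = re.fullmatch(r"[a-fA-F0-9]+", "c0ffEE") is not None
only_qz = re.sub(r"[^qz]", "", "quiz")        # delete everything EXCEPT q and z
allowed = re.findall(r"[^\d\*M-Z]", "a1*MZb")

print(greedy)
print(minimal)
print(matched)
print(kit_words, is_hex, only_qz, allowed)
```

Printing `greedy` next to `minimal` shows how much extra text a greedy pattern can swallow, which is worth checking before running a replace.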
Basic single-character matching
  • \d = digit 0-9; \D = any NON-digit char
  • \s = space, tab, return, etc.; \S = non-space
  • \w = word char (letter, number, _); \W = non-word char
  • \t = tab; \n or \r = return; \. matches period char; \+ matches plus sign; put a backslash in front of any of + . \ | ( ) [ { ^ $ * ? to match the actual char
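The shorthand classes and the backslash rule above can be tried out as follows. This is a sketch that assumes Python's `re` module in place of PCRE; the sample strings are invented, and `re.escape` is a Python convenience for adding the backslashes automatically, not something the editor's search box offers.

```python
import re

numbers = re.findall(r"\d+", "Room 101, floor 3")          # runs of digits
fields = re.split(r"\s+", "one\ttwo  three\nfour")         # split on any whitespace
words = re.findall(r"\w+", "snake_case, kebab-case")       # _ is a word char, - is not

# An unescaped period matches ANY character; \. matches only a period.
loose = re.findall(r"3.14", "3.14 3x14")
strict = re.findall(r"3\.14", "3.14 3x14")

# re.escape backslashes the special characters in a literal string.
literal = "1+1=2?"
escaped_ok = re.fullmatch(re.escape(literal), literal) is not None
unescaped_ok = re.fullmatch(literal, literal) is not None   # + and ? act as operators here

print(numbers, fields, words)
print(loose, strict)
print(escaped_ok, unescaped_ok)
```

The last two lines show why escaping matters: the literal text "1+1=2?" does not match itself until its special characters are escaped.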
Matching one of several options
  • Use | (OR sign or pipe) if you want to match one of several possible patterns in one spot
  • (Jamie|Frank) Ciocco matches either Jamie Ciocco or Frank Ciocco
  • M[sr]\. (Jamie|Ann) Ciocco matches Mr. Jamie Ciocco or Ms. Ann Ciocco
  • (F[oi]g|Leaf) matches Fig, Fog, or Leaf
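The alternation examples above, sketched with Python's `re` module (an assumption; the patterns are the ones from the list, and the test strings are made up):

```python
import re

title_name = re.compile(r"M[sr]\. (Jamie|Ann) Ciocco")

candidates = [
    "Mr. Jamie Ciocco",
    "Ms. Ann Ciocco",
    "Mrs. Ann Ciocco",    # "Mrs" is not M followed by a single s or r
    "Mr. Frank Ciocco",   # Frank is not one of the listed options
]
accepted = [c for c in candidates if title_name.fullmatch(c)]

# With one capturing group, findall returns what the group captured,
# which here is the whole match.
plants = re.findall(r"(F[oi]g|Leaf)", "Fig Fog Fug Leaf")

print(accepted)
print(plants)
```

The parentheses limit how far the | reaches: without them, `Jamie|Ann Ciocco` would mean "Jamie" or "Ann Ciocco".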
But how do I handle find/replace?
  • Problem: you have a list of names on separate lines, but all names are last name first (e.g., Ciocco, Jamie)
  • .*, .* will match Ciocco, Jamie on one line and Clark, George on the next--
  • but what do you put in the replace box to change to Jamie Ciocco & George Clark, respectively?
Backreferences: the Missing Link!
  • Use parentheses to "save" pattern-matched text to use in find & replace
  • Example: (.*), (.*) matches Ciocco, Jamie and "saves" Ciocco as \1 and Jamie as \2
  • Find/replace (.*), (.*) with \2 \1 changes Ciocco, Jamie to Jamie Ciocco
  • Backreferences are numbered by counting LEFT parens from left to right (\1, \2, \3...): in ((foot)ball), \1 is football and \2 is foot
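Here is the name-swapping find/replace from above as a runnable sketch. It assumes Python's `re.sub`, which happens to accept the same \1 and \2 notation in the replacement text; the two-line name list is the one used in the examples.

```python
import re

names = "Ciocco, Jamie\nClark, George"

# Find: (.*), (.*)    Replace: \2 \1
swapped = re.sub(r"(.*), (.*)", r"\2 \1", names)

# Groups are numbered by their LEFT parens, outermost first.
nested = re.fullmatch(r"((foot)ball)", "football")
group_one = nested.group(1)
group_two = nested.group(2)

print(swapped)
print(group_one, group_two)
```

Because . does not match a return, each (.*) stays on its own line, so the two names are swapped independently.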
BBEdit/TextWrangler Text Factories
  • String together multiple Find/replace tasks, along with other commands for modifying & cleaning up a text document
  • Save multiple complex find/replace patterns for later use
  • Corresponding program feature on Windows?
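On any platform with Python installed, the idea of stringing find/replace steps together can be roughly approximated with a short script. This is only a sketch of the concept, not how Text Factories work internally; `STEPS` and `run_steps` are hypothetical names, and the three steps are examples chosen for illustration.

```python
import re

# Each step is (find pattern, replace text), applied in order.
STEPS = [
    (r"[ \t]+$", ""),              # strip trailing spaces and tabs
    (r" {2,}", " "),               # collapse runs of spaces into one
    (r"^(.*), (.*)$", r"\2 \1"),   # "Last, First" becomes "First Last"
]


def run_steps(text, steps=STEPS):
    """Apply every find/replace step to the text, one after another."""
    for pattern, replacement in steps:
        # MULTILINE makes ^ and $ match at the start and end of each line.
        text = re.sub(pattern, replacement, text, flags=re.MULTILINE)
    return text


cleaned = run_steps("Ciocco,  Jamie   \nClark, George\t")
print(cleaned)
```

Order matters: the cleanup steps run first so that the name swap sees tidy input.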
Multi-file search & replace
  • Use this feature to apply find/replace tasks to many documents at once; can auto-save
  • Can I completely destroy a whole site’s worth of files with one poorly written pattern?
  • Short answer: yes.
  • Long answer: please don't do that.
  • PCREs are powerful tools. Use them with caution: test with samples, check results, back up early and often.
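One way to follow that advice when scripting a multi-file replace is to do a dry run first and keep a backup of every file that changes. The sketch below assumes Python's standard library; `replace_in_files`, its parameters, and the `.bak` suffix are all hypothetical choices for this example, not a feature of any editor.

```python
import re
import shutil
from pathlib import Path


def replace_in_files(root, glob, pattern, replacement, dry_run=True):
    """Apply one find/replace to every matching file under root.

    Returns {path: number of replacements}. With dry_run=True nothing is
    written, so the counts can be reviewed before committing.
    """
    regex = re.compile(pattern)
    report = {}
    # sorted() collects the file list up front, before anything is written.
    for path in sorted(Path(root).rglob(glob)):
        if not path.is_file():
            continue
        original = path.read_text(encoding="utf-8")
        updated, count = regex.subn(replacement, original)
        if count == 0:
            continue
        report[path] = count
        if not dry_run:
            # Keep an untouched copy next to the file before overwriting it.
            shutil.copy2(path, path.with_name(path.name + ".bak"))
            path.write_text(updated, encoding="utf-8")
    return report
```

A cautious workflow would call it once with the default `dry_run=True`, look over the reported counts, and only then call it again with `dry_run=False`.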