vim regex revealed

I wanted to use Vim to change graduation dates into a standard format. The form requires them to enter the date as mm/yy, but to enter it in the database, I need it in the form yyyy-mm. Sometimes they leave the leading 0 out of the month. For the database, the graduation month has to be 06, 09, or 02. But they often use 05 instead of 06 (because that’s when they want to be finished?). Here’s the Vim command to do it:

:%s/0\=\([4-6]\)\/\(\d\d\)/\="20".submatch(2)."-06"/

Well, this does it for the folks graduating in June. It then has to be run separately for the other two graduation months:

:%s/0\=\([29]\)\/\(\d\d\)/\="20".submatch(2)."-0".submatch(1)/

If they pick a month other than 2, 4, 5, 6, or 9, it’s a problem. (I assume the person who said 04 was just super-eager to get out.)
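
For comparison, here is a rough Python sketch of the same month rules done in one pass instead of two separate commands. It is not what I ran in Vim. The names MONTH_MAP, DATE_RE, and normalize_grad_date are made up for illustration, and the \b word boundaries are my own addition so that a month like 12 is not read as a stray 2. Months outside the list are left untouched for a human to sort out.

```python
import re

# Month as typed (leading zero optional) -> month the database accepts.
# The 4 and 5 -> 06 mapping follows the rules above; any other month is
# assumed to be a problem and is left alone.
MONTH_MAP = {2: "02", 4: "06", 5: "06", 6: "06", 9: "09"}

# One or two digits for the month, a slash, two digits for the year.
DATE_RE = re.compile(r"\b(\d{1,2})/(\d\d)\b")


def normalize_grad_date(text):
    """Rewrite mm/yy as yyyy-mm; leave unrecognized months untouched."""

    def fix(m):
        month = int(m.group(1))
        if month not in MONTH_MAP:
            return m.group(0)  # problem month: return the match unchanged
        return "20" + m.group(2) + "-" + MONTH_MAP[month]

    return DATE_RE.sub(fix, text)
```

Called on a single field, normalize_grad_date("5/24") gives "2024-06", while "11/24" comes back exactly as it went in.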

Analysis

  • The :%s part starts the substitute command on all lines in the file. The rest of the command is in the form /{pattern}/{string}/, where {pattern} is a regular expression using Vim’s very special syntax, and {string}, if you are very clever, is substitute text that includes back references to parts of the regex that were matched by {pattern}.
  • \= is Vim-talk for ? or {0,1} in other regular expression syntaxes. That is, 0\= matches the possibly-missing leading zero in the month. But watch out. \= in the string part of the command means something totally different! (see below)
  • OK, Vim uses parentheses for capturing matches, just like other regular expressions. Except you have to escape them. So \([4-6]\)\/ matches and captures a 4, a 5, or a 6, and matches but does not capture the / that comes after it.
  • Vim understands escape codes for special characters, so \d\d matches the two-digit year number. (Note to self: fix this code before the year 2100.) Of course \d{2} would have worked, except that it would be written as \d\{2}. Right: escape the opening brace but not the closing brace. In case you thought there was a general principle in effect when you had to escape both the opening and closing parentheses of the capture group. (Just kidding: the escape before the closing brace may, optionally, be present.) Anyway, it’s faster to type \d\d than any of the other options.
  • Now we cross over to the substitution string, the stuff that starts after the second un-escaped /. The string starts with our old friend, \=, which no longer means ?. Here it means this string is special, what Vimophiles like to call a “sub-replace-expression.” In a sub-replace-expression you can put in strings enclosed in double quotes; concatenate stuff using the period as the operator; and make back references to the captured matches in the pattern using submatch(n), where n is 0 for the entire match and 1, 2, and so on for the captured groups in order.
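
If Vim's dialect is the unfamiliar part, it may help to see the June command in a more conventional regex syntax. This is only a sketch in Python, with the hypothetical names JUNE_RE, june_replacement, and fix_june. The callback stands in for the sub-replace-expression: m.group(2) does the job of submatch(2), and + does the job of Vim's period.

```python
import re

# Vim:    0\=\([4-6]\)\/\(\d\d\)
# Python: 0?([4-6])/(\d\d)
# ? replaces \=, the parentheses are not escaped, and neither is the slash.
JUNE_RE = re.compile(r"0?([4-6])/(\d\d)")


def june_replacement(m):
    # Build "20" + captured year + "-06", like "20".submatch(2)."-06" in Vim.
    return "20" + m.group(2) + "-06"


def fix_june(line):
    # count=1 mirrors :s without the g flag: only the first match
    # on the line is replaced.
    return JUNE_RE.sub(june_replacement, line, count=1)
```

So fix_june("05/24") gives "2024-06", and a line with a September date such as "9/23" passes through unchanged.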

Wasn’t that fun?

OK, truth in advertising: it’s not as complicated as that. Instead of a sub-replace-expression, you can just put \1, \2, etc. in the string for the back references, without starting the string with the magic \=. June could have been written:

:%s/0\=\([4-6]\)\/\(\d\d\)/20\2-06/

But I hadn’t been able to get that to work when I started writing this, I guess because I had the pattern syntax wrong when I first tried it.
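
The plain back-reference form carries over to other regex flavors almost unchanged. As a sketch (the function name fix_june_with_backrefs is made up), Python's re.sub accepts the same \2 in its replacement text:

```python
import re


def fix_june_with_backrefs(line):
    # r"20\2-06" is the counterpart of Vim's 20\2-06:
    # \2 is the captured two-digit year.
    # count=1 replaces only the first match, like :s without the g flag.
    return re.sub(r"0?([4-6])/(\d\d)", r"20\2-06", line, count=1)
```

Here fix_june_with_backrefs("5/24") gives "2024-06", the same result as the callback version.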