Split Captures

Apr 10th, 2008 — 4:13 AM

I was working on some RMagick magic the other day and used String#split to parse the width and height from a string and was a little surprised by the outcome:

string = "800x600"
width, height = string.split(/(x|or)/)
# split returned => ["800", "x", "600"], so height ended up as "x"

For those of you who might be surprised as well, the parentheses in my regex create what’s known as a capture group, and String#split includes the text matched by each capture group in the array it returns. In my case this was actually beneficial, as it saved me from having to run another regex on the string to determine whether the “operator” used was “x” or “or”. The Ruby docs for String#split don’t mention this behavior and merely state that “[if] pattern is a Regexp, str is divided where the pattern matches. Whenever the pattern matches a zero-length string, str is split into individual characters.” I wouldn’t call this a bug at all, though; once you think about it, what is happening makes perfect sense. Call it a pleasantly surprising feature.
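To make that concrete, here is a minimal sketch of how the captured delimiter can be put to use. The parse_dimensions helper is a hypothetical name for illustration, not something lifted from my RMagick code, and it assumes the string always contains exactly one "x" or "or" between two numbers:

```ruby
# Hypothetical helper: split a dimension string on "x" or "or".
# Because the alternation sits in a capture group, split returns
# the delimiter between the two numbers as well.
def parse_dimensions(string)
  width, operator, height = string.split(/(x|or)/)
  [width.to_i, operator, height.to_i]
end

parse_dimensions("800x600")   # => [800, "x", 600]
parse_dimensions("800or600")  # => [800, "or", 600]
```

Destructuring into three variables instead of two is all it takes to pick up the operator for free.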

Oh, by the way. Had I not wanted the regex to capture like that, I could have used either /(?:x|or)/, which tells the regex engine not to capture the group, or simply /x|or/, a solution [thanks apeiros!] that I was not aware of. For whatever reason I thought that alternation always had to occur within parentheses.
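A quick side-by-side of the three patterns, as a sketch (the variable names are just for illustration):

```ruby
string = "800x600"

# Capture group: the delimiter shows up in the result.
capturing = string.split(/(x|or)/)        # => ["800", "x", "600"]

# Non-capturing group: the delimiter is dropped.
non_capturing = string.split(/(?:x|or)/)  # => ["800", "600"]

# Bare alternation: no group at all, same result.
bare = string.split(/x|or/)               # => ["800", "600"]

width, height = bare
```

With either of the last two, the two-variable assignment gives the width and height I originally expected.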

Filed under: stunt ruby

1 comment

  • Tim said
    I'm sure this behavior is inherited from Perl, either on purpose or implicitly via PCRE. Perl's "split" behaves this way, and it's documented there: "If the PATTERN contains parentheses, additional list elements are created from each matching substring in the delimiter. split(/([,-])/, "1-10,20", 3); produces the list value (1, '-', 10, ',', 20)"

All content © Lucky Sneaks 2008