Mar 13, 2014

Matching Hex characters in a Regex

I've noticed a common problem with regular expressions and hex characters, so I thought I'd blog about it. The most common way to match a UUID, a SHA1, or some other hex-encoded binary value with a regex is this (I've seen it in Perl libraries and StackOverflow answers):

[a-f0-9] or [A-F0-9]

Neither of these is correct: hex is case insensitive, and both of these regexes are case sensitive. Hex is most commonly written in lowercase (unless you're Data::UUID), but that's an aesthetic choice, not a requirement. The best way to match hex is with a POSIX character class:

[[:xdigit:]] or \p{XDigit}

which, for ASCII input, matches the same characters as this, but in a more readable and intent-driven manner:

[A-Fa-f0-9]

As a side note, in a Java regex string it's written like this:

"\\p{XDigit}"
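
To make the difference concrete, here's a small sketch in Java. The class name HexMatch, the helper isHex, and the sample string are my own choices for illustration, not from any library. It runs a lowercase-only class and \p{XDigit} against the same 40-character hex string in both cases.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are for illustration only.
final class HexMatch {
    // The lowercase-only class often seen in the wild.
    static final Pattern LOWER_ONLY = Pattern.compile("[a-f0-9]+");

    // The POSIX hex digit class, which accepts both cases.
    static final Pattern XDIGIT = Pattern.compile("\\p{XDigit}+");

    // True only if the whole string matches the pattern.
    static boolean isHex(Pattern pattern, String value) {
        return pattern.matcher(value).matches();
    }

    public static void main(String[] args) {
        String lower = "da39a3ee5e6b4b0d3255bfef95601890afd80709";
        String upper = lower.toUpperCase(Locale.ROOT);

        System.out.println(isHex(LOWER_ONLY, lower)); // prints "true"
        System.out.println(isHex(LOWER_ONLY, upper)); // prints "false"
        System.out.println(isHex(XDIGIT, lower));     // prints "true"
        System.out.println(isHex(XDIGIT, upper));     // prints "true"
    }
}
```

The uppercase string is still perfectly valid hex, but the lowercase-only pattern rejects it, which is exactly the bug described above.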

1 comment:

  1. require 5.012; qr/\p{XDigit}/ works. See http://p3rl.org/perlrecharclass

