A (Very) Gentle Introduction to Regex

It’s been my experience that working with Regex makes everyone’s head hurt. No one wants to have to look at ^(19|20)\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$ and figure out what it means!

But in spite of that, Regex is a very powerful tool, and it’s good to know how to use it, even if (like most people) you’re not an expert. This post will serve as a very gentle introduction to Regex, so that when you encounter it in your testing you’ll feel more comfortable with it.

The first thing you should know about Regex is that it stands for “Regular Expression”. It’s simply a sequence of characters that defines a search pattern. It’s very useful for doing things like editing a string or checking to see if a phone number, date, or postal code fits the accepted pattern.
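To make the "does it fit the accepted pattern" idea concrete, here is a minimal Python sketch that runs the date pattern quoted at the top of this post against a few strings. The helper name looks_like_date is made up for illustration, and Python's built-in re module is just one of many Regex engines you might use.

```python
import re

# The date pattern quoted at the top of this post: a year from 1900 to 2099,
# a month, and a day, separated by '-', ' ', '/' or '.'.
DATE_PATTERN = re.compile(
    r"^(19|20)\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$"
)


def looks_like_date(text):
    """Return True if text fits the pattern (hypothetical helper name)."""
    return DATE_PATTERN.search(text) is not None


print(looks_like_date("2020-11-16"))  # True
print(looks_like_date("2020/11/16"))  # True
print(looks_like_date("2020-13-01"))  # False: there is no month 13
print(looks_like_date("1820-01-01"))  # False: the year must start with 19 or 20
```

Note that the pattern only checks the shape of the text. It would happily accept 2020-02-31, so a real date check still needs calendar logic on top.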

The second thing you should know about Regex is that it’s a lot easier to use when you have a Regex tester available! I like to use regexpal.com, but there are many other free testers available on the Web.

Regex varies slightly depending on what language you are using, but the basic building blocks of Regex are the same in each. Here are ten different Regex symbols that will help you get started:

^ : The caret symbol indicates that you want to match the beginning of a string (or, in multiline mode, the beginning of a line). For example, if you were using a pattern that started with ^ball, you could match the string ball or the string balloon, but you could not match football, because it doesn’t begin with ball.

$: The dollar symbol indicates that you want to match the end of a string (or, in multiline mode, the end of a line). So if you were using a pattern that ended with ball$, you could now match football, but you couldn’t match balloon, because it doesn’t end with ball.

. : The period symbol will match any single character (by default, anything except a line break). You can use this when one of the characters in a string is going to vary. So if you had the pattern foo.ball, you could match football or foosball.

*: The asterisk symbol indicates that the preceding character should be matched zero or more times. In the pattern fo*tball, the letter o can appear any number of times, including not at all. So with this pattern, you could match ftball, fotball, football, or even foooooooootball. (If you need one or more, use the plus symbol instead: fo+tball.)

\d: The backslash and d symbol matches any single numeric digit. The pattern football\d will match football1, football2, football3, and so on, but not football or football!.

\w: The backslash and w symbol matches any single “word character”: a letter, a digit, or an underscore. So if you were looking for a pattern of 12345\w, it would match 12345a, 12345b, 12345z, and also 123456 and 12345_, but wouldn’t match 12345!. If you want letters only, use a character set such as [a-zA-Z] instead.

\s: The backslash and s symbol matches any single whitespace character, such as a space, a tab, or a line break. If you had a pattern of foot\sball, it would match foot ball but not football.

[ ]: The square bracket symbols indicate a character set, which matches exactly one character from the set. So if you wanted to match any digit from 1 to 5, you could use [12345], or the equivalent range [1-5]. A pattern of football[12345] would match football1, but not football6.

|: The pipe character is an either/or pattern. A Regex pattern of cat|dog will match cat, and will also match dog.

( ): The parentheses group pattern items together, the same way they do in mathematical expressions. Let’s say you were trying to find a match for November or December, but not for September. You couldn’t just use a Regex pattern of ember, because that would match all three months. But you could use (Nov|Dec)ember; using the parentheses combined with the pipe character shows that the month could either have Nov or Dec, and then should continue with ember.
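If you’d like to try these ten symbols in code rather than in an online tester, here is a small Python sketch using the built-in re module. The helper name matches is made up for this example; it simply reports whether the pattern is found anywhere in the text. Other languages may differ in small details, so treat this as one flavor’s behavior.

```python
import re


def matches(pattern, text):
    """Return True if pattern is found anywhere in text (hypothetical helper)."""
    return re.search(pattern, text) is not None


# ^ and $ anchor the pattern to the start and end of the string.
print(matches(r"^ball", "balloon"))          # True
print(matches(r"^ball", "football"))         # False
print(matches(r"ball$", "football"))         # True
print(matches(r"ball$", "balloon"))          # False

# . matches any single character.
print(matches(r"foo.ball", "foosball"))      # True

# * means zero or more of the preceding character.
print(matches(r"fo*tball", "ftball"))        # True
print(matches(r"fo*tball", "foooootball"))   # True

# \d is a digit, \w is a letter, digit, or underscore, \s is whitespace.
print(matches(r"football\d", "football7"))   # True
print(matches(r"football\d", "football!"))   # False
print(matches(r"12345\w", "123456"))         # True
print(matches(r"12345\w", "12345!"))         # False
print(matches(r"foot\sball", "foot ball"))   # True
print(matches(r"foot\sball", "football"))    # False

# [ ] matches one character from the set; [1-5] is short for [12345].
print(matches(r"football[1-5]", "football1"))  # True
print(matches(r"football[1-5]", "football6"))  # False

# | is either/or, and ( ) limits how far the either/or reaches.
print(matches(r"cat|dog", "dog"))                 # True
print(matches(r"(Nov|Dec)ember", "December"))     # True
print(matches(r"(Nov|Dec)ember", "September"))    # False
```

The r in front of each pattern makes it a Python raw string, so backslashes such as the one in \d reach the Regex engine unchanged.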

I’ve kept these examples very simple, because there is so much to Regex that you could spend months learning it, and it is very easy to get confused! But these commonly used symbols should be enough to get you feeling a bit more comfortable with it. Take some time to play around with a Regex tester to practice what you’ve learned, and if you’d like to learn more, try an interactive tutorial like regexone.com. Have fun!
