My Journey as a Web Developer

Regex Made Simple: A Guide to Mastering Text Pattern Matching

17/09/2023

Regular expressions, often abbreviated as Regex, are often the best way to handle string manipulation and text processing. JavaScript has a wealth of string methods to help with this, but a regular expression is frequently the cleaner and more concise option. If you've ever needed to search, extract, or replace specific patterns within strings, Regex is a skill worth adding to your toolkit. In this article, we'll explore why Regex is worth learning and work through practical examples of how it simplifies text manipulation tasks that would be more convoluted with traditional JavaScript string methods alone.

Regex Looks Messy

/[^a-z0-9]+/g

What does this even mean? Let's break it down and see what all the individual characters do.

//g
The two slashes mark the start and end of a regular expression literal. Whatever we put between the slashes is the pattern to match. The 'g' after the closing slash is a flag which stands for global. This flag lets us get every match. Without it, we would just get the first match.
[]
The square brackets form a set. Whatever you place inside the square brackets is treated as a list of individual characters, any one of which can match. So if we want to find all the a's in a given string, we can use /a/g, but if we want to match both a's and b's, we need square brackets: /[ab]/g. A hyphen inside the brackets forms a range, so [a-z0-9] covers every lowercase letter and every digit. To negate the contents of the square brackets, a caret ^ is placed at the start: /[^a]/g matches any character that is not an 'a'.
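To make the pieces above concrete, here is a minimal sketch of the global flag, a set, and a negated set applied to one short string. The string and variable names are made up purely for illustration.

```javascript
const sample = 'abcabc';

// Without the g flag, match() reports only the first match.
const first = sample.match(/a/);
console.log(first[0], first.index); // a 0

// With the g flag, match() returns every match.
const allAs = sample.match(/a/g);
console.log(allAs); // [ 'a', 'a' ]

// A set matches any one of the listed characters.
const asAndBs = sample.match(/[ab]/g);
console.log(asAndBs); // [ 'a', 'b', 'a', 'b' ]

// A caret at the start of the set negates it.
const notA = sample.replace(/[^a]/g, '-');
console.log(notA); // a--a--
```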

Returning to our initial regular expression, we can now see that we are looking for anything that is not a lowercase letter or a digit. The '+' sign matches one or more consecutive occurrences as a single match. For example, if a string had three dashes ('---') and we wanted to replace them with a single underscore, leaving out the plus sign would give us three underscores, one per dash. With the plus sign, the three dashes are matched as one run and we get the desired '_'.
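The dash example can be tried directly. This is only a small sketch, and the variable names are placeholders chosen for illustration.

```javascript
const dashed = 'a---b';

// Without +, each dash is its own match, so each one is replaced.
const withoutPlus = dashed.replace(/-/g, '_');
console.log(withoutPlus); // a___b

// With +, the run of dashes is a single match and is replaced once.
const withPlus = dashed.replace(/-+/g, '_');
console.log(withPlus); // a_b
```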

Now that we understand the regular expression above, we can put it to practical use. Say we receive a title string and want to form a URL slug from it.

const title = 'This is some product title: My Product is Cool'

const url = title.toLowerCase().replace(/[^a-z0-9]+/g, "-")

// result = "this-is-some-product-title-my-product-is-cool"
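One thing to watch for: if a title starts or ends with punctuation or spaces, the same replace leaves a dash at that end. A possible way around it is a second replace that trims those dashes. The helper name slugify below is hypothetical, and the sketch assumes the title only needs ASCII letters and digits.

```javascript
const slugify = title =>
  title
    .toLowerCase()
    // Collapse each run of non-alphanumeric characters into one dash.
    .replace(/[^a-z0-9]+/g, '-')
    // Remove dashes at the very start (^) or the very end ($) of the string.
    .replace(/^-+|-+$/g, '');

console.log(slugify('  Hello, World!  ')); // hello-world
```

The '|' means "or", so the second pattern matches a run of dashes at either end.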

Complex Regular Expressions

/([a-z])(?!.*\1)/ig

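The above regular expression matches each letter of the alphabet only at its last occurrence in the string, so the number of matches equals the number of distinct letters used. That makes it usable as a pangram checker: a string that contains every letter yields 26 matches.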

// match() returns null when nothing matches, so fall back to an empty array
const alphCharsUsed = string => string.match(/([a-z])(?!.*\1)/ig) ?? []

console.log(alphCharsUsed('abcdefghijklomnpqrstuvwxyz').length)
console.log(alphCharsUsed('abcdefghijklomnpqrstuvwxyzabcdefghj').length)

// result 26
// result 26
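Turning the count into an actual pangram check could look something like the sketch below. The names distinctLetters and isPangram are hypothetical, and it assumes single-line input: because '.' does not match newlines, a letter repeated on a later line would be counted again, which adding the 's' flag should avoid.

```javascript
// Returns one match per distinct letter, or an empty array if there are none.
const distinctLetters = text => text.match(/([a-z])(?!.*\1)/ig) ?? [];

// A pangram uses all 26 letters of the alphabet.
const isPangram = text => distinctLetters(text).length === 26;

console.log(isPangram('The quick brown fox jumps over the lazy dog')); // true
console.log(isPangram('Hello world')); // false
```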

How exactly does this work?

([a-z])
We form a capture group with parentheses, which allows us to reference the matched letter later. The square brackets match a single character between a and z.

Here's where the magic begins.

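(?!)
This forms a negative lookahead assertion. The match only succeeds if the pattern inside the lookahead cannot be matched from this point on. A lookahead checks the text ahead without consuming it.
.*
Matches any number of characters (including zero), other than newlines.
\1
A backreference to whatever the first capture group matched.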

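In summary, the regular expression tries each letter in a-z and then looks ahead to see whether the same letter occurs again later in the string. If it does, that occurrence is skipped, so each letter is matched exactly once, at its final occurrence.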
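A short string makes the "last occurrence only" behaviour easier to see. This is just an illustrative sketch, and the function name lastOccurrences is made up.

```javascript
const lastOccurrences = text => text.match(/([a-z])(?!.*\1)/ig) ?? [];

// 'b' appears once; only the final 'n' and the final 'a' have no repeat after them.
console.log(lastOccurrences('banana')); // [ 'b', 'n', 'a' ]

// With the i flag the backreference also ignores case,
// so 'A' is skipped because 'a' follows it.
console.log(lastOccurrences('Aa')); // [ 'a' ]
```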

As you can see, with a simple line of code, albeit a jumbled mess, we can do some really powerful pattern matching.

Regex Cheatsheet

Modifiers

  • 'g' Find all matches. Removing this returns the first match only.
  • 'i' Ignore case.
  • 'm' Multiline mode. '^' and '$' match at the start and end of each line, not just of the whole string.
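As a rough illustration of how the three modifiers combine, the sketch below runs variations of one pattern against a three-line string. The sample text is invented for the example.

```javascript
const lines = 'Cat\ncat\nCAT';

// g alone: every exact-case match.
const exact = lines.match(/cat/g);
console.log(exact); // [ 'cat' ]

// g and i: every match, ignoring case.
const anyCase = lines.match(/cat/gi);
console.log(anyCase); // [ 'Cat', 'cat', 'CAT' ]

// Without m, ^ and $ only anchor to the whole string, so nothing matches.
const noMultiline = lines.match(/^cat$/gi);
console.log(noMultiline); // null

// With m, ^ and $ anchor to each line.
const perLine = lines.match(/^cat$/gim);
console.log(perLine); // [ 'Cat', 'cat', 'CAT' ]
```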

Character Classes

  • '.' Any character except newline.
  • '\w' Any word character (a-z, A-Z, 0-9, or underscore).
  • '\W' Any non-word character.
  • '\bx' Find a match for x at the beginning of a word, where x is any pattern.
  • 'x\b' Find a match for x at the end of a word, where x is any pattern.
  • '[abc]' Any one of the characters between the brackets.
  • '[^abc]' Any one character that is not between the brackets.
  • '[f-k]' Any one character in the range f to k.
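A few of these classes in action, as a quick sketch. The sample strings are arbitrary, and matchAll is used only to show where each match starts.

```javascript
// \w matches word characters, \W matches everything else.
const wordChars = 'a1_b-2'.match(/\w/g);
const nonWordChars = 'a1_b-2'.match(/\W/g);
console.log(wordChars); // [ 'a', '1', '_', 'b', '2' ]
console.log(nonWordChars); // [ '-' ]

// \b anchors a match to the start or end of a word.
const sentence = 'cat catalog bobcat';
const wordStarts = [...sentence.matchAll(/\bcat/g)].map(m => m.index);
const wordEnds = [...sentence.matchAll(/cat\b/g)].map(m => m.index);
console.log(wordStarts); // [ 0, 4 ]  ('cat' and 'catalog')
console.log(wordEnds); // [ 0, 15 ] ('cat' and 'bobcat')

// A range inside a set.
const inRange = 'figjam'.match(/[f-k]/g);
console.log(inRange); // [ 'f', 'i', 'g', 'j' ]
```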

Quantifiers

  • 'x+' Matches one or more consecutive x's
  • 'x*' Matches zero or more consecutive x's
  • 'x?' Matches zero or one x
  • 'x{2,3}' Matches at least two but no more than three consecutive x's
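The quantifiers can be compared side by side with a small sketch like the one below. The test strings are invented for illustration.

```javascript
const runs = 'x xx xxx xxxx';

// + takes each whole run of x's.
const oneOrMore = runs.match(/x+/g);
console.log(oneOrMore); // [ 'x', 'xx', 'xxx', 'xxxx' ]

// {2,3} skips the single x and takes at most three from the run of four.
const twoToThree = runs.match(/x{2,3}/g);
console.log(twoToThree); // [ 'xx', 'xxx', 'xxx' ]

// ? makes the preceding character optional.
const optionalU = 'color colour'.match(/colou?r/g);
console.log(optionalU); // [ 'color', 'colour' ]

// * allows zero or more, so a lone 'b' still matches.
const zeroOrMore = 'ab aab b'.match(/a*b/g);
console.log(zeroOrMore); // [ 'ab', 'aab', 'b' ]
```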

Grouping and Lookarounds

  • '(xyz)' Capture group.
  • '\1' Reference group 1
  • '(?=)' Positive lookahead. Find X where Y follows. X(?=Y)
  • '(?!)' Negative lookahead. Find X where Y does not follow. X(?!Y)
  • '(?<=)' Positive lookbehind. Find X where Y precedes it. (?<=Y)X
  • '(?<!)' Negative lookbehind. Find X where Y does not precede it. (?<!Y)X
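Here is one way the four lookarounds might be tried out. The strings are made up, and the lookbehind examples assume a reasonably recent JavaScript engine, since older browsers did not support lookbehind.

```javascript
// Positive lookahead: digits only where 'px' follows.
const pixelValues = '12px 3em 45px'.match(/\d+(?=px)/g);
console.log(pixelValues); // [ '12', '45' ]

// Negative lookahead: a 'q' where 'u' does not follow.
const qWithoutU = [...'Iraq quit qatar'.matchAll(/q(?!u)/g)].map(m => m.index);
console.log(qWithoutU); // [ 3, 10 ]

// Positive lookbehind: 'happy' only where 'un' precedes it.
const afterUn = [...'happy unhappy'.matchAll(/(?<=un)happy/g)].map(m => m.index);
console.log(afterUn); // [ 8 ]

// Negative lookbehind: 'happy' only where 'un' does not precede it.
const notAfterUn = [...'happy unhappy'.matchAll(/(?<!un)happy/g)].map(m => m.index);
console.log(notAfterUn); // [ 0 ]
```

In each case the lookaround text itself is not part of the match, which is why only the digits or the word 'happy' come back.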

Summary

There is a lot that can be done with regular expressions, and the above only scratches the surface. Regex is a formidable tool for tackling the complex text manipulation challenges that often arise in development and API consumption. While JavaScript provides native string methods, regex often offers a more elegant and adaptable solution. By investing time in learning regex, you'll find yourself equipped to handle a wide range of text-related tasks with efficiency and precision.
