Understanding Regular Expressions and How to Write Them
Regular expressions, commonly known as RegEx, are powerful tools used for pattern matching within strings.
They allow you to search, extract, and manipulate text based on specific patterns, making them incredibly useful for data validation, text parsing, and
string manipulation.
How Regex Works
RegEx uses a sequence of characters to define a search pattern. This pattern can match simple text sequences or more complex string structures. Here's a breakdown of how regex works:
Literal Characters
These are the simplest forms of patterns, matching exact sequences
of characters.
The pattern cat will match the string "cat" in the text "The cat sat on the mat".
Metacharacters
These are special characters with specific meanings that allow for more complex patterns.
The dot . metacharacter matches any single character except a newline.
The pattern c.t will match "cat", "cut", "cot", etc.
Character Classes
Defined using square brackets [ ] they match any one character within the brackets.
The pattern [aeiou] will match any vowel.
The pattern c[aeiou]t will match "cat", "cot", "cut", etc.
Quantifiers
These specify how many times a character or group of characters should be matched.
The asterisk * matches zero or more occurrences of the preceding element.
The pattern ca*t will match "ct", "cat", "caaat", etc.
Anchors
These are used to match positions within a string rather than actual characters.
The caret ^ matches the start of a string, and the dollar sign $ matches the end of a string.
The pattern ^cat will match "cat" only if it is at the beginning of the string.
Groups and Alternation
Parentheses ( ) create groups, and the pipe | acts as an OR operator.
The pattern (cat|dog) will match either "cat" or "dog".
The pattern (c|d)og will
match "cog" or "dog".
Regex Tokens
Symbols used to write regex patterns are called tokens. These tokens match a variety of characters. Here are some useful tokens to get you started:

Source: regex101.com Check them out for more tokens and support.
Writing Regular Expressions
Writing effective regular expressions involves understanding the specific patterns you need to match. Here's a step-by-step guide to writing your own regex:
Identify the Text Pattern: Determine the specific text or structure you need to find. For example, an email address or a date.
Start Simple: Begin with a basic pattern and gradually add complexity.
Example: To match a basic email address, start with the pattern for a sequence of word characters [\w]+
Use Metacharacters and Classes: Incorporate special characters and classes to match more complex patterns.
Example: An email address pattern can be [\w.-]+@[\w.-]+\.\w+
This pattern matches an email address format, consisting of one or more word characters, dots, or hyphens, followed by an @ symbol, then one or more word characters, dots, or hyphens, a dot, and finally one or more word characters.
Test and Refine: Use regex testing tools like regex101 to test your patterns against various strings and refine them as needed.
Excel REGEX Functions
Now that you're familiar with how regex works, let's try out the new Excel REGEX functions with some practical examples.
Excel REGEXTEST Function
The REGEXTEST function allows you to determine if a text string matches a specified pattern. This function returns TRUE if the pattern is found in the text, and FALSE otherwise.
Syntax
=REGEXTEST(text, pattern, [ignore_case])
text: The input string to test.
pattern: The regular expression pattern to match.
[ignore_case]: Optional. If TRUE (default), the match is case-insensitive.
Example 1: Test if a text string is present:
=REGEXTEST("Hello World", "world")
This will return TRUE because the function ignores case by default.
Example 2: Validate whether a string is a correctly formatted email address with the pattern ^[\w.-]+@[\w.-]+\.\w+$
Explanation: The pattern is designed to match a valid email address format with the following structure:
- Start of the string ^ : Ensures the pattern matches from the beginning of the string.
- Username part [\w.-]+ : Matches one or more characters that are either word characters (letters, digits, underscores), dots, or hyphens.
- @ character: Matches the literal "@"
symbol separating the username and domain parts.
- Domain name part [\w.-]+ : Matches one or more characters that are either word characters, dots, or hyphens.
- Dot character \. : Matches the literal dot before the top-level domain.
- Top-level domain \w+ : Matches one or more word characters, representing the domain suffix.
- End of the string $ : Ensures the pattern matches up to the end of the string.
Formula
=REGEXTEST("user@example.com", "^[\w.-]+@[\w.-]+\.\w+$")
Returns: TRUE
Excel REGEXEXTRACT Function
The REGEXEXTRACT function extracts substrings that match a specified pattern
from the input text.
Syntax
=REGEXEXTRACT(text, pattern, [return_mode], [ignore_case])
text: The input string.
pattern: The regular
expression pattern to extract.
[return_mode]: Optional. Determines the return format:
0: First match (default)
1: Multiple matches as an
array
2: Groupings for the first match as an array
[ignore_case]: Optional. If TRUE (default), the match is case-insensitive.
Example 1: Extract an email address with the pattern [\w.-]+@[\w.-]+\.\w+
Explanation: The pattern matches an email address format, where:
- It starts with one or more word characters, dots, or hyphens.
- Followed by the @ symbol.
- Then has one or more word characters, dots, or hyphens.
- Followed by a dot.
- Finally ending with one or more word characters.
Formula
=REGEXEXTRACT("Contact us at support@example.com", "[\w.-]+@[\w.-]+\.\w+")
This will return "support@example.com".
Example 2: Extract
the domain name from a URL with the pattern (?<=//)(?:www\.)?([^/]+)
Explanation:
- (?<=//) : This is a positive lookbehind assertion. It ensures that the match is preceded by // without including // in the
result.
- (?:www\.)? : Non-capturing group to optionally match "www.".
- ([^/]+) : Capturing group to match the domain name. It matches one or more characters that are not a slash.
Formula
=REGEXEXTRACT("https://www.MyOnlineTrainingHub.com/blog", "(?<=//)(?:www\.)?([^/]+)")
Returns: www.MyOnlineTrainingHub.com
Excel REGEXREPLACE Function
The REGEXREPLACE function replaces substrings matching a pattern with a replacement string.
Syntax
=REGEXREPLACE(text, pattern, replacement, [occurrence], [ignore_case])
text: The input string.
pattern: The regular expression pattern to match.
replacement: The replacement text.
[occurrence]: Optional. Determines which occurrences are replaced:
0: All occurrences (default)
n: nth occurrence from the start
-n: nth occurrence from the end
[ignore_case]:
Optional. If TRUE (default), the match is case-insensitive.
Example 1: Redact the first 6 digits of a phone number using the pattern \d{3}-\d{3}
Explanation: The pattern matches a string of
exactly three digits followed by a hyphen and then exactly three digits.
- \d : Matches any digit (0-9). It is a shorthand character class for numeric digits.
- {3} : This quantifier specifies that the preceding element (a digit, in this case) must occur exactly three times.
So, \d{3} matches exactly three digits in a row.
- - : Matches a literal hyphen (dash) character. It is not a special character in this context but is used to match the hyphen itself in the string.
- \d : Again, matches any digit (0-9).
- {3} : This quantifier specifies that the preceding element (a digit) must occur exactly three times. So, \d{3} matches exactly three digits in a row.
Formula
=REGEXREPLACE("My phone number is 123-456-7890",
"\d{3}-\d{3}", "XXX-XXX")
This will return "My phone number is XXX-XXX-7890".
Example 2: Replace the first 5 digits of a Social Security Number (SSN) with asterisks using the pattern \d{3}-\d{2}
Explanation: The pattern \d{3}-\d{2} matches a string of exactly three digits followed by a hyphen and then exactly two digits.
Formula
=REGEXREPLACE("My SSN is 123-45-6789", "\d{3}-\d{2}",
"*-")
Returns: "My SSN is *--6789"
Conclusion
As you can see, understanding and mastering regex can significantly enhance your ability to manipulate and analyse text in Excel. With Excel's new REGEX functions, you can seamlessly integrate these powerful patterns into your workflows, making text processing
tasks more efficient and effective.
However, if all this makes your head hurt, then you can do similar text manipulation and much more with Power Query, which is a lot more point and click. Get started with Power Query here.