Understanding Regular Expressions: A Beginner's Guide (Part 2)

Advanced searching with flags

Regular expressions have optional flags that allow for functionality like global searching and case-insensitive searching. These flags can be used separately or together in any order, and are included as part of the regular expression.

In this article, we will explore what flags are in regular expressions and how they modify the way a pattern searches. Let's dive right in.

Include a flag with the regular expression

const re = /pattern/flags;

or

const re = new RegExp("pattern", "flags");

Flags are an integral part of a regular expression. They cannot be added or removed later.

const re = /\w+\s/g; 
const str = "fee fi fo fum"; 
const myArray = str.match(re); 
console.log(myArray); 
// output: ["fee ", "fi ", "fo "]

You could replace the line: const re = /\w+\s/g; with: const re = new RegExp("\\w+\\s", "g"); and get the same result.

To give multiple flags to a regex, we write them one after another (without any spaces or other delimiters).

For example, to give both the i and g flags to the regex /a/, we'd write /a/ig or, equivalently, /a/gi. The order doesn't matter because each flag independently modifies how the search behaves.
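As a small sketch of the two points above (flag order is irrelevant, and flags can't be changed on an existing regex), the snippet below reads the standard flags and source properties. The variable names are made up for illustration.

```javascript
// Both literals carry the same flags; the flags property
// always reports them in a fixed, canonical order.
const literalIg = /a/ig;
const literalGi = /a/gi;
console.log(literalIg.flags); // "gi"
console.log(literalGi.flags); // "gi"

// Flags can't be edited in place, so to "add" a flag we build
// a new RegExp from the old pattern plus the extra flag.
const caseSensitive = /fee/g;
const caseInsensitive = new RegExp(caseSensitive.source, caseSensitive.flags + "i");
console.log(caseInsensitive.flags); // "gi"
console.log("Fee fee".match(caseInsensitive)); // ["Fee", "fee"]
```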

Find More Than the First Match

The first and foremost flag we shall explore is g. The flag g stands for 'global' — or more specifically, 'global searching'. It serves to make an expression look for all its matches, rather than stopping at the first one.

By default, when a regex engine finds the first match for a given pattern in a given test string, it terminates right at that point without looking any further. To modify this eager nature of regexes, we can use g.

Let's say we have two expressions /cats/ and /cats/g and our string is "cats love cats".

The first expression (/cats/, without the g flag) would match only the first 'cats' in the string. In contrast, the second expression (/cats/g, with the g flag) would match both occurrences.

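const str = "cats love cats";

// Without g, match() stops at the first match
const first = /cats/;
console.log(str.match(first));
// output: ["cats"] (plus index, input, and groups properties)

// With g, match() returns every match
const all = /cats/g;
console.log(str.match(all));
// output: ["cats", "cats"]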
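The g flag matters for more than match(). As a rough illustration, the sketch below uses the same test string with String.prototype.replace() and String.prototype.matchAll(). The variable names are only for this example.

```javascript
const sentence = "cats love cats";

// Without g, replace() swaps only the first match.
const firstOnly = sentence.replace(/cats/, "dogs");
console.log(firstOnly); // "dogs love cats"

// With g, replace() swaps every match.
const everyMatch = sentence.replace(/cats/g, "dogs");
console.log(everyMatch); // "dogs love dogs"

// matchAll() requires the g flag and yields one result per match,
// each carrying the index where the match starts.
const positions = [...sentence.matchAll(/cats/g)].map((m) => m.index);
console.log(positions); // [0, 10]
```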

Ignore Case While Matching

Another commonly used flag is i, where the 'i' stands for 'ignore case'.

As the name suggests, the i flag makes an expression look for its matches while ignoring character casing. That is, with the flag set, a lowercase/uppercase character in the pattern matches both lowercase as well as uppercase characters in the string.

const re = /hello/gi;
const str = "hello, Hello and HeLlo";
console.log(str.match(re));
// output: ["hello", "Hello", "HeLlo"]
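To see the i flag on its own, without g, here is a minimal sketch using test(). The sample string and variable names are assumptions made for illustration.

```javascript
const shout = "HELLO world";

// Without i, the lowercase pattern doesn't match uppercase text.
const strictResult = /hello/.test(shout);
console.log(strictResult); // false

// With i, case differences are ignored.
const relaxed = /hello/i;
const relaxedResult = relaxed.test(shout);
console.log(relaxedResult); // true

// The ignoreCase property reports whether the flag is set.
console.log(relaxed.ignoreCase); // true
```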

Match everything

A fairly recent addition to JavaScript's regular expression flags is s, introduced in ES2018.

The s flag means 'dot all'. By default, the dot character (.), often called the wildcard, matches any character except line terminators such as \n. With the s flag set, the dot matches those as well, so it matches every possible character.

const str = "Content flows\ndownward and\ndownward";

// Without s, the dot stops at each newline
const perLine = /.+/g;
console.log(str.match(perLine));
// output: ["Content flows", "downward and", "downward"]

// With s, the dot also matches newlines
const whole = /.+/gs;
console.log(str.match(whole));
// output: ["Content flows\ndownward and\ndownward"]
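The snippet below sketches the same idea with a pattern that has to cross a line break. It also shows [\s\S], a common workaround for environments without the s flag. The sample text and names are made up for this example.

```javascript
const text = "first\nsecond";

// Without s, the dot refuses to match the newline between the words.
const withoutS = /first.second/.test(text);
console.log(withoutS); // false

// With s, the dot matches the newline too.
const withS = /first.second/s.test(text);
console.log(withS); // true

// The dotAll property reports whether the s flag is set.
const dotAllFlag = /first.second/s.dotAll;
console.log(dotAllFlag); // true

// Workaround without s: [\s\S] matches any character, newlines included.
const workaround = /first[\s\S]second/.test(text);
console.log(workaround); // true
```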

Multiline mode

The flag m stands for multiline mode and serves to make the boundary tokens ^ and $ match the beginning and end of each line.

By default, the ^ and $ characters in an expression match the beginning and ending boundaries of a given test string. But with the m flag in place, they instead do this for every line in the string.

let str = `1st place: Winnie
2nd place: Piglet
3rd place: Eeyore`;
console.log( str.match(/^\d/g) );

// output: ["1"]

let str = `1st place: Winnie
2nd place: Piglet
3rd place: Eeyore`;
console.log( str.match(/^\d/gm) );

// output: ["1", "2", "3"]
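The examples above exercise only ^. As a complementary sketch, here is how $ behaves with and without the m flag on the same kind of multi-line string. The variable names are chosen just for illustration.

```javascript
const results = `1st place: Winnie
2nd place: Piglet
3rd place: Eeyore`;

// With m, $ matches at the end of every line.
const lastWords = results.match(/\w+$/gm);
console.log(lastWords); // ["Winnie", "Piglet", "Eeyore"]

// Without m, $ matches only at the very end of the string.
const lastWordOnly = results.match(/\w+$/g);
console.log(lastWordOnly); // ["Eeyore"]
```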

Searching a string from a custom position

Oftentimes, we might want an expression to start its searching routine, within a given test string, from an index other than 0. In other words, we might want to search for matches in the string from a custom position, like 2, 3, 4 and so on.

This can be accomplished using the y flag. The y flag stands for sticky searching. It makes an expression attempt a match only at the exact position given by its lastIndex property, without scanning further along the string.

If we don't set lastIndex on an expression that has the y flag, the match is attempted at the default index, 0.

const str = "Cats love cats, and we love cats.";
console.log( str.match(/cats/ig) );

// output: ["Cats", "cats", "cats"]


const str = 'Cats love cats, and we love cats.';
const regex = /cats/igy;
// The second "cats" starts at index 10
regex.lastIndex = 10;
console.log(regex.test(str));
// output: true
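To make the "exact position" behavior concrete, the sketch below moves lastIndex by one character and watches the match fail. It uses the same sentence, and the variable names are only illustrative.

```javascript
const phrase = "Cats love cats, and we love cats.";
const sticky = /cats/iy;

// "cats" starts exactly at index 10, so the sticky match succeeds.
sticky.lastIndex = 10;
const atTen = sticky.test(phrase);
console.log(atTen); // true

// After a successful match, lastIndex moves to the end of the match.
const afterSuccess = sticky.lastIndex;
console.log(afterSuccess); // 14

// Index 11 is the "a" inside "cats". A sticky regex doesn't scan
// ahead to the next "cats", so the match fails.
sticky.lastIndex = 11;
const atEleven = sticky.test(phrase);
console.log(atEleven); // false

// A failed sticky match resets lastIndex to 0.
const afterFailure = sticky.lastIndex;
console.log(afterFailure); // 0
```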

Generate indices for substring matches

The d flag indicates that the result of a regular expression match should contain the start and end indices of the substrings of each capture group. It does not change the regex's interpretation or matching behavior in any way, but only provides additional information in the matching result.

This flag primarily affects the return value of exec(). If the d flag is present, the array returned by exec() has an additional indices property. Because other regex-related methods (such as String.prototype.match()) call exec() internally, they also expose the indices whenever they return a full match result and the regex has the d flag.

const str1 = "foo bar foo";
const regex1 = /foo/dg;

console.log(regex1.hasIndices); 
// output: true

console.log(regex1.exec(str1).indices[0]); 
// output: [0, 3]
console.log(regex1.exec(str1).indices[0]); 
// output: [8, 11]

const str2 = "foo bar foo";
const regex2 = /foo/;

console.log(regex2.hasIndices); 
// output: false
console.log(regex2.exec(str2).indices); 
// output: undefined
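The description above mentions capture groups, which the previous example doesn't use. As a rough sketch, the snippet below adds two named groups (rank and name, both made up for this example) and reads their positions from indices.groups.

```javascript
// Named groups: "rank" captures the leading digits, "name" the final word.
const place = /(?<rank>\d+)\w+ place: (?<name>\w+)/d;
const result = place.exec("2nd place: Piglet");

// indices[0] covers the whole match.
console.log(result.indices[0]); // [0, 17]

// indices.groups holds a [start, end] pair for each named group.
console.log(result.indices.groups.rank); // [0, 1]
console.log(result.indices.groups.name); // [11, 17]
```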

Using Unicode regular expressions

The u flag is used to create Unicode regular expressions; that is, regular expressions that support matching against Unicode text. This is mainly accomplished through Unicode property escapes, which are supported only within Unicode regular expressions.

let str = "A ბ ㄱ"; 
console.log(str.match(/\p{L}/gu)); 
// output: ["A", "ბ", "ㄱ"]

console.log(str.match(/\p{L}/g)); 
// output: null
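Property escapes aren't the only effect of the u flag. It also makes the pattern work on whole code points rather than UTF-16 code units. Here is a minimal sketch with an emoji that takes two code units. The variable names are for illustration only.

```javascript
const emoji = "😀";

// The emoji is one code point but two UTF-16 code units.
console.log(emoji.length); // 2

// Without u, the dot matches a single code unit, so "exactly one
// character" fails for the emoji.
const oneUnit = /^.$/.test(emoji);
console.log(oneUnit); // false

// With u, the dot matches a whole code point.
const onePoint = /^.$/u.test(emoji);
console.log(onePoint); // true

// The u flag also enables \u{...} code point escapes in the pattern.
const byCodePoint = /\u{1F600}/u.test(emoji);
console.log(byCodePoint); // true
```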

Conclusion

Flag  Description
g     Global search.
i     Case-insensitive search.
s     Allows . to match newline characters.
m     Allows ^ and $ to match at the start and end of each line.
y     Performs a "sticky" search that matches starting at the current position in the target string.
d     Generates indices for substring matches.
u     "Unicode"; treats a pattern as a sequence of Unicode code points.

Thank you for your time. I hope you found it useful. ❤️

If you enjoyed this article and want to be the first to know when I post the next part, you can follow me on Twitter @habibawael02 or here at Habiba Wael.