Your RegEx Cheat Sheet
Regular expressions, or Regexes, are a special kind of text string used to describe patterns in text. They’re extremely powerful tools for working with and modifying large amounts of text quickly, which is why they’re often used by developers and other professionals who need to deal with a lot of data. But they can also be intimidating! This cheat sheet will help you get started writing regex expressions and show you some helpful tricks along the way.
For example, say you wanted to find all the phrases in your text that matched a certain pattern of words (for example, “I love pizza” or “Bella went to school today”). If you want to match the phrase “I love pizza,” you would search for this regex expression: I\s+love\s+pizza. The \s part stands for any whitespace character like space and tab, and the + indicates that there can be one or more of those characters.
Now, if you wanted to find all the phrases that matched “Bella went to school today,” you would search for this regex expression: Bella\s+went\s+to\s+school\s+today. The \s part stands for any whitespace character, and + indicates that there can be one or more of those characters.
Now that we have some background information about Regexes, let’s take a look at what kinds of tasks are best suited for them.
For example, a regular expression might be used to find the position of words in a sentence or find text that is formatted with certain HTML tags. In addition, regexes can also search large amounts of data such as log files and computer folders for specific strings (e.g., names of photos). Regular expressions are also excellent at finding patterns and making replacements on large groups of data.
We’ve seen how a few examples work now; let’s look at some more!
- testRigor uses regular expressions to generate random data in addition to search and validation
- Java have Java flavor of Regex Pattern
- JavaScript has RegExp as one of its built-in objects. You may want to check out Javascript RegExp Library, which contains JavaScript regular expression functions and pre-written regexes for handling common tasks such as email validation and IP address formatting/validation
- Ruby has the StringScanner class, which provides a more complex way to find patterns in strings
- Python includes an extensive module called re
- Perl uses “Perl Regular Expressions” syntax
Writing regular expressions can be a little tricky, so you may find it helpful to check out some of the resources available on Regular Expression Tutorials and References. You might also want to read up on what makes for a good regex when using Python’s re module.
Now that you have an understanding of Regexes, see the full reference guide, including symbols, ranges, grouping, assertions, and some sample patterns:
? | 0 or 1 times |
* | 0 or more times |
+ | 1 or more times |
{3} | Exactly 3 times |
{3,} | 3 or more times |
{3,5} | No less than 3 and no more than 5 times |
Add a ? to a quantifier to make it reluctant (not greedy) like in a+?.
\s | Whitespace like spaces and tabs |
\S | Non-whitespace |
\d | Digit |
\D | Non-digit |
\w | Word character equivalent to [a-zA-Z0-9_] |
\W | Non-word character |
. | Any character except new line (\n) |
a|b | a or b |
(…) | Group things to work with multipliers |
[abc] | Any letter from the list (a or b or c) |
[^abc] | Any character that is not a or b or c |
[a-q] | A letter anywhere from a to q inclusively |
\N | Group number N that was previously found like \1. |
Ranges are inclusive.
[:upper:] | Upper case letters |
[:lower:] | Lower case letters |
[:alpha:] | All letters |
[:digit:] | Digits |
[:alnum:] | Digits and letters |
[:xdigit:] | Hexadecimal digits |
[:punct:] | Punctuation |
[:blank:] | Space and tab |
[:space:] | Blank characters |
[:cntrl:] | Control characters |
[:graph:] | Printed characters |
[:print:] | Printed characters and spaces |
[:word:] | Digits, letters and underscore |
\ | Escape following character |
\Q | Begin literal sequence |
\E | End literal sequence |
\n | New line |
\r | Return |
\t | Tab |
\v | Vertical tab |
\f | Form feed |
\0xxx | Octal character xxx |
\xhhhh | Hex character hhhh |
(X) | Regular group capturing X |
(? |
Regular group capturing X with name of the group being “name” |
(?:X) | Non-capturing group for X |
(?=X) | zero-width positive lookahead for X |
(?!X) | zero-width negative lookahead for X |
(?<=X) | zero-width positive lookbehind for X |
(?<!X) | zero-width negative lookbehind for X |
(?<X) | independent, non-capturing group for X |
^ | Start of string, or start of line in a multiline pattern |
\A | Start of string |
$ | End of string, or end of line in a multiline pattern |
\Z | End of string |
\b | Word boundary |
\B | Non-word boundary |