Regular Expressions in Python
- test if a string matches a regular expression
- find all matches in a string
- find & replace in a string
- compile: creates a regular expression object
- match: match the regular expression at the beginning of the string
- search: search for matches in the whole string
- findall: returns all non-overlapping matches
- finditer: returns an iterator to iterate through matches
- sub: can be used to replace a match in a string
Let's start with compiling our first regular expression object.
import re
regex = re.compile('34(56)+')
- re.IGNORECASE: perform case-insensitive matching
- re.LOCALE: make \w, \W, \b, \B, \s and \S dependent on the current locale (in Python 3 this flag can only be used with bytes patterns)
- re.MULTILINE: perform multi-line matching, which means that ^ and $ match at the beginning and the end of every line instead of only at the beginning and the end of the whole string
- re.DOTALL: make the . match any character including a newline
- re.UNICODE: make \w, \W, \b, \B, \s and \S dependent on the Unicode character properties database (in Python 3 this is already the default for str patterns)
- re.VERBOSE: lets you write more readable regular expressions. Whitespace in the pattern is ignored (except inside a character class or when escaped) and you can add comments with #. This only changes how the pattern is written, not what it matches
If you want your regular expression to ignore case and match across multiple lines, you can define it like this:
regex_im = re.compile('34(56)+', re.MULTILINE | re.IGNORECASE)
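The pattern '34(56)+' contains no letters and no ^ or $ anchors, so these two flags do not visibly change what it matches. To see what they do, here is a small sketch with a made-up pattern and sample text (both are my own assumptions for illustration, not part of the original example):

```python
import re

# Hypothetical sample text, chosen only to make the flags visible.
text = "first line\nSecond line\nSECOND try"

# Without flags: ^ only matches at the very start of the string
# and matching is case-sensitive, so nothing is found.
plain = re.compile(r'^second')
print(plain.findall(text))      # []

# With MULTILINE, ^ also matches right after each newline.
# With IGNORECASE, 'second' also matches 'Second' and 'SECOND'.
flagged = re.compile(r'^second', re.MULTILINE | re.IGNORECASE)
print(flagged.findall(text))    # ['Second', 'SECOND']
```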
Test if a pattern matches
There are several ways to test whether a string matches your pattern, but the easiest is to call the match() method on your regular expression object. This method returns a match object if the pattern matches at the beginning of the string, and None otherwise. Here is an example:
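A minimal sketch of such a call could look like this. The sample string is made up for illustration, and the pattern is the one compiled earlier in the post:

```python
import re

regex = re.compile(r'34(56)+')

# Made-up sample string that starts with text the pattern can match.
m = regex.match('34565656 is at the start')

if m:
    # start() is the position where the match begins,
    # group() is the matched text itself.
    print(m.start(), m.group())   # 0 34565656
```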
This will return a match object whose start position is 0. Note that match() will not find matches that are not at the beginning of the string. For these cases you can use the method search():
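The difference between the two methods can be sketched as follows, again with a made-up sample string in which the pattern only occurs in the middle:

```python
import re

regex = re.compile(r'34(56)+')
text = 'abc 345656 def'   # made-up sample string

# match() only looks at the beginning of the string, so it finds nothing.
print(regex.match(text))          # None

# search() scans the whole string and returns the first match it finds.
m = regex.search(text)
if m:
    print(m.start(), m.end(), m.group())   # 4 10 345656
```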
Find all matches in a string
Now you know how to test if a string matches a pattern (using match()) and how to find the first match in a string (using search()). But what if you want to find every match in a string? For example, you may want to find every hyperlink in an HTML document. To get every match you can use the method finditer(), which returns an iterator that yields a match object for each match. The following snippet shows you a piece of code that prints the starting position for each match.
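A sketch of such a loop, using a made-up sample string with three matches and one near miss ('34' without a following '56'):

```python
import re

regex = re.compile(r'34(56)+')
text = '3456 xx 345656 yy 34 zz 3456'   # made-up sample string

# finditer() yields one match object per non-overlapping match.
for m in regex.finditer(text):
    print(m.start(), m.group())

# Collect the starting positions in a list.
starts = [m.start() for m in regex.finditer(text)]
print(starts)                 # [0, 8, 24]

# Careful with findall(): because the pattern contains a group,
# it returns the group's content instead of the whole match.
print(regex.findall(text))    # ['56', '56', '56']
```

If you only need the matched text, finditer() together with group() is usually the safer choice for a pattern like this one, since findall() switches to returning groups as soon as the pattern contains any.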
Find and replace in a string
Now that you have found every match in a string you may want to manipulate it. You can use the method sub() to do that. It takes two mandatory arguments: repl and string. repl can either be a string or a function. If it is a string, every match will be replaced by this string. If it is a function, this function will be called with the match object for each match and has to return the string that will be inserted instead of the match. Look at the following snippet for an example:
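Here is a sketch of both variants. The sample string is made up, and the body of replace_by_sum is my reconstruction of the behaviour described in this post, so treat it as one possible way to write it:

```python
import re

regex = re.compile(r'34(56)+')
text = 'a 3456 b 345656 c'   # made-up sample string


def replace_by_sum(match):
    # Add up the digits of the matched text and
    # return the sum wrapped in parentheses.
    total = sum(int(ch) for ch in match.group())
    return '(' + str(total) + ')'


# First call: repl is a plain string.
print(regex.sub('MATCH', text))          # a MATCH b MATCH c

# Second call: repl is a function that receives each match object.
print(regex.sub(replace_by_sum, text))   # a (18) b (29) c
```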
In the second example the repl parameter is a function called replace_by_sum. This function loops through the characters of the match (in this case all digits) and adds them up. It then returns this sum inside parentheses. The first call just replaced each match with the string 'MATCH'.
To get started with regular expressions in Python you need to know the following things:
- You can create regular expression objects using re.compile('regular expression', FLAG | FLAG | ...)
- You can check if there is a match by calling match(string) or search(string) on the compiled regular expression object
- finditer() will return an iterator that yields a match object for each match
- Using sub() you can search and replace inside a string
If you have feedback or found an error please comment below or tweet at me. I am happy to update this blog post in the future. In the upcoming weeks I will release more blog posts about regular expressions in other programming languages.
Gist with examples
The following Gist contains the examples that were shown in this post.