Regular Expression (re)

Overview

The re module provides regular expression matching operations similar to those found in Perl. It lets you specify a pattern and then check other strings against it to find the parts that match.

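Both patterns and the strings being searched can be Unicode strings (str) or 8-bit strings (bytes). However, the two cannot be mixed: a str pattern cannot be used to search a bytes string or vice versa, and in a substitution the replacement must be of the same type as the pattern and the searched string.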
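A minimal sketch of the str/bytes rule, using made-up sample text: a bytes pattern (written rb'...') works on a bytes string, while a str pattern applied to bytes raises TypeError.

```python
import re

# bytes pattern searched in a bytes string: allowed
byte_matches = re.findall(rb'cat', b'cat and cats')
print(byte_matches)  # [b'cat', b'cat']

# str pattern searched in a bytes string: mixing the two types raises TypeError
mixed_error = None
try:
    re.search(r'cat', b'cat and cats')
except TypeError as err:
    mixed_error = err
    print('TypeError:', err)
```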

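Regular expressions use the backslash ('\') to indicate special forms (such as \w or \d) or to let special characters such as '.', '*' or '\' itself be matched literally. This collides with Python's own use of the backslash in string literals (e.g. '\n', '\t'), so in an ordinary string literal every regex backslash has to be escaped again. Developers can use raw Python strings (prefixed with 'r') to avoid this: backslashes in a raw string literal are not treated specially by Python and reach the regex engine unchanged.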
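As a small illustration (the Windows-style path is just sample data), matching one literal backslash takes the two-character regex \\, which needs four backslashes in a normal string literal but only two in a raw string:

```python
import re

text = r'C:\temp\new'  # raw string: the backslashes stay literal characters

# To match ONE literal backslash the regex needs '\\'. In a normal string
# literal each of those two backslashes must be escaped again for Python.
escaped_pattern = '\\\\'
raw_pattern = r'\\'  # the same two-character regex, written as a raw string

print(escaped_pattern == raw_pattern)       # True
print(len(re.findall(raw_pattern, text)))   # 2
```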

In [1]:
import re
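  • re.compile(pattern, flags=0) - Compiles pattern into a regular expression object (Pattern), whose methods such as search(), match() and findall() can then be used to match other strings. Compiling with this function is worthwhile when the same expression will be reused many times; the module-level functions below also cache recently compiled patterns, so programs that use only a few expressions need not worry much about it.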
In [2]:
compiled_regex = re.compile(r'\W+') ## Matches one or more consecutive non-word characters (\W is the opposite of \w).
type(compiled_regex)
Out[2]:
_sre.SRE_Pattern
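A short sketch of reusing a compiled pattern, with illustrative sample lines: the Pattern object exposes the same operations as the module-level functions, and flags such as re.IGNORECASE are fixed at compile time.

```python
import re

# Compile once, then reuse the Pattern object's methods in a loop.
word_re = re.compile(r'\w+')
lines = ['cat runs', 'dog goes behind cat']
words_per_line = [word_re.findall(line) for line in lines]
print(words_per_line)  # [['cat', 'runs'], ['dog', 'goes', 'behind', 'cat']]

# Flags are given at compile time; re.IGNORECASE makes matching case-insensitive.
cat_re = re.compile(r'cat', re.IGNORECASE)
print(cat_re.findall('Cat CAT cat'))  # ['Cat', 'CAT', 'cat']
```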
  • re.search(pattern, string, flags=0) - Scans through string looking for the first location where pattern matches and returns the corresponding match object, or None if no position in the string matches.
In [3]:
## Looks for the first occurrence of 'cat' in the supplied string and returns a match object holding the match location and other details.
match_obj = re.search(r'cat', 'dog catches cat, cat runs behind rat and it goes like cats & dogs')
print(type(match_obj))
match_obj.start(), match_obj.end()
<class '_sre.SRE_Match'>
Out[3]:
(4, 7)
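To show what else a match object carries, here is a sketch on a shortened version of the sample sentence: group(0) is the whole match, group(1) the first capturing group, and span() the (start, end) positions. Since search() returns None on failure, the result is checked before use.

```python
import re

text = 'dog catches cat, cat runs behind rat'
match_obj = re.search(r'(\w+) runs', text)
if match_obj:                    # check for None before using the result
    print(match_obj.group(0))    # cat runs  (the whole match)
    print(match_obj.group(1))    # cat       (first capturing group)
    print(match_obj.span())      # (17, 25)

print(re.search(r'kangaroo', text))  # None
```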
  • re.match(pattern, string, flags=0) - Tries to match pattern only at the beginning of string and returns a match object if it matches there, else None.
In [4]:
## The first call returns None because 'cat' is not at the beginning of the string; the second returns a match object because the pattern occurs at the beginning.
match_obj1 = re.match(r'cat', 'dog catches cat, cat runs behind rat and it goes like cats & dogs')
match_obj2 = re.match(r'cat', 'cat runs behind rat, dog goes behind cat and it goes like cats & dogs')
match_obj1, match_obj2
Out[4]:
(None, <_sre.SRE_Match object; span=(0, 3), match='cat'>)
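The difference between match() and search() can be sketched side by side. The last two lines illustrate a detail worth knowing: even with re.MULTILINE, match() only tries the start of the whole string, whereas '^' in search() can then match at the start of each line.

```python
import re

text = 'dog catches cat'
print(re.match(r'cat', text))           # None: 'cat' is not at position 0
print(re.search(r'cat', text).span())   # (4, 7): search() scans the whole string
print(re.search(r'^cat', text))         # None: '^' anchors search() to the start

lines = 'dog\ncat'
print(re.match(r'cat', lines, re.MULTILINE))           # None
print(re.search(r'^cat', lines, re.MULTILINE).span())  # (4, 7)
```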
  • re.split(pattern, string, maxsplit=0, flags=0) - Splits string at every occurrence of pattern and returns the pieces as a list. If maxsplit is nonzero, at most maxsplit splits are made and the remainder of the string is returned unsplit as the last element of the list.
In [5]:
## Note the last call: with maxsplit=2 only two splits are made, so the list has 3 elements, the last of which is the unsplit remainder of the string.
print(re.split(r'cat', 'dog catches cat, cat runs behind rat and it goes like cats & dogs'))
print(re.split(r'cat', 'cat runs behind rat, dog goes behind cat and it goes like cats & dogs'))
print(re.split(r'cat', 'cat runs behind rat, dog goes behind cat and it goes like cats & dogs', maxsplit=2))
['dog ', 'ches ', ', ', ' runs behind rat and it goes like ', 's & dogs']
['', ' runs behind rat, dog goes behind ', ' and it goes like ', 's & dogs']
['', ' runs behind rat, dog goes behind ', ' and it goes like cats & dogs']
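Two further split() patterns, sketched on made-up strings: a character class splits on several different separators at once, and a capturing group keeps the separators in the result list.

```python
import re

# A character class splits on commas, semicolons and whitespace runs alike.
parts = re.split(r'[,;\s]+', 'cat, dog;rat  bat')
print(parts)  # ['cat', 'dog', 'rat', 'bat']

# With a capturing group, the separators themselves are kept in the result.
parts_with_seps = re.split(r'([,;])', 'a,b;c')
print(parts_with_seps)  # ['a', ',', 'b', ';', 'c']
```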
  • re.fullmatch(pattern, string, flags=0) - Returns a match object if the whole string matches pattern, else None.
In [6]:
print(re.fullmatch(r'cat', 'cat runs behind rat, dog goes behind cat and it goes like cats & dogs')) ## Only a prefix matches, so None
print(re.fullmatch(r'\w+', 'cat')) ## The entire string consists of word characters, so it matches
print(re.fullmatch(r'\w+', '!all cat')) ## '!' and the space are not word characters, so None
None
<_sre.SRE_Match object; span=(0, 3), match='cat'>
None
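fullmatch() is convenient for validating an entire input. The sketch below uses a hypothetical format (three digits, a dash, four digits) chosen only for illustration, and compares it with match(), which accepts any string that merely starts with a valid value.

```python
import re

# Hypothetical format used only for illustration: 3 digits, a dash, 4 digits.
PHONE_PATTERN = r'\d{3}-\d{4}'

results = {}
for candidate in ['555-1234', '555-1234x', 'call 555-1234']:
    results[candidate] = (bool(re.fullmatch(PHONE_PATTERN, candidate)),
                          bool(re.match(PHONE_PATTERN, candidate)))
    print(candidate, results[candidate])
# 555-1234 (True, True)
# 555-1234x (False, True)   <- match() accepts a valid prefix, fullmatch() does not
# call 555-1234 (False, False)
```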
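  • re.findall(pattern, string, flags=0) - Returns all non-overlapping matches of pattern in string as a list of strings. If the pattern contains capturing groups, the groups are returned instead (a list of tuples when there is more than one group).
  • re.finditer(pattern, string, flags=0) - Same as above, but returns an iterator yielding a match object for each non-overlapping match.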
In [7]:
print(re.findall(r'cat', 'cat runs behind rat, dog goes behind cat and it goes like cats & dogs'))
print(re.findall(r'\w+', 'cat runs behind rat, dog goes behind cat and it goes like cats & dogs'))
['cat', 'cat', 'cat']
['cat', 'runs', 'behind', 'rat', 'dog', 'goes', 'behind', 'cat', 'and', 'it', 'goes', 'like', 'cats', 'dogs']
In [8]:
## finditer() returns an iterator of match objects, one for each non-overlapping match of the pattern in the string.
result = re.finditer(r'\w+', 'cat runs behind rat, dog goes behind cat and it goes like cats & dogs')
for obj in result:
    print(obj)
<_sre.SRE_Match object; span=(0, 3), match='cat'>
<_sre.SRE_Match object; span=(4, 8), match='runs'>
<_sre.SRE_Match object; span=(9, 15), match='behind'>
<_sre.SRE_Match object; span=(16, 19), match='rat'>
<_sre.SRE_Match object; span=(21, 24), match='dog'>
<_sre.SRE_Match object; span=(25, 29), match='goes'>
<_sre.SRE_Match object; span=(30, 36), match='behind'>
<_sre.SRE_Match object; span=(37, 40), match='cat'>
<_sre.SRE_Match object; span=(41, 44), match='and'>
<_sre.SRE_Match object; span=(45, 47), match='it'>
<_sre.SRE_Match object; span=(48, 52), match='goes'>
<_sre.SRE_Match object; span=(53, 57), match='like'>
<_sre.SRE_Match object; span=(58, 62), match='cats'>
<_sre.SRE_Match object; span=(65, 69), match='dogs'>
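How capturing groups change what findall() returns, and how finditer() can feed richer processing, can be sketched on a small made-up "name=count" string:

```python
import re

text = 'cat=3, dog=5, rat=12'

# Two groups: findall() returns a list of tuples, one per match.
pairs = re.findall(r'(\w+)=(\d+)', text)
print(pairs)    # [('cat', '3'), ('dog', '5'), ('rat', '12')]

# One group: findall() returns just that group's text.
numbers = re.findall(r'\w+=(\d+)', text)
print(numbers)  # ['3', '5', '12']

# finditer() yields full match objects, handy when you need groups or positions.
counts = {m.group(1): int(m.group(2)) for m in re.finditer(r'(\w+)=(\d+)', text)}
print(counts)   # {'cat': 3, 'dog': 5, 'rat': 12}
```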
  • re.sub(pattern, repl, string, count=0, flags=0) - Returns a new string in which the non-overlapping occurrences of pattern in string are replaced by repl, which can be a string or a function. count limits the number of replacements (0, the default, replaces all). If pattern is not found, string is returned unchanged.
  • re.subn(pattern, repl, string, count=0, flags=0) - Same as above, but returns a tuple whose first element is the new string and whose second element is the number of replacements made.
In [9]:
print(re.sub(r'cat', 'kangaroo','cat runs behind rat, dog goes behind cat and it goes like cats & dogs'))
print(re.subn(r'cat', 'kangaroo','cat runs behind rat, dog goes behind cat and it goes like cats & dogs'))
kangaroo runs behind rat, dog goes behind kangaroo and it goes like kangaroos & dogs
('kangaroo runs behind rat, dog goes behind kangaroo and it goes like kangaroos & dogs', 3)
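A sketch of three common sub() variations on a shortened sample sentence: limiting replacements with count, referring back to a group with \1, and passing a function as repl (here a lambda that upper-cases the first letter of each word).

```python
import re

text = 'cat runs behind rat, dog goes behind cat'

# count=1 limits the substitution to the first occurrence.
first_only = re.sub(r'cat', 'kangaroo', text, count=1)
print(first_only)  # kangaroo runs behind rat, dog goes behind cat

# \1 in the replacement refers back to the first capturing group.
swapped = re.sub(r'(\w+) goes', r'\1 runs', text)
print(swapped)     # cat runs behind rat, dog runs behind cat

# repl may be a function that receives each match object.
title = re.sub(r'\b\w', lambda m: m.group().upper(), 'cat runs behind rat')
print(title)       # Cat Runs Behind Rat
```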
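  • re.escape(pattern) - Escapes special characters in pattern so that it is matched literally, which is useful when a regular expression is built from arbitrary text and then passed to the functions above.
  • re.purge() - Clears the regular expression cache.
  • re.error(msg, pattern=None, pos=None) - Exception raised when a string passed to one of the functions above is not a valid regular expression (for example, it contains unmatched parentheses), or when some other error occurs during compilation or matching. It is never raised merely because a string contains no match. Its msg, pattern and pos attributes describe the problem, and you can also create an instance with your own message and raise it.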
In [10]:
re.escape('\n Lets check \t') ## Escapes characters that can have a special meaning in a regex, such as space, \t and \n, so the text can be used as a literal pattern.
Out[10]:
'\\\n\\ Lets\\ check\\ \\\t'
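A typical reason to call re.escape() is searching for literal text that happens to contain metacharacters. In this sketch the price string is made-up sample data:

```python
import re

literal = '$4.99 (sale)'
text = 'Today only: $4.99 (sale) on cat food'

# Used directly, '$', '.', '(' and ')' act as regex metacharacters,
# so the pattern does not find the literal text.
print(re.search(literal, text))   # None

# re.escape() turns the text into a pattern that matches it literally.
found = re.search(re.escape(literal), text)
print(found.group())              # $4.99 (sale)
```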
In [11]:
try:
    re.match('[+*','template matching')
except re.error as err:
    print('Invalid Regular Expression : '+err.msg+'. Match operation failed.')
Invalid Regular Expression : unterminated character set. Match operation failed.
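Beyond msg, a caught re.error also records the offending pattern and the position of the problem, which can be used to point at it. A minimal sketch with a deliberately unclosed group:

```python
import re

bad_pattern = '(cat|dog'   # opening parenthesis is never closed
caught = None
try:
    re.compile(bad_pattern)
except re.error as err:
    caught = err
    print(err.msg)              # missing ), unterminated subpattern
    print(err.pattern)          # (cat|dog
    print(' ' * err.pos + '^')  # caret under the offending position
```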
In [12]:
custom_err = re.error('Custom user-defined error message')
try:
    try:
        re.compile('*+')
    except re.error as err:
        raise custom_err
except re.error as err:
    print(err.msg)
Custom user-defined error message


Sunny Solanki