Updated On : Nov-22,2019 Time Investment : ~15 mins

Regular Expression in Python using "re" Module

The re module provides regular expression support very similar to that of Perl. It lets you specify a pattern and then check whether other strings match that pattern.

Python's re module supports both Unicode strings (str) and 8-bit strings (bytes). However, the two cannot be mixed: a str pattern cannot be used on bytes input (or vice versa) in operations like search, replacement, etc.
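As a small illustration of the point above, the sketch below runs the same search once with str and once with bytes, and then shows what happens when the two are mixed. The variable names are only for illustration.

```python
import re

# A str pattern with str input, and a bytes pattern with bytes input, both work.
str_span = re.search(r'cat', 'dog catches cat').span()
bytes_span = re.search(rb'cat', b'dog catches cat').span()
print(str_span, bytes_span)   # (4, 7) (4, 7)

# Mixing a str pattern with bytes input raises TypeError.
try:
    re.search(r'cat', b'dog catches cat')
    mixing_failed = False
except TypeError as err:
    mixing_failed = True
    print('TypeError:', err)
```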

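Regular expressions use the backslash ('\') both to introduce special sequences such as '\w' and '\d' and to escape characters that would otherwise have a special meaning. Python string literals also treat the backslash as an escape character (for example '\n' and '\t'), so a pattern written as a normal string often needs doubled backslashes. Developers can use raw Python strings (prefix 'r') so that backslashes reach the re module without being interpreted by Python first.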
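A minimal sketch of why raw strings matter, using the word-boundary sequence \b as an example. The sample text and variable names are made up for illustration.

```python
import re

# In a normal string literal '\b' is a single backspace character.
# In a raw string r'\b' is a backslash followed by 'b' (two characters),
# which the regex engine reads as a word boundary.
print(len('\b'), len(r'\b'))              # 1 2

text = 'cat concat cats'
plain_hits = re.findall('\bcat\b', text)  # looks for backspace + cat + backspace
raw_hits = re.findall(r'\bcat\b', text)   # looks for the whole word cat
print(plain_hits)                         # []
print(raw_hits)                           # ['cat']

# Matching one literal backslash: r'\\' is the same pattern as '\\\\'.
same_pattern = (r'\\' == '\\\\')
print(same_pattern)                       # True
```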

import re
  • re.compile(pattern, flags=0) - Compiles pattern into a regular expression object (Pattern), which can then be used for matching, searching other strings, etc. Compiling is worthwhile if you are going to reuse the same regular expression many times.
compiled_regex = re.compile(r'\W+') ## Matches one or more occurrences of any non-word character.
type(compiled_regex)
_sre.SRE_Pattern
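The reuse mentioned above could look roughly like the sketch below: the pattern is compiled once and the resulting object's methods, which mirror the module-level functions, are called repeatedly. The sample lines are invented for illustration.

```python
import re

word_regex = re.compile(r'\w+')   # compile once

lines = ['dog catches cat', 'cat runs behind rat']

# Reuse the compiled object for every line instead of passing the pattern again.
word_counts = [len(word_regex.findall(line)) for line in lines]
print(word_counts)                            # [3, 4]

# The original pattern text stays available on the compiled object.
print(word_regex.pattern)                     # \w+

first_word_span = word_regex.search('  hello').span()
print(first_word_span)                        # (2, 7)
```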
  • re.search(pattern,string,flags=0) - Scans through string looking for pattern and returns a match object for the first match, or None if no match is found.
## The call below looks for the first occurrence of cat in the supplied string and returns a match object, which holds the match location & other details
match_obj = re.search(r'cat', 'dog catches cat, cat runs behind rat and it goes like cats & dogs')
print(type(match_obj))
match_obj.start(), match_obj.end()
<class '_sre.SRE_Match'>
(4, 7)
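Besides start() and end(), a match object also exposes the matched text and any captured groups. The following is a small sketch with a made-up pattern; the group name animal is only an illustration.

```python
import re

match_obj = re.search(r'(?P<animal>cat|dog)\s+(\w+)', 'the dog catches cat')

if match_obj is not None:                 # search() returns None when nothing matches
    whole = match_obj.group()             # entire matched text
    animal = match_obj.group('animal')    # named group
    action = match_obj.group(2)           # second group, by number
    print(whole)                          # dog catches
    print(animal, action)                 # dog catches
    print(match_obj.groups())             # ('dog', 'catches')
    print(match_obj.span())               # (4, 15)
```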
  • re.match(pattern,string,flags=0) - Matches pattern at the beginning of string and returns a match object if a match is found, else returns None.
## The first call returns None because cat is not at the beginning; the second returns a match object because the pattern occurs at the beginning.
match_obj1 = re.match(r'cat', 'dog catches cat, cat runs behind rat and it goes like cats & dogs')
match_obj2 = re.match(r'cat', 'cat runs behind rat, dog goes behind cat and it goes like cats & dogs')
match_obj1, match_obj2
(None, <_sre.SRE_Match object; span=(0, 3), match='cat'>)
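To make the difference from re.search concrete, here is a short sketch comparing the two on the same text, including the multi-line case. The sample strings are assumptions for illustration.

```python
import re

text = 'dog catches cat'
print(re.match(r'cat', text))                 # None, cat is not at the start
print(re.search(r'cat', text).span())         # (4, 7)
print(re.search(r'^cat', text))               # None, '^' anchors search at the start

# match() only needs a prefix of the string to match, not the whole string.
prefix = re.match(r'dog', text).group()
print(prefix)                                 # dog

# With re.MULTILINE, '^' in search() also matches after a newline,
# but match() still only tries the very beginning of the string.
two_lines = 'dog\ncat'
multiline_search = re.search(r'^cat', two_lines, re.MULTILINE)
multiline_match = re.match(r'cat', two_lines, re.MULTILINE)
print(multiline_search.span(), multiline_match)   # (4, 7) None
```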
  • re.split(pattern,string,maxsplit=0,flags=0) - Splits string by occurrences of pattern and returns a list. If maxsplit is nonzero, at most maxsplit splits are made and the remainder of the string is returned as the last element of the list.
## Make a note of the last list. It has 3 elements (2 from the maxsplit=2 splits and 1 more for the remaining string).
print(re.split(r'cat', 'dog catches cat, cat runs behind rat and it goes like cats & dogs'))
print(re.split(r'cat', 'cat runs behind rat, dog goes behind cat and it goes like cats & dogs'))
print(re.split(r'cat', 'cat runs behind rat, dog goes behind cat and it goes like cats & dogs', maxsplit=2))
['dog ', 'ches ', ', ', ' runs behind rat and it goes like ', 's & dogs']
['', ' runs behind rat, dog goes behind ', ' and it goes like ', 's & dogs']
['', ' runs behind rat, dog goes behind ', ' and it goes like cats & dogs']
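A pattern passed to re.split does not have to be a fixed word. The sketch below splits on a small character class and also shows that a capturing group keeps the separators in the result. The sample text is made up.

```python
import re

text = 'cat runs, dog goes; rat hides'

# Split on a comma or semicolon followed by optional whitespace.
parts = re.split(r'[,;]\s*', text)
print(parts)            # ['cat runs', 'dog goes', 'rat hides']

# maxsplit=1 makes one split and leaves the rest of the string intact.
first_split = re.split(r'[,;]\s*', text, maxsplit=1)
print(first_split)      # ['cat runs', 'dog goes; rat hides']

# With a capturing group, the captured separators are included in the list.
with_separators = re.split(r'([,;])\s*', text)
print(with_separators)  # ['cat runs', ',', 'dog goes', ';', 'rat hides']
```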
  • re.fullmatch(pattern,string,flags=0) - Returns a match object if the whole string matches pattern, else returns None.
print(re.fullmatch(r'cat', 'cat runs behind rat, dog goes behind cat and it goes like cats & dogs'))
print(re.fullmatch(r'\w+', 'cat')) ## \w+ matches one or more word characters, and here the whole string consists of them
print(re.fullmatch(r'\w+', '!all cat'))
None
<_sre.SRE_Match object; span=(0, 3), match='cat'>
None
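A common use of re.fullmatch is input validation. The sketch below wraps it in a hypothetical helper, is_single_word, which is not part of the re module, and contrasts it with re.match.

```python
import re

def is_single_word(value):
    """Hypothetical helper: True if value consists only of word characters."""
    return re.fullmatch(r'\w+', value) is not None

print(is_single_word('cat'))        # True
print(is_single_word('cat runs'))   # False, the space is not a word character

# match() would accept the second string, because only a prefix has to match.
prefix_only = re.match(r'\w+', 'cat runs').group()
print(prefix_only)                  # cat
```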
  • re.findall(pattern,string,flags=0) - Returns a list of all non-overlapping matches of pattern in string.
  • re.finditer(pattern,string,flags=0) - Same as the above method but returns an iterator of match objects.
print(re.findall(r'cat', 'cat runs behind rat, dog goes behind cat and it goes like cats & dogs'))
print(re.findall(r'\w+', 'cat runs behind rat, dog goes behind cat and it goes like cats & dogs'))
['cat', 'cat', 'cat']
['cat', 'runs', 'behind', 'rat', 'dog', 'goes', 'behind', 'cat', 'and', 'it', 'goes', 'like', 'cats', 'dogs']
## Returns an iterator of match objects, one for each match of the pattern in the string passed.
result = re.finditer(r'\w+', 'cat runs behind rat, dog goes behind cat and it goes like cats & dogs')
for obj in result:
    print(obj)
<_sre.SRE_Match object; span=(0, 3), match='cat'>
<_sre.SRE_Match object; span=(4, 8), match='runs'>
<_sre.SRE_Match object; span=(9, 15), match='behind'>
<_sre.SRE_Match object; span=(16, 19), match='rat'>
<_sre.SRE_Match object; span=(21, 24), match='dog'>
<_sre.SRE_Match object; span=(25, 29), match='goes'>
<_sre.SRE_Match object; span=(30, 36), match='behind'>
<_sre.SRE_Match object; span=(37, 40), match='cat'>
<_sre.SRE_Match object; span=(41, 44), match='and'>
<_sre.SRE_Match object; span=(45, 47), match='it'>
<_sre.SRE_Match object; span=(48, 52), match='goes'>
<_sre.SRE_Match object; span=(53, 57), match='like'>
<_sre.SRE_Match object; span=(58, 62), match='cats'>
<_sre.SRE_Match object; span=(65, 69), match='dogs'>
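When the pattern contains capturing groups, re.findall returns the groups rather than the whole match, while re.finditer still gives full match objects. A short sketch with made-up key=value text:

```python
import re

text = 'cat=3, dog=12, rat=7'

# No groups: each list element is the whole match.
whole_matches = re.findall(r'\w+=\d+', text)
print(whole_matches)    # ['cat=3', 'dog=12', 'rat=7']

# Two groups: each list element is a tuple of the groups.
pairs = re.findall(r'(\w+)=(\d+)', text)
print(pairs)            # [('cat', '3'), ('dog', '12'), ('rat', '7')]

# One group: each list element is just that group.
numbers = re.findall(r'\w+=(\d+)', text)
print(numbers)          # ['3', '12', '7']

# finditer gives match objects, so positions are available as well.
positions = [(m.group(1), m.start()) for m in re.finditer(r'(\w+)=\d+', text)]
print(positions)        # [('cat', 0), ('dog', 7), ('rat', 15)]
```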
  • re.sub(pattern,repl,string,count=0, flags=0) - Returns a new string obtained by replacing the matches of pattern in string with repl. If count is nonzero, at most count replacements are made. If there is no match, string is returned unchanged.
  • re.subn(pattern,repl,string,count=0, flags=0) - Same as the above method but returns a tuple whose first element is the modified string and whose second element is the number of replacements made.
print(re.sub(r'cat', 'kangaroo','cat runs behind rat, dog goes behind cat and it goes like cats & dogs'))
print(re.subn(r'cat', 'kangaroo','cat runs behind rat, dog goes behind cat and it goes like cats & dogs'))
kangaroo runs behind rat, dog goes behind kangaroo and it goes like kangaroos & dogs
('kangaroo runs behind rat, dog goes behind kangaroo and it goes like kangaroos & dogs', 3)
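The count argument and the repl argument allow a few more variations than the calls above. The sketch below shows a limited replacement, a backreference in repl, and a function used as repl. The sample strings are made up.

```python
import re

text = 'cat runs behind rat, dog goes behind cat'

# count=1 replaces only the first match.
first_only = re.sub(r'cat', 'kangaroo', text, count=1)
print(first_only)   # kangaroo runs behind rat, dog goes behind cat

# \1 and \2 in repl refer to the captured groups.
swapped = re.sub(r'(\w+) runs behind (\w+)', r'\2 runs behind \1', 'cat runs behind rat')
print(swapped)      # rat runs behind cat

# repl can also be a function that receives each match object.
capitalised = re.sub(r'\b\w', lambda m: m.group().upper(), 'cat runs')
print(capitalised)  # Cat Runs

# subn reports zero replacements when nothing matches.
no_change = re.subn(r'fox', 'wolf', 'cat runs')
print(no_change)    # ('cat runs', 0)
```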
  • re.escape(pattern) - Escapes special characters in pattern so that it can be used as a literal string inside the other functions mentioned above.
  • re.purge() - Clears the regular expression cache.
  • re.error(msg,pattern=None,pos=None) - Exception raised when a string passed to one of the functions above is not a valid regular expression, or when some other error occurs during compilation or matching. You can also create an instance with a customised error message and raise it yourself.
re.escape('\n Lets check \t') ## It escapes special characters like space, \t, \n so that the user does not have to escape them by hand.
'\\\n\\ Lets\\ check\\ \\\t'
try:
    re.match('[+*','template matching')
except re.error as err:
    print('Invalid Regular Expression : '+err.msg+'. Match operation failed.')
Invalid Regular Expression : unterminated character set. Match operation failed.
custom_err = re.error('Custom user-defined error message')
try:
    try:
        re.compile('*+')
    except re.error as err:
        raise custom_err
except re.error as err:
    print(err.msg)
Custom user-defined error message
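As a rough sketch of where re.escape helps in practice, the example below searches for a piece of text that happens to contain a regex metacharacter. It also reads the pattern attribute of the raised re.error. The sample strings are assumptions for illustration.

```python
import re

literal_text = 'a.c'        # hypothetical text to be found literally
text = 'abc a.c'

# Unescaped, '.' matches any character, so 'abc' is found as well.
unescaped_hits = re.findall(literal_text, text)
print(unescaped_hits)       # ['abc', 'a.c']

# Escaped, '.' only matches a literal dot.
escaped_hits = re.findall(re.escape(literal_text), text)
print(escaped_hits)         # ['a.c']

# A re.error instance also records the pattern that failed to compile.
try:
    re.compile('[+*')
    failed_pattern = None
except re.error as err:
    failed_pattern = err.pattern
    print('Invalid pattern:', failed_pattern)
```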
Sunny Solanki
