Regular Expression in Python with Examples | Set 1

Last Updated: 19-10-2020

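A Regular Expression (RE) is a pattern that specifies the set of strings that match it. Python supports regular expressions through the module re.
To read and write patterns, you need to know the metacharacters: characters that carry a special meaning inside a pattern and that are used throughout the functions of module re.
There are 14 metacharacters in total (counting each bracket separately); they are discussed as they come up in the functions below: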

\   Used to drop the special meaning of character
    following it (discussed below)
[]  Represent a character class
^   Matches the beginning
$   Matches the end
.   Matches any character except newline
?   Matches zero or one occurrence.
|   Means OR (matches any one of the alternatives
    separated by it)
*   Any number of occurrences (including 0 occurrences)
+   One or more occurrences
{}  Indicate number of occurrences of a preceding RE 
    to match.
()  Enclose a group of REs
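
To make the table concrete, here is a small sketch that tries a few of the metacharacters with re.findall(). The sample string and the variable names are made up for illustration and are not part of the original article.

```python
import re

# Hypothetical sample text; the newline shows that "." stops at line breaks
text = "cat bat rat\ncart 3.14"

# ^ anchors the match at the beginning of the string
starts = re.findall(r"^cat", text)       # ['cat']
# . matches any single character except newline
any_at = re.findall(r".at", text)        # ['cat', 'bat', 'rat']
# ? makes the preceding 'r' optional (zero or one occurrence)
optional = re.findall(r"car?t", text)    # ['cat', 'cart']
# | means OR between the two alternatives
either = re.findall(r"bat|rat", text)    # ['bat', 'rat']
# {} gives an explicit number of occurrences of the preceding class
two_digits = re.findall(r"[0-9]{2}", text)  # ['14']
# \ drops the special meaning of the "." that follows it
literal_dot = re.findall(r"3\.14", text)    # ['3.14']

print(starts, any_at, optional, either, two_digits, literal_dot)
```

The patterns are written as raw strings (r"...") so that Python itself does not interpret the backslash before the regular expression engine sees it.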

  • Function compile() 
    Regular expressions are compiled into pattern objects, which have methods for various operations such as searching for pattern matches or performing string substitutions. 
# Import the regular expression module.
import re
# compile() builds a pattern object for the character class [a-e],
# which is equivalent to [abcde].
# The class [abcde] matches any single character 'a', 'b', 'c', 'd' or 'e'.
p = re.compile('[a-e]')
# findall() searches the string and returns a list of all matches found.
print(p.findall("Aye, said Mr. Gibenson Stark"))


['e', 'a', 'd', 'b', 'e', 'a']

Understanding the Output: 
The first match is ‘e’ in “Aye” and not ‘A’, because matching is case-sensitive. 
The next matches are ‘a’ and then ‘d’ in “said”, followed by ‘b’ and ‘e’ in “Gibenson”; the last ‘a’ comes from “Stark”.
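
If the ‘A’ in “Aye” should match as well, one option is the re.IGNORECASE flag. The sketch below reuses the same pattern and string; the flag itself is an addition for illustration and is not used in the example above.

```python
import re

# Same character class, compiled with the case-insensitive flag
p = re.compile('[a-e]', re.IGNORECASE)
matches = p.findall("Aye, said Mr. Gibenson Stark")
print(matches)  # ['A', 'e', 'a', 'd', 'b', 'e', 'a']
```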
The metacharacter backslash ‘\’ plays a very important role because it introduces various special sequences. To match a literal backslash, without its special meaning as a metacharacter, use ‘\\’.

\d   Matches any decimal digit; for ASCII text this is
     equivalent to the class [0-9].
\D   Matches any non-digit character.
\s   Matches any whitespace character.
\S   Matches any non-whitespace character.
\w   Matches any alphanumeric character or underscore; for
     ASCII text this is equivalent to the class [a-zA-Z0-9_].
\W   Matches any character that \w does not match.
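
The special sequences can be tried out in the same way as the character class earlier. The following is a minimal sketch; the sample strings and variable names are hypothetical and chosen only to show each sequence once.

```python
import re

# Hypothetical sample text for illustration
text = "Order 66 shipped on 2020-10-19 to user_42!"

# \d+ matches runs of decimal digits
digits = re.findall(r"\d+", text)
print(digits)    # ['66', '2020', '10', '19', '42']

# \w+ matches runs of letters, digits and underscore
words = re.findall(r"\w+", text)
print(words)     # ['Order', '66', 'shipped', 'on', '2020', '10', '19', 'to', 'user_42']

# \s matches each whitespace character
spaces = re.findall(r"\s", text)
print(len(spaces))   # 6

# \\ in the pattern matches one literal backslash
path = "C:\\temp\\file"          # the string C:\temp\file
backslashes = re.findall(r"\\", path)
print(len(backslashes))  # 2
```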
