
Python Regular expressions

Regular expressions are a standard way of searching for, removing, or modifying text that matches a complex pattern. They make it easy to format (standardize) text accurately.
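A minimal sketch (Python 3) of those three operations on a made-up sample string, which is only an assumption for illustration. It uses re.search() and re.sub() from the standard re module. The + after [0-9] means "one or more of the preceding item", so the search returns the whole first run of digits instead of a single digit.

```python
import re

text = "Order 66 shipped on 2019-05-04"  # hypothetical sample text

# Searching: re.search() returns a match object for the first match, or None
found = re.search(r"[0-9]+", text)
print(found.group())  # 66

# Removing: replace every digit with the empty string
removed = re.sub(r"[0-9]", "", text)
print(removed)  # Order  shipped on --

# Modifying: replace every digit with '#'
masked = re.sub(r"[0-9]", "#", text)
print(masked)  # Order ## shipped on ####-##-##
```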

Suppose we are given a text and our goal is to remove all the numbers in it. From a programmer's point of view this is a simple problem that we can solve with plain logic. Let us look at a solution.

strings="Th2019is 10  0007 i782s p23yt4h5o22n S45tr66ing"
newText=""
for text in strings:
	#ord() converts a character to its integer (ASCII) value
	#check whether the character is not a digit
	if(not(ord(text)>=ord('0') and ord(text)<=ord('9'))):
		#add the character to the result
		newText+=text

print("Before : "+strings)
print("After  : "+newText)
Before : Th2019is 10  0007 i782s p23yt4h5o22n S45tr66ing
After  : This    is python String

Note that this problem can also be solved with other logic. This program converts each character to its integer value (ASCII code) with ord(). If the character is not a digit, it is appended to the newText variable.
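One of those other methods, as a minimal sketch (Python 3): the same digit filter written with a generator expression and str.join(), plus a variant using the built-in str.isdigit(). The name altText is illustrative only. Python compares single characters by their code points, so the range check works without ord().

```python
strings = "Th2019is 10  0007 i782s p23yt4h5o22n S45tr66ing"

# Same rule as the ord() check: keep a character unless it lies between '0' and '9'
newText = "".join(ch for ch in strings if not ("0" <= ch <= "9"))

# Alternative: str.isdigit(); note it also accepts non-ASCII digits such as '²'
altText = "".join(ch for ch in strings if not ch.isdigit())

print("Before : " + strings)
print("After  : " + newText)
```

For plain ASCII input like this one, both versions give the same result. They only differ when the text contains other Unicode digit characters.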

Solving the same problem with a regular expression (RE) is an interesting alternative for programmers and non-programmers alike. Let us try it.

#import the regular expression module
import re

#Assign a string value
strings="Th2019is 10  0007 i782s p23yt4h5o22n S45tr66ing"

#Case 1: single line statement
print(re.compile(r"[0-9]").sub("",strings)) #display result

#Case 2: multiline statement
pattern = re.compile(r"[0-9]")
print(pattern.sub("",strings)) #display result
This    is python String
This    is python String

To use regular expressions, we first need to import the module with "import re". re is the regular expression module, and it contains several functions. Let us list all of them.

from re import *
#display every name now in scope,
#including all functions and flags imported from re
print(dir())
#output below is from Python 2; the list differs between Python versions
['DOTALL', 'I', 'IGNORECASE', 'L', 'LOCALE', 'M', 'MULTILINE', 'S', 'U', 'UNICODE', 'VERBOSE', 'X', '__builtins__', '__doc__', '__file__', '__name__', '__package__', 'compile', 'error', 'escape', 'findall', 'finditer', 'match', 'purge', 'search', 'split', 'sub', 'subn', 'template']

As you can see, the module provides many functions. The earlier regular expression program uses two of them: compile() and sub().
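Besides compile() and sub(), the listing includes findall(), split(), subn(), match() and search(). A short sketch (Python 3) of these on the same sample string; the variable names are illustrative only.

```python
import re

strings = "Th2019is 10  0007 i782s p23yt4h5o22n S45tr66ing"

# findall(): every non-overlapping match, as a list of strings
numbers = re.findall(r"[0-9]+", strings)
print(numbers)  # ['2019', '10', '0007', '782', '23', '4', '5', '22', '45', '66']

# subn(): like sub(), but also returns how many replacements were made
cleaned, count = re.subn(r"[0-9]", "", strings)
print(count)  # 23

# split(): break the string wherever one or more spaces occur
words = re.split(r" +", cleaned)
print(words)  # ['This', 'is', 'python', 'String']

# match() only tries at the start of the string; search() scans the whole string
print(re.match(r"[0-9]", strings))  # None, because the string starts with 'T'
print(re.search(r"[0-9]", strings).group())  # 2
```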

#syntax  : re.compile(r"pattern")
#r       : raw string notation
#pattern : combination of characters describing the text to match

Here re is the module and compile() is a function that, in this case, accepts one parameter: the pattern. The r prefix marks the pattern as a raw string, so backslashes in it are not treated as escape characters. compile() returns a pattern object, and we call that object's sub() method, which accepts two parameters: the first is the replacement text, and the second is the actual string. sub() returns a new string in which every match of the pattern is replaced with the replacement text.
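A minimal sketch (Python 3) that checks these points: a compiled pattern's sub() produces the same result as the module-level re.sub() function, sub() also accepts an optional count argument that limits how many matches are replaced, and the r prefix changes how backslashes are read. The variable names are illustrative only.

```python
import re

strings = "Th2019is 10  0007 i782s p23yt4h5o22n S45tr66ing"

# Compile the pattern once; the pattern object can be reused many times
pattern = re.compile(r"[0-9]")
compiled_result = pattern.sub("", strings)  # sub(replacement, string)

# The module-level shortcut re.sub(pattern, replacement, string) gives the same text
direct_result = re.sub(r"[0-9]", "", strings)
print(compiled_result == direct_result)  # True

# count limits the number of replacements: only the first 2 digits are removed
first_two = pattern.sub("", strings, count=2)
print(first_two)  # Th19is 10  0007 i782s p23yt4h5o22n S45tr66ing

# Raw strings keep backslashes as typed: r"\n" is two characters, "\n" is one newline
print(len(r"\n"), len("\n"))  # 2 1
```

Compiling pays off mainly when the same pattern is applied to many strings. For a one-off replacement, re.sub() is shorter.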

The pattern is the most important part, because it specifies which text in the string the operation applies to. Each pattern below is a character class in square brackets, and it matches exactly one character.

pattern        Meaning
[0-9]          any digit from 0 to 9, same as [0123456789]
[a-z]          any lowercase letter (a to z)
[A-Z]          any uppercase letter (A to Z)
[a-zA-Z]       any lowercase or uppercase (capital) letter
[a-zA-Z0-9]    any lowercase letter, uppercase letter, or digit

Let us look at some examples of these patterns.

#import regular expression module re
import re

#Assign a string value
strings="PytHon iS Easy tO  REDS   123"
print("Actual string : " + strings)

#Case 1: remove lowercase characters
print(re.compile(r"[a-z]").sub("",strings)) #display result

#Case 2: replace each lowercase character with a space
print(re.compile(r"[a-z]").sub(" ",strings)) #display result

#Case 3: remove all spaces from the string
#re.sub() is the module-level shortcut for compile() followed by sub()
print(re.sub(r"[ ]","", strings))
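The examples above only exercise [a-z]. A minimal sketch (Python 3) that runs every pattern from the table over the same sample string with re.findall(), another function from the re module, and shows why a range must be written A-Z rather than A-z. The names results, wide and narrow are illustrative only.

```python
import re

strings = "PytHon iS Easy tO  REDS   123"

# Each character class from the table matches exactly one character;
# re.findall() returns every match as a list
patterns = [r"[0-9]", r"[a-z]", r"[A-Z]", r"[a-zA-Z]", r"[a-zA-Z0-9]"]
results = {pat: re.findall(pat, strings) for pat in patterns}
for pat in patterns:
    print(pat, "".join(results[pat]))

# Pitfall: the range A-z also covers ASCII codes 91 to 96: [ \ ] ^ _ `
wide = re.findall(r"[A-z]", "a_b")       # ['a', '_', 'b']
narrow = re.findall(r"[A-Za-z]", "a_b")  # ['a', 'b']
print(wide, narrow)
```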

