Python Regular Expressions
Regular expressions are a standard way of searching, removing, and modifying text that matches a complex pattern. They make it easy to format text accurately and in a standardized way.
Suppose we are given a text and our goal is to remove all the numbers that appear in it. From a programmer's point of view this is a simple problem, and it can be solved with plain logic. Let us look at one such solution.
    strings="Th2019is 10 0007 i782s p23yt4h5o22n S45tr66ing"
    newText=''
    for text in strings:
        #ord() converts a character to its integer code
        #check whether the character is a digit
        if(not(ord(text)>=ord('0') and ord(text)<=ord('9'))):
            #not a digit, so keep the character
            newText=newText+text
    print("Before : "+strings)
    print("After : "+newText)
    Before : Th2019is 10 0007 i782s p23yt4h5o22n S45tr66ing
    After : This   is python String
Note that this problem can also be solved with other logic. This program converts each character to its integer (ASCII) value with ord() and checks whether that value lies between the values of '0' and '9'. If the character is not a digit, it is appended to the newText variable.
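As one example of such other logic, here is a minimal sketch that filters the characters against the set of ASCII digits instead of comparing ord() values. The variable name new_text is only illustrative; it is not part of the original program.

```python
import string

strings = "Th2019is 10 0007 i782s p23yt4h5o22n S45tr66ing"

# string.digits is the constant "0123456789";
# keep every character that is not one of those digits
new_text = "".join(ch for ch in strings if ch not in string.digits)

print("Before : " + strings)
print("After : " + new_text)
```

The spaces that surrounded the removed numbers are kept, so the result still contains a run of spaces where "10" and "0007" used to be.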
Solving the same problem with a regular expression (RE) is an interesting exercise for programmers and non-programmers alike. Let us try it.
    #import the regular expression module
    import re
    #Assign a string value
    strings="Th2019is 10 0007 i782s p23yt4h5o22n S45tr66ing"
    #Case 1: single-line statement
    print(re.compile(r"[0-9]").sub("",strings)) #display result
    #Case 2: multiline statement
    pattern = re.compile(r"[0-9]")
    strings=pattern.sub("",strings)
    print(strings) #display result
    This   is python String
    This   is python String
To use the regular expression functions, we first need to import the module with "import re". re is the regular expression module, and it contains several functions. Let us view all the names this module provides.
    from re import *
    #display all names available in this program
    print(dir())
['DOTALL', 'I', 'IGNORECASE', 'L', 'LOCALE', 'M', 'MULTILINE', 'S', 'U', 'UNICODE', 'VERBOSE', 'X', '__builtins__', '__doc__', '__file__', '__name__', '__package__', 'compile', 'error', 'escape', 'findall', 'finditer', 'match', 'purge', 'search', 'split', 'sub', 'subn', 'template']
As the list shows, many functions are available, and the exact list depends on the Python version. The program above uses two of them: compile() and sub().
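A few of the other names in that list can be tried in the same way. The following is only a brief sketch of findall(), split() and subn(); the sample strings and variable names are chosen here for illustration.

```python
import re

text = "Th2019is 10 0007 i782s"

# findall: list every non-overlapping match of the pattern
digit_runs = re.findall(r"[0-9]+", text)
print(digit_runs)   # ['2019', '10', '0007', '782']

# split: cut the string wherever the pattern matches
words = re.split(r" +", "This   is python String")
print(words)        # ['This', 'is', 'python', 'String']

# subn: like sub, but also returns the number of replacements made
cleaned, count = re.subn(r"[0-9]", "", text)
print(count)        # 13
```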
    #r : raw string notation
    #pattern : combination of characters
    re.compile(r"pattern")
Here re is the module and compile() is a function that, in this case, accepts one parameter. The r prefix marks the string as a raw string, so backslashes in it are not treated as escape characters. compile() returns a corresponding pattern object. With the help of this object we call the sub() method, which accepts two parameters: the first is the replacement text, and the second is the actual string to operate on.
The pattern is the most important part, because it determines which text in the string is operated on.
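The relationship between the compiled object and the module-level function can be sketched as follows. The names via_object and via_module are illustrative only.

```python
import re

strings = "Th2019is 10 0007 i782s p23yt4h5o22n S45tr66ing"

# compile() returns a pattern object; its sub() takes (replacement, string)
pattern = re.compile(r"[0-9]")
via_object = pattern.sub("", strings)

# the module-level re.sub() takes (pattern, replacement, string)
via_module = re.sub(r"[0-9]", "", strings)

print(via_object == via_module)   # True

# a raw string keeps the backslash as-is, so these two literals are equal
print(r"\d" == "\\d")             # True
```

Compiling first is mainly useful when the same pattern is applied many times; for a one-off replacement the module-level call is shorter.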
|Pattern|Matches|
|[0-9]|digits 0 to 9|
|[a-z]|all lowercase characters (a-z)|
|[A-Z]|all uppercase characters (A-Z)|
|[a-zA-Z]|all lowercase and uppercase (capital letter) characters|
|[a-zA-Z0-9]|all lowercase and uppercase characters and digits|
Let us see an example of these patterns.
    #import the regular expression module re
    import re
    #Assign a string value
    strings="PytHon iS Easy tO REDS 123"
    print("Actual string : " + strings)
    #Case 1: remove lowercase characters
    print(re.compile(r"[a-z]").sub("",strings)) #display result
    #Case 2: replace each lowercase character with a space
    print(re.compile(r"[a-z]").sub(" ",strings)) #display result
    #Case 3: remove all spaces from the string
    print(re.sub(r'([ ])',"", strings))
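The remaining character classes can be checked in the same way. This sketch uses findall() on the same sample string and also shows why the letter range should be written [a-zA-Z]: a range ending in lowercase z, such as [A-z], also covers the punctuation characters that sit between Z and a in ASCII. The variable names are illustrative only.

```python
import re

strings = "PytHon iS Easy tO REDS 123"

digits = re.findall(r"[0-9]", strings)
upper = "".join(re.findall(r"[A-Z]", strings))
lower = "".join(re.findall(r"[a-z]", strings))

print(digits)   # ['1', '2', '3']
print(upper)    # PHSEOREDS
print(lower)    # ytoniasyt

# [A-z] spans ASCII 65..122, which includes the underscore (95)
loose = re.findall(r"[A-z]", "a_b")
strict = re.findall(r"[a-zA-Z]", "a_b")
print(loose)    # ['a', '_', 'b']
print(strict)   # ['a', 'b']
```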