Regex Basics with Python
nurfitri •Regex Basic.
What is regex? A regex (regular expression) is a string that describes a pattern used to find a string or character within a body of text. For example, the regex test will find the string test inside a body of text. Head over to pythex to try this example.
In the regular expression input, type the word test, and in the test string input, paste this text:
Regex test will find the word TEST within a body of text. testing with some fingering texts
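In the match result, we can see that every occurrence of the string test is highlighted, including the test inside testing. It works just like the find function we get when we press <Ctrl+F>. Note that matching is case-sensitive by default, so the uppercase TEST is not matched.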
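The same literal search can be sketched in Python with the standard re module. This is only a minimal illustration; the variable names are made up for the example, and the text is the sample pasted into pythex.

```python
import re

text = ("Regex test will find the word TEST within a body of text. "
        "testing with some fingering texts")

# A plain pattern matches itself, character for character.
matches = re.findall(r'test', text)
print(matches)

# Matching is case-sensitive by default; re.IGNORECASE also picks up "TEST".
matches_any_case = re.findall(r'test', text, flags=re.IGNORECASE)
print(matches_any_case)
```

The first call finds the lowercase test twice (once on its own and once inside testing), while the case-insensitive call also finds TEST.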
As beginners, we might come across a strange-looking regex such as /ing\b/. This pattern finds the string ing, and the backslash b (\b) is a special sequence that asserts a word boundary, which means the ing we are looking for must not be immediately followed by another word character (a letter, digit, or underscore). The surrounding forward slashes are only delimiters used in some languages, such as JavaScript; they are not part of the pattern.
If we enter the ing\b regex in pythex with the same text as before, we will notice that each ing at the end of a word (here followed by a space) is highlighted. Notice that the first ing in fingering, the one inside finger, is not highlighted, because it is followed by other characters (er).
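To see the effect of \b outside pythex, here is a small sketch that compares the pattern with and without the boundary. The names plain and bounded are illustrative only.

```python
import re

text = ("Regex test will find the word TEST within a body of text. "
        "testing with some fingering texts")

plain = re.findall(r'ing', text)      # every "ing", wherever it appears
bounded = re.findall(r'ing\b', text)  # only "ing" at the end of a word

print(len(plain), len(bounded))

# For each bounded match, show its start index and the character right after it.
for m in re.finditer(r'ing\b', text):
    print(m.start(), repr(text[m.end():m.end() + 1]))
```

Without \b the pattern matches three times (once in testing and twice in fingering); with \b only the two word-final occurrences remain.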
Expression patterns are built from ordinary characters and special characters. The special characters can be grouped into categories that define their behavior.
The categories are:
- Assertions
- Character classes
- Groups and ranges
- Quantifiers
- Unicode property escapes
You can read more about them in the Mozilla (MDN) documentation.
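As a rough illustration, here is one small Python pattern for each of the first four categories, run against the same sample text. The patterns and variable names are my own examples, not taken from the Mozilla documentation. Unicode property escapes are left out because, as far as I know, Python's built-in re module does not support the \p{...} syntax.

```python
import re

text = ("Regex test will find the word TEST within a body of text. "
        "testing with some fingering texts")

# Assertion: \b on both sides matches "test" only as a whole word.
whole_word = re.findall(r'\btest\b', text)

# Character class: [A-Z] matches any single uppercase ASCII letter.
uppercase_runs = re.findall(r'[A-Z]+', text)

# Group: the parentheses capture the part of the word before a final "ing".
stems = re.findall(r'(\w+)ing\b', text)

# Quantifier: {9} repeats the preceding \w exactly nine times.
nine_letter_words = re.findall(r'\b\w{9}\b', text)

print(whole_word)
print(uppercase_runs)
print(stems)
print(nine_letter_words)
```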
Using Regex in Python.
In Python, regexes are used through the re module. To use one, we simply import re, build a pattern object with re.compile() by passing our pattern as a raw string, and then call that object's methods.
import re

ourText = "Regex test will find the word TEST within a body of text. testing with some fingering texts"
pattern = re.compile(r'ing\b')  # create a regex object by passing our pattern
resultsall = pattern.findall(ourText)  # findall finds all matches and returns them as a list
# .search() returns a re.Match object whose methods we can use further
results = pattern.search(ourText)  # search finds the first occurrence of the pattern
help(results)  # open help for the re.Match object
print('search_match:', results.group())  # print the matched text
print('findall_match:{}'.format(resultsall))
Using Python's help() function on the re.Match object, we can discover the various methods available to us; one of them is the group() method.
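Besides group(), a few other re.Match methods are handy for locating a match. The sketch below uses start(), end() and span(); the variable names are illustrative. Keep in mind that search() returns None when nothing matches, so it is worth checking before calling any method on the result.

```python
import re

text = ("Regex test will find the word TEST within a body of text. "
        "testing with some fingering texts")

match = re.compile(r'ing\b').search(text)

if match is not None:
    print(match.group())               # the matched text
    print(match.start(), match.end())  # start index and end index (exclusive)
    print(match.span())                # the same two indexes as a tuple
    # Slicing the original text with those indexes gives the match back.
    print(text[match.start():match.end()])
```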
Extracting Phone Numbers
In this example, we create a function that extracts Malaysian phone numbers from a text. The phone number has this format: +60124744567. It starts with a + sign and the Malaysian country code 60, followed by a two-digit carrier-specific prefix (e.g. 13, as in 013 for Celcom, or 14, as in 014 for Hotlink), and then any 7 digits.
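Our phone regex will look something like this: \+601[02-46-9]-*[0-9]{7}. Here \+60 matches the literal +60, 1[02-46-9] matches the two-digit prefix (a 1 followed by any digit except 1 or 5), -* allows optional hyphens, and [0-9]{7} matches exactly seven digits.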
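One way to make a pattern like this easier to read is the re.VERBOSE flag, which lets us spread the pattern over several lines and comment each part. The sketch below is the same pattern written that way; phone_pattern and the sample strings are made up for illustration.

```python
import re

phone_pattern = re.compile(r'''
    \+60          # a literal plus sign followed by the country code 60
    1[02-46-9]    # carrier prefix: 1, then any digit except 1 or 5
    -*            # zero or more hyphens
    [0-9]{7}      # exactly seven digits
''', re.VERBOSE)

samples = ["+60124744567", "+6012-4744567", "+60154744567", "+60054744567"]

# fullmatch() succeeds only if the whole string fits the pattern.
results = [bool(phone_pattern.fullmatch(s)) for s in samples]
for sample, ok in zip(samples, results):
    print(sample, ok)
```

The first two samples match (the second one exercises the optional hyphen), while the last two are rejected because of the 15 prefix and the missing leading 1.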
import re

def extract_phone(text):
    # return every substring of text that looks like a Malaysian mobile number
    my_phone_regex = re.compile(r'\+601[02-46-9]-*[0-9]{7}')
    return my_phone_regex.findall(text)

myText = "This text contains many phone numbers +60054744567 +60104744567 +60154744567 +60104744567 in it and also+60124744567 and this+60124744567ass"
print(extract_phone(myText))
Result of running the above script:
jun@b:~$ python3 restest.py
['+60104744567', '+60104744567', '+60124744567', '+60124744567']
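The last match in the output comes from this+60124744567ass, where the number runs straight into letters. If that is not wanted, one possible tweak is to reuse the \b word boundary from earlier at the end of the pattern. This is only a sketch under that assumption; extract_phone_strict is a hypothetical name and not part of the original script.

```python
import re

def extract_phone_strict(text):
    # Hypothetical variant: the trailing \b rejects a number that is
    # immediately followed by a letter, digit, or underscore.
    strict_regex = re.compile(r'\+601[02-46-9]-*[0-9]{7}\b')
    return strict_regex.findall(text)

myText = ("This text contains many phone numbers +60054744567 +60104744567 "
          "+60154744567 +60104744567 in it and also+60124744567 "
          "and this+60124744567ass")

strict_results = extract_phone_strict(myText)
print(strict_results)
```

With this variant the number glued to ass is dropped, and a longer digit run such as +601247445678 no longer produces a partial match.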