Combination of special characters in regex

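Special characters, with examples for several substring variations

.

dot. Matches any character except a newline (any character at all when the re.S / re.DOTALL flag is set).

*

star / asterisk. Causes the resulting RE to match 0 or more repetitions of the preceding RE, as many repetitions as are possible. For example, ab* matches "a", "ab", or "a" followed by any number of "b"s.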

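Example: retval = re.search("ab*", substring)

substring = "nerpb"
Return value: None (the string contains no "a")

substring = "narbp"
Return value: <_sre.SRE_Match object at 0x___>
retval.group(0) → a

substring = "na brp"
Return value: <_sre.SRE_Match object at 0x___>
retval.group(0) → a

substring = "nabrp"
Return value: <_sre.SRE_Match object at 0x___>
retval.group(0) → ab

substring = "nabbbrp"
Return value: <_sre.SRE_Match object at 0x___>
retval.group(0) → abbb

substring = "nababrp"
Return value: <_sre.SRE_Match object at 0x___>
retval.group(0) → ab
retval.group(1) → IndexError (no such group): the pattern has no capturing group, and re.search returns only the first match, so the second "ab" is not reported.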
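The results above can be reproduced with a short script. This is a minimal sketch, assuming Python 3 (where a match object prints as re.Match rather than _sre.SRE_Match); the helper name first_match is made up for illustration.

```python
import re

def first_match(pattern, text):
    """Return the text of the first match, or None when nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# "ab*" means: one "a", then zero or more "b"s, as many as possible.
for text in ["nerpb", "na brp", "nabrp", "nabbbrp", "nababrp"]:
    print(repr(text), "->", first_match("ab*", text))

# group(1) fails because the pattern contains no capturing group.
try:
    re.search("ab*", "nababrp").group(1)
except IndexError as err:
    print("group(1):", err)
```

Checking for None before calling group() avoids an AttributeError on strings that do not match.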

Greedy quantifiers:
*     0 or more repetitions

+     1 or more repetitions

?     0 or 1 repetition
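To make the difference between the three quantifiers concrete, the sketch below runs each of them against the same string, then shows the lazy forms (a quantifier followed by "?"), which take as few repetitions as possible. It assumes Python 3; the variable names are illustrative only.

```python
import re

text = "nabbbrp"

# Greedy forms take as many repetitions as possible.
star = re.search(r"ab*", text).group(0)   # "abbb"
plus = re.search(r"ab+", text).group(0)   # "abbb"
opt = re.search(r"ab?", text).group(0)    # "ab"

# Adding "?" after a quantifier makes it lazy (as few as possible).
lazy_star = re.search(r"ab*?", text).group(0)  # "a"
lazy_plus = re.search(r"ab+?", text).group(0)  # "ab"

# "+" needs at least one "b", so it fails where "*" still succeeds.
no_b_star = re.search(r"ab*", "narp")   # match object for "a"
no_b_plus = re.search(r"ab+", "narp")   # None

print(star, plus, opt, lazy_star, lazy_plus, no_b_plus)
```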

Parsing HTML or XML

Content:
a. Search for a substring that matches <specificWord
b. Search for a substring that matches </specificWord
c. Get all the words / terms between a whitespace character (" ") and a symbol (in this case the equals sign, "=").

==========

a. Purpose:
Search for a substring that matches <specificWord

● Python:
import re
term = "Machine"
reg = r"\<" + term  # the backslash is optional: "<" is not a special character
retval1 = re.search(reg, substring)
retval2 = retval1.group(0) if retval1 else None  # group() exists only on a match
substring:
<Machine>
Return value1:
<_sre.SRE_Match object at 0x___>
Regex:
r"\<Machine"
Return value2:
<Machine

● Same Python and regex as above
substring:
</Machine>
Return value1:
None
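Case a can be wrapped in a small function. This is a minimal sketch, assuming Python 3; find_opening_tag is a hypothetical helper name, and re.escape is added here (it is not in the original) so that a term containing regex metacharacters is still matched literally.

```python
import re

def find_opening_tag(term, text):
    """Return the matched '<term' text, or None if it is absent."""
    # re.escape protects against regex metacharacters inside term.
    pattern = "<" + re.escape(term)
    m = re.search(pattern, text)
    return m.group(0) if m else None

print(find_opening_tag("Machine", "<Machine>"))    # <Machine
print(find_opening_tag("Machine", "</Machine>"))   # None
```

Because only the prefix is checked, the same pattern would also match the start of a longer tag name such as <Machinery>; appending r"\b" to the pattern is one way to require the name to end there.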

b. Purpose:
Search for a substring that matches </specificWord

● Python:
import re
term = "Machine"
reg = r"\<\/" + term  # both backslashes are optional: "<" and "/" are not special characters
retval1 = re.search(reg, substring)
retval2 = retval1.group(0) if retval1 else None  # group() exists only on a match
substring:
<Machine>
Return value1:
None
Regex:
r"\<\/Machine"

● Same Python and regex as above
substring:
</Machine>
Return value1:
<_sre.SRE_Match object at 0x___>
Return value2:
</Machine
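Cases a and b can be combined to tell opening tags from closing tags while scanning a file line by line. The sketch below is one possible arrangement, assuming Python 3; tag_kind and the sample lines are hypothetical and serve only as illustration.

```python
import re

def tag_kind(term, line):
    """Classify a line as 'open', 'close', or None for the given tag name."""
    name = re.escape(term)
    # Check the closing form first; its pattern is the more specific one.
    if re.search("</" + name, line):
        return "close"
    if re.search("<" + name, line):
        return "open"
    return None

lines = ["<Machine>", '  <MC ACS="1">', "</Machine>"]
kinds = [tag_kind("Machine", line) for line in lines]
print(kinds)  # ['open', None, 'close']
```

A line that holds both the opening and the closing tag is reported as 'close' by this sketch, so it suits input with one tag per line.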

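c. Purpose:
Get all the words / terms between a whitespace character (" ") and a symbol (in this case the equals sign, "=").

● Python:
import re
beg = r"\s"  # raw string, so the backslash reaches the regex engine unchanged
en = "="
reg = beg + r"(.*?)" + en  # (.*?) is lazy: it stops at the first "="
retval1 = re.findall(reg, sLine, re.S)
retval2 = len(retval1)
retval3 = retval1[2]
substring (sLine):
<MC ACS="1" Adv="0" BrandID="4" CommonVar="100" ControllerID="4">
Return value1:
['   <MC ACS', 'Adv', 'BrandID', 'CommonVar', 'ControllerID']
Regex:
r"\s(.*?)="
Return value2:
5
Return value3:
BrandID

Note: the first element, '   <MC ACS', indicates that sLine began with indentation that is not shown in the substring above. The first \s matches the first leading space, and the lazy group then runs on to the first "=", taking the remaining spaces and "<MC" with it.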
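The sketch below reruns case c and then tries two stricter patterns that keep the tag name out of the result. It assumes Python 3 and assumes the sample line is indented by four spaces, which would account for the three spaces in the first returned element; the names s_line, loose, names and pairs are illustrative only.

```python
import re

# Sample line from case c, assumed here to be indented by four spaces.
s_line = '    <MC ACS="1" Adv="0" BrandID="4" CommonVar="100" ControllerID="4">'

# Original pattern: whitespace, then as little as possible up to "=".
loose = re.findall(r"\s(.*?)=", s_line, re.S)

# Stricter variant: only word characters may sit between whitespace and "=".
names = re.findall(r"\s(\w+)=", s_line)

# Two groups return (name, value) tuples for each attribute.
pairs = re.findall(r'\s(\w+)="(.*?)"', s_line)

print(loose)
print(names)
print(pairs)
```

With \w+ in place of .*?, the leading spaces and "<MC" can no longer be captured, so the first element becomes 'ACS'.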