A Detailed Guide to Using Python's re Module

2023-08-18 14:18:05

I. Special characters in regular expressions

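Regular expression syntax
^       matches the start of the string (the start of each line with re.MULTILINE)
$       matches the end of the string (the end of each line with re.MULTILINE)
.       matches any single character (except a newline, by default)
[]      matches any one character listed inside the brackets
[^]     matches any one character not listed inside the brackets
[-]     matches any single character in the given range, for example [0-9]
?       matches the preceding item zero or one time
+       matches the preceding item one or more times
*       matches the preceding item zero or more times
{n}     matches the preceding item exactly n times
{m,n}   matches the preceding item at least m and at most n times
{n,}    matches the preceding item at least n times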
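
The table above can be tried out directly. The following is a minimal sketch; the sample text and the variable names are made up for illustration and do not come from the original article.

```python
import re

# Hypothetical sample text, chosen only to exercise the characters listed above.
text = "color colour colouur 7 42 2023"

# ? : the preceding "u" zero or one time
zero_or_one = re.findall(r"colou?r", text)
# + : the preceding "u" one or more times
one_or_more = re.findall(r"colou+r", text)
# * : the preceding "u" zero or more times
zero_or_more = re.findall(r"colou*r", text)
# {2,} : at least two digits in a row
two_plus_digits = re.findall(r"[0-9]{2,}", text)
# ^ and $ : anchor the match to the start and the end of the string
first_word = re.findall(r"^[a-z]+", text)
last_number = re.findall(r"[0-9]+$", text)
# [^...] : any run of characters that are neither lowercase letters nor spaces
not_letters = re.findall(r"[^a-z ]+", text)

print(zero_or_one)      # ['color', 'colour']
print(one_or_more)      # ['colour', 'colouur']
print(zero_or_more)     # ['color', 'colour', 'colouur']
print(two_plus_digits)  # ['42', '2023']
print(first_word)       # ['color']
print(last_number)      # ['2023']
print(not_letters)      # ['7', '42', '2023']
```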

II. Methods of the re module

1. Matching methods

a. The findall method

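#findall searches the string for the pattern and returns every match as a list. If nothing in the text matches, it returns an empty list;
#if exactly one substring matches, it returns a list with one element. Either way, the result of findall can be iterated over directly
#without raising an error, which means fewer special cases to handle and simpler code

#re.findall() returns all substrings that match the pattern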

import re

re_str = "hello this is python 2.7.13 and python 3.4.5"

pattern = r"python [0-9]\.[0-9]\.[0-9]"
res = re.findall(pattern=pattern, string=re_str)
print(res)

#['python 2.7.1', 'python 3.4.5']

#{2,} requires at least two digits in the last position
pattern = r"python [0-9]\.[0-9]\.[0-9]{2,}"
res = re.findall(pattern=pattern, string=re_str)
print(res)

#['python 2.7.13']


#{3,} requires at least three digits, which neither version number has
pattern = r"python [0-9]\.[0-9]\.[0-9]{3,}"
res = re.findall(pattern=pattern, string=re_str)
print(res)

#[]

#re.findall() returns a list: the matched substrings if anything matched, or an empty list if nothing did

re_str = "hello this is python 2.7.13 and Python 3.4.5"

pattern = r"python [0-9]\.[0-9]\.[0-9]"
res = re.findall(pattern=pattern, string=re_str, flags=re.IGNORECASE)
print(res)

#['python 2.7.1', 'Python 3.4.5']

#The flag flags=re.IGNORECASE makes the match ignore case

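b. Using compiled regular expressions

#We usually use the re module through compiled patterns. When the same pattern is applied to a large amount of data, compiling it once can improve performance; readers can measure the difference on their own data
re_str = "hello this is python 2.7.13 and Python 3.4.5"
re_obj = re.compile(pattern=r"python [0-9]\.[0-9]\.[0-9]", flags=re.IGNORECASE)
res = re_obj.findall(re_str)
print(res)
#['python 2.7.1', 'Python 3.4.5']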
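
One way to run that measurement is with the standard timeit module. This is only a sketch: the repeated test data, the function names and the repetition counts are assumptions made for illustration, and the timings will differ from machine to machine, so no expected numbers are shown.

```python
import re
import timeit

# Hypothetical test data: the sample sentence repeated many times.
lines = ["hello this is python 2.7.13 and Python 3.4.5"] * 2000
pattern = r"python [0-9]\.[0-9]\.[0-9]"

def with_module_function():
    # re.findall has to look the pattern up in re's internal cache on every call.
    return sum(len(re.findall(pattern, line, flags=re.IGNORECASE)) for line in lines)

# Compile once, then reuse the pattern object.
compiled = re.compile(pattern, flags=re.IGNORECASE)

def with_compiled_pattern():
    return sum(len(compiled.findall(line)) for line in lines)

# Both approaches must find the same number of matches (2 per line).
assert with_module_function() == with_compiled_pattern() == 4000

print("re.findall      :", timeit.timeit(with_module_function, number=5))
print("compiled.findall:", timeit.timeit(with_compiled_pattern, number=5))
```

Because the re module caches recently used patterns, the gap may be smaller than expected; the compiled form mainly saves the per-call lookup and keeps the pattern in one named place.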

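c. The match method

#match is similar to the string method startswith, but because it works with regular expressions it is more powerful and more expressive. match tests the
#beginning of the string: if the pattern matches there, it returns a match object (printed as _sre.SRE_Match in the output below, and as re.Match in Python 3.7
#and later); if it does not, it returns None. For a plain prefix check its usage is therefore almost the same as startswith. For example, to check whether a
#string starts with "what", or with digits: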

s_true = "what is a boy"
s_false = "What is a boy"
re_obj = re.compile("what")

print(re_obj.match(string=s_true))
#<_sre.SRE_Match object; span=(0, 4), match='what'>

print(re_obj.match(string=s_false))
#None

s_true = "123 what is a boy"
s_false = "what is a boy"

re_obj = re.compile(r"\d+")

print(re_obj.match(s_true))
#<_sre.SRE_Match object; span=(0, 3), match='123'>

print(re_obj.match(s_true).start())
#0
print(re_obj.match(s_true).end())
#3
print(re_obj.match(s_true).string)
#123 what is a boy
print(re_obj.match(s_true).group())
#123


print(re_obj.match(s_false))
#None

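d. The search method

#search also returns a match object when the pattern matches. The difference between search and match is that match can only match at the
#start of the string, while search can match starting at any position. What they have in common is that both return a match object on success
#and None on failure. Note that search finds only the first match: even if the string contains several matches of the pattern, only the first
#one is returned. To get all of them, the simplest way is findall; finditer also works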
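
The difference between match, search and findall can be seen side by side. This is a small sketch that reuses the version-number sentence from the other examples in this article; the variable names are chosen here for illustration.

```python
import re

text = "what is a different between python 2.7.14 and python 3.5.4"
version = re.compile(r"\d+\.\d+\.\d+")

# match only tries position 0, and the text starts with "what", so it fails.
at_start = version.match(text)
print(at_start)                      # None

# search scans forward and stops at the first match.
first = version.search(text)
print(first.group(), first.span())   # 2.7.14 (35, 41)

# findall collects every match instead of just the first one.
every = version.findall(text)
print(every)                         # ['2.7.14', '3.5.4']

# search returns None when nothing matches, so test the result
# before calling .group() on it.
missing = version.search("no version here")
print(missing)                       # None
```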

e. The finditer method

#finditer returns an iterator; iterating over it yields one match object per match, as in the example below

re_str = "what is a different between python 2.7.14 and python 3.5.4"

re_obj = re.compile(r"\d{1,}\.\d{1,}\.\d{1,}")

for i in re_obj.finditer(re_str):
    print(i)

#<_sre.SRE_Match object; span=(35, 41), match='2.7.14'>
#<_sre.SRE_Match object; span=(53, 58), match='3.5.4'>

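2. Modification methods

a. The sub method

#The sub method of the re module is similar to the string method replace, except that sub accepts a regular expression, so it covers many more use cases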

re_str = "what is a different between python 2.7.14 and python 3.5.4"

re_obj = re.compile(r"\d{1,}\.\d{1,}\.\d{1,}")

#count=1 replaces only the first match
print(re_obj.sub("a.b.c", re_str, count=1))
#what is a different between python a.b.c and python 3.5.4

print(re_obj.sub("a.b.c", re_str, count=2))
#what is a different between python a.b.c and python a.b.c

#without count, every match is replaced
print(re_obj.sub("a.b.c", re_str))
#what is a different between python a.b.c and python a.b.c

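b. The split method

#The split method of the re module does the same job as the string method split: both break a string into a list of substrings. The difference is
#that the re version can split on a regular expression
#In the example below, the string is split on period, space, colon and exclamation mark, and the result is a list

re_str = "what is a different between python 2.7.14 and python 3.5.4 USA:NewYork!Zidan.FRA"

re_obj = re.compile("[. :!]")

print(re_obj.split(re_str))
#['what', 'is', 'a', 'different', 'between', 'python', '2', '7', '14', 'and', 'python', '3', '5', '4', 'USA', 'NewYork', 'Zidan', 'FRA']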

c. Case-insensitive matching

#To ignore case, pass flags=re.IGNORECASE when compiling the pattern:

#re.compile(pattern, flags=re.IGNORECASE)
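
A short sketch of the flag in use follows. The sample text and variable names are invented for illustration; re.I is the standard short alias for re.IGNORECASE.

```python
import re

# Hypothetical sample text containing three spellings of the same word.
text = "Python, python and PYTHON"

case_sensitive = re.compile("python")
case_insensitive = re.compile("python", flags=re.IGNORECASE)

sensitive_hits = case_sensitive.findall(text)
insensitive_hits = case_insensitive.findall(text)
print(sensitive_hits)    # ['python']
print(insensitive_hits)  # ['Python', 'python', 'PYTHON']

# The flag also works with the module-level functions, here with sub.
replaced = re.sub("python", "re", text, flags=re.I)
print(replaced)          # re, re and re
```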

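d. Non-greedy matching

#Greedy matching always matches the longest possible string; non-greedy matching, correspondingly, matches the shortest possible one. To make a quantifier non-greedy, add a ? after it

#In the example below, note the two periods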
s = "Beautiful is better than ugly. Explicit is better than impliciy."


#greedy: .* runs on to the last "y." in the string
re_obj = re.compile(r"Beautiful.*y\.")

print(re_obj.findall(s))
#['Beautiful is better than ugly. Explicit is better than impliciy.']

#non-greedy: .*? stops at the first period
re_obj = re.compile(r"Beautiful.*?\.")

print(re_obj.findall(s))
#['Beautiful is better than ugly.']

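e. What happens when you add parentheses to a regular expression?

To match a literal parenthesis, escape it with a backslash. Look closely at the example below: the pattern contains no group, so calling group(1) on the object returned by search raises an error.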

import re
s = "=aa1239d&&&0a()--"

#escaped parentheses match the literal characters "(" and ")"
obj = re.compile(r"\(\)")

#search
rep = obj.search(s)
print(rep)
#<_sre.SRE_Match object; span=(13, 15), match='()'>
#print(rep.group(1))
#IndexError: no such group
print(rep.group())
#()

#findall
rep = obj.findall(s)
print(rep)
#['()']

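To get back the text matched inside the parentheses, leave the parentheses unescaped so that they form a group. With a single group in the pattern, findall returns the strings matched inside the parentheses; search(...).group() returns the string matched by the whole pattern; search(...).group(1) returns the string matched by the first pair of parentheses, search(...).group(2) the string matched by the second pair, and so on.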

s = "=aa1239d&&&0a()--"
rep = re.compile(r"\w+(&+)")

print(rep.findall(s))
#['&&&']
print(rep.search(s).group())
#aa1239d&&&
print(rep.search(s).group(1))
#&&&
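
The example above has only one group, so group(2) cannot be shown with it. The sketch below adds a second pair of parentheses to the same pattern; this two-group variant and its variable names are an illustration, not part of the original article.

```python
import re

s = "=aa1239d&&&0a()--"
# Two groups: group 1 captures the word characters, group 2 the ampersands.
two_groups = re.compile(r"(\w+)(&+)")

m = two_groups.search(s)
print(m.group())    # aa1239d&&&
print(m.group(1))   # aa1239d
print(m.group(2))   # &&&
print(m.groups())   # ('aa1239d', '&&&')

# With more than one group, findall returns one tuple per match
# instead of a plain string.
pairs = two_groups.findall(s)
print(pairs)        # [('aa1239d', '&&&')]
```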

That is all for this article. I hope it helps with your learning.
