Python Research

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/20240239/

Time: 2020-08-18 19:57:54  Source: igfitidea

Python re.search

python regex

Asked by Krishna M

I have a string variable containing:

string = "123hello456world789"

The string contains no spaces. I want to write a regex that prints only the words made of letters (a-z). I tried a simple regex:

import re

pat = "([a-z]+){1,}"
match = re.search(pat, string, re.DEBUG)

The match object contains only the word Hello; the word World is not matched.

When I used re.findall(), I could get both Hello and World.

My question is: why can't we do this with re.search()?

How can I do this with re.search()?

Accepted answer by Inbar Rose

re.search() finds the pattern once in the string. From the documentation:

Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.


To match every occurrence, you need re.findall(). From the documentation:

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.

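The group behaviour described in the quoted documentation can be illustrated with a minimal sketch. The sample string is the one from the question; the variable names and the two-group pattern are assumptions added only for illustration.

```python
import re

text = "123hello456world789"

# No group: findall returns the whole matches as strings.
no_group = re.findall(r"[a-z]+", text)

# One group: findall returns the content of that group.
one_group = re.findall(r"([a-z]+)", text)

# Two groups: findall returns one tuple per match.
two_groups = re.findall(r"(\d+)([a-z]+)", text)

print(no_group)    # ['hello', 'world']
print(one_group)   # ['hello', 'world']
print(two_groups)  # [('123', 'hello'), ('456', 'world')]
```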

Example:


>>> import re
>>> regex = re.compile(r'([a-z]+)', re.I)
>>> # using search we only get the first item.
>>> regex.search("123hello456world789").groups()
('hello',)
>>> # using findall we get every item.
>>> regex.findall("123hello456world789")
['hello', 'world']
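If re.search() really must be used, one possible approach is to call it repeatedly and resume after each match. The sketch below relies on the optional pos argument of a compiled pattern's search() method; the helper name search_all is hypothetical and not part of the re module.

```python
import re

def search_all(pattern, text):
    """Collect every match by calling search() in a loop (hypothetical helper)."""
    regex = re.compile(pattern)
    matches = []
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)  # start scanning at index pos
        if m is None:
            break
        matches.append(m.group())
        # Resume after this match; step one extra character on an empty match
        # so the loop cannot get stuck.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return matches

print(search_all(r"[a-z]+", "123hello456world789"))  # ['hello', 'world']
```

In practice re.findall() or re.finditer() does the same job with less code, so a loop like this is mainly useful for seeing why a single search() call returns only the first match.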


UPDATE:


Because of your duplicate question (as discussed at this link), I have added my other answer here as well:

>>> import re
>>> regex = re.compile(r'([a-z][a-z-\']+[a-z])')
>>> regex.findall("HELLO W-O-R-L-D") # this has uppercase
[]  # there are no results here, because the string is uppercase
>>> regex.findall("HELLO W-O-R-L-D".lower()) # lets lowercase
['hello', 'w-o-r-l-d'] # now we have results
>>> regex.findall("123hello456world789")
['hello', 'world']

As you can see, the first sample you provided failed because of the uppercase letters. You can simply add the re.IGNORECASE flag, though you mentioned that matches should be lowercase only.
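As a rough comparison of the two options mentioned here, the sketch below runs the same pattern with no flag, with re.IGNORECASE, and on a lowercased copy of the input. The variable names are assumptions for illustration, and the pattern is written without a capturing group so findall returns whole matches.

```python
import re

text = "HELLO W-O-R-L-D"
pattern = r"[a-z][a-z-']+[a-z]"

# Without a flag, [a-z] does not match uppercase letters.
no_flag = re.findall(pattern, text)

# With re.IGNORECASE the matches keep their original case.
with_flag = re.findall(pattern, text, re.IGNORECASE)

# Lowercasing the input first yields lowercase matches.
lowered = re.findall(pattern, text.lower())

print(no_flag)    # []
print(with_flag)  # ['HELLO', 'W-O-R-L-D']
print(lowered)    # ['hello', 'w-o-r-l-d']
```

Which variant fits depends on whether the results themselves must be lowercase, as the question suggests, or only the matching should be case-insensitive.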

Answer by Peter Gibson

@InbarRose's answer shows why re.search works that way, but if you want match objects rather than just the string output of re.findall, use re.finditer:

>>> for match in re.finditer(pat, string):
...     print(match.groups())
...
('hello',)
('world',)
>>>

Alternatively, if you want a list:

>>> list(re.finditer(pat, string))
[<_sre.SRE_Match object at 0x022DB320>, <_sre.SRE_Match object at 0x022DB660>]
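To show what match objects offer beyond the matched text, here is a small Python 3 sketch that records each match together with its position. The names text and spans are assumptions for illustration.

```python
import re

text = "123hello456world789"

# Each match object exposes the matched text and its (start, end) span.
spans = [(m.group(), m.span()) for m in re.finditer(r"[a-z]+", text)]

for word, (start, end) in spans:
    print(word, start, end)
# hello 3 8
# world 11 16
```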

It's also generally a bad idea to use string as a variable name, given that it's the name of a common standard-library module.
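A minimal sketch of the shadowing problem, assuming the string module has been imported in the same scope; the function name module_is_shadowed is hypothetical.

```python
import string

def module_is_shadowed():
    """Return True if rebinding the name hides the module's attributes."""
    # Inside this function, the local variable hides the imported module.
    string = "123hello456world789"
    return not hasattr(string, "ascii_lowercase")

print(hasattr(string, "ascii_lowercase"))  # True: the name still refers to the module here
print(module_is_shadowed())                # True: the local str has no such attribute
```

Using a different name, such as text, avoids the clash entirely.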