Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/3765024/

Time: 2020-08-18 12:38:05  Source: igfitidea

Different behavior between re.finditer and re.findall

python regex

Asked by simao

I am using the following code:

CARRIS_REGEX=r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
pattern = re.compile(CARRIS_REGEX, re.UNICODE)
matches = pattern.finditer(mailbody)
findall = pattern.findall(mailbody)

But finditer and findall are finding different things. Findall indeed finds all the matches in the given string. But finditer only finds the first one, returning an iterator with only one element.

How can I make finditer and findall behave the same way?

Thanks

Accepted answer by Tim Pietzcker

I can't reproduce this here. Have tried it with both Python 2.7 and 3.1.

One difference between finditer and findall is that the former yields regex match objects, whereas the latter returns a list whose items are tuples of the matched capturing groups (or the entire match as a string if there are no capturing groups).

So

import re
CARRIS_REGEX=r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
pattern = re.compile(CARRIS_REGEX, re.UNICODE)
mailbody = open("test.txt").read()
for match in pattern.finditer(mailbody):
    print(match)
print()
for match in pattern.findall(mailbody):
    print(match)

prints

<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>

('790', 'PR. REAL', '21:06', '04m')
('758', 'PORTAS BENFICA', '21:10', '09m')
('790', 'PR. REAL', '21:14', '13m')
('758', 'PORTAS BENFICA', '21:21', '19m')
('790', 'PR. REAL', '21:29', '28m')
('758', 'PORTAS BENFICA', '21:38', '36m')
('758', 'SETE RIOS', '21:49', '47m')
('758', 'SETE RIOS', '22:09', '68m')

If you want the same output from finditer as you're getting from findall, you need

for match in pattern.finditer(mailbody):
    print(match.groups())  # groups() already returns a tuple
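
As a self-contained check of the equivalence described above, here is a minimal sketch that runs both methods on a small inline sample instead of test.txt. The sample_body string is made up for illustration (the original test file is not shown); the pattern is the one from the question.

```python
import re

CARRIS_REGEX = r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
pattern = re.compile(CARRIS_REGEX, re.UNICODE)

# Hypothetical input, shaped like the rows the pattern expects.
sample_body = (
    "<tr><th>790</th><th>PR. REAL</th><th>21:06</th><th>04m</th></tr>"
    "<tr><th>758</th><th>PORTAS BENFICA</th><th>21:10</th><th>09m</th></tr>"
)

# findall gives a list of group tuples directly.
from_findall = pattern.findall(sample_body)

# finditer gives match objects; groups() extracts the same tuples.
from_finditer = [match.groups() for match in pattern.finditer(sample_body)]

print(from_findall)
print(from_findall == from_finditer)
```

On this sample both approaches find the same two rows, so the comparison prints True.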

Answered by Tim McNamara

You can't make them behave the same way, because they're different. If you really want to create a list of results from finditer, then you could use a list comprehension:

>>> [match for match in pattern.finditer(mailbody)]
[...]

In general, use a for loop to access the matches returned by re.finditer:

>>> for match in pattern.finditer(mailbody):
...     ...
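
To make the two snippets above concrete, here is a small runnable sketch. The pattern and text are sample values chosen for illustration, not the asker's data.

```python
import re

pattern = re.compile(r"\w+ly")
text = "He was carefully disguised but captured quickly by police."

# list() drains the iterator, just like the list comprehension does.
matches = list(pattern.finditer(text))

for match in matches:
    # Each item is a match object, so position information is available.
    print(match.group(), match.span())
```

Keeping the match objects in a list is handy when you need both the matched text and where it was found.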

Answered by Ayush

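re.findall(pattern, string)

findall() returns all non-overlapping matches of pattern in string as a list of strings (or a list of tuples if the pattern has more than one group).

re.finditer(pattern, string)

finditer() returns an iterator that yields match objects (in Python 3 its type is shown as callable_iterator).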

In both functions, the string is scanned from left to right and matches are returned in order found.
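
A quick way to see the two return types side by side is sketched below. The sample text and the \d+ pattern are chosen only for illustration.

```python
import re
from collections.abc import Iterator

text = "one 1 two 22 three 333"  # sample input, chosen for illustration

as_list = re.findall(r"\d+", text)
as_iter = re.finditer(r"\d+", text)

# findall builds the whole list up front.
print(type(as_list).__name__)

# finditer hands back an iterator; despite the type name
# "callable_iterator", the object itself is not something you call.
is_iterator = isinstance(as_iter, Iterator)
is_callable = callable(as_iter)
print(is_iterator, is_callable)

# Both walk the string left to right, so the order is the same.
from_iter = [m.group() for m in as_iter]
print(as_list, from_iter)
```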

Answered by Kushan Gunasekera

I took this example from the Regular expression operations page of the Python 2.* documentation and describe it here in detail, with some modifications. To explain the whole example, let's start with a string variable called text,

text = "He was carefully disguised but captured quickly by police."

and the compiled regular expression pattern,

regEX = r"\w+ly"
pattern = re.compile(regEX)

\w matches any word character (alphanumeric and underscore), + matches one or more of the preceding token, and the whole pattern selects any word that ends with ly. Only two words ('carefully' and 'quickly') in the text satisfy this regular expression.

Before moving on to re.findall() or re.finditer(), let's see what the Python 2.* documentation says about re.search().

Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

The following lines give a basic understanding of re.search().

search = pattern.search(text)
print(search)
print(type(search))

#output
<re.Match object; span=(7, 16), match='carefully'>
<class 're.Match'>

It returns a re.MatchObject (shown as re.Match in Python 3), which supports 13 methods and attributes according to the Python 2.* documentation. The span() method returns the start and end positions (7 and 16 in the example above) of the matched word in the text variable. re.search() only considers the very first match; if there is no match at all, it returns None.
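
A short sketch of the points above, using the same text and pattern. The second string, "No adverbs here.", is a made-up input used only to show the None case.

```python
import re

pattern = re.compile(r"\w+ly")
text = "He was carefully disguised but captured quickly by police."

match = pattern.search(text)

# span() is the (start, end) pair; start() and end() give each half.
print(match.span())
print(match.start(), match.end())

# Slicing the text with the span gives back the matched word,
# which is also what group() returns.
sliced = text[match.start():match.end()]
print(sliced, match.group())

# No word ends in "ly" here, so search() returns None.
no_match = pattern.search("No adverbs here.")
print(no_match)
```

Because search() may return None, it is usually worth checking the result before calling methods on it.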

Now to the question. First, let's see what the Python 2.* documentation says about re.finditer().

Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result.

The next lines give a basic understanding of re.finditer().

finditer = pattern.finditer(text)
print(finditer)
print(type(finditer))

#output
<callable_iterator object at 0x040BB690>
<class 'callable_iterator'>

The example above gives us an iterator object, which needs to be looped over. This is obviously not the result we want. Let's loop over finditer and see what is inside the iterator.

for anObject in finditer:
    print(anObject)
    print(type(anObject))
    print()

#output
<re.Match object; span=(7, 16), match='carefully'>
<class 're.Match'>

<re.Match object; span=(40, 47), match='quickly'>
<class 're.Match'>
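
One thing worth keeping in mind about the iterator looped over above: it can be consumed only once. The sketch below, using the same text and pattern, shows a second pass coming back empty.

```python
import re

pattern = re.compile(r"\w+ly")
text = "He was carefully disguised but captured quickly by police."

finditer = pattern.finditer(text)

# The first pass walks through every match.
first_pass = [m.group() for m in finditer]

# The iterator is now exhausted, so a second pass yields nothing.
second_pass = [m.group() for m in finditer]

print(first_pass)
print(second_pass)
```

If the matches are needed more than once, call pattern.finditer(text) again or store them in a list first.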

These results are very similar to the re.search() result we got earlier, but the output above also contains a new one, <re.Match object; span=(40, 47), match='quickly'>. As the Python 2.* documentation quoted earlier says, re.search() scans through the string looking for the first location where the regular expression pattern produces a match, whereas re.finditer() scans through the string looking for all the locations where the pattern produces matches, and it returns more detail than the re.findall() method.
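
To illustrate the "more detail" point, here is a minimal sketch with the same text and pattern. It collects the positions that finditer exposes and findall does not.

```python
import re

pattern = re.compile(r"\w+ly")
text = "He was carefully disguised but captured quickly by police."

# findall: just the matched strings.
words = pattern.findall(text)

# finditer: match objects, so start and end positions come along too.
detailed = [(m.group(), m.start(), m.end()) for m in pattern.finditer(text)]

print(words)
print(detailed)
```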

Here is what the Python 2.* documentation says about re.findall().

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.
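
The effect of groups on the return value, as described in the quoted documentation, can be sketched like this. The two grouped patterns are variations made up for illustration.

```python
import re

text = "He was carefully disguised but captured quickly by police."

# No groups: a list of whole matches.
no_group = re.findall(r"\w+ly", text)

# One group: a list of that group's contents only.
one_group = re.findall(r"(\w+)ly", text)

# Two groups: a list of tuples, one tuple per match.
two_groups = re.findall(r"(\w)(\w+)ly", text)

print(no_group)
print(one_group)
print(two_groups)
```

This is why the pattern in the question, which has four groups, makes findall return 4-tuples.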

Let's see what happens in re.findall().

findall = pattern.findall(text)
print(findall)
print(type(findall))

#output
['carefully', 'quickly']
<class 'list'>

This output gives us only the matched words in the text variable; if nothing matches, it returns an empty list. Each item in that list is similar to the match value shown in the re.MatchObject output (what match.group() returns).

Here is the full code, which I tried in Python 3.7.

import re

text = "He was carefully disguised but captured quickly by police."

regEX = r"\w+ly"
pattern = re.compile(regEX)

search = pattern.search(text)
print(search)
print(type(search))
print()

findall = pattern.findall(text)
print(findall)
print(type(findall))
print()

finditer = pattern.finditer(text)
print(finditer)
print(type(finditer))
print()
for anObject in finditer:
    print(anObject)
    print(type(anObject))
    print()