The groups() method in regular expressions in Python

Question

Asked by Omid

I am learning about regular expressions in Python and I have trouble understanding the groups() method.

>>> import re
>>> m = re.match("([abc])+", "abc")

Here I have defined the character class [abc], which, as far as I know, matches any one of the characters a, b or c. It is defined inside a group, and the + sign means we want at least one such group. So I execute the following lines and the result is understandable:

>>> m.group()
'abc'
>>> m.group(0)
'abc'

I get why this happens. The index of the main group is 0, and 'abc' matches the class we have defined. So far so good, but I don't get why the following lines behave the way they do:

>>> m.group(1)
'c'
>>> m.groups()
('c',)

What is group(1)? I have only defined one group here, so why does the tuple returned by groups() contain only the character 'c'? Isn't it supposed to return a tuple containing all the groups? I'd suppose it would at least contain 'abc'.

Answer 1

Accepted answer by alko

For details on re, consult the docs. In your case:

group(0) stands for the whole matched string, hence abc, which is made up of three successive matches of the group: a, b and c

group(i) stands for the i-th group, and, citing the documentation:

If a group matches multiple times, only the last match is accessible

hence group(1) stands for the last match, c
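A minimal sketch of the point above, using the pattern from the question: the repeated group is overwritten on each repetition, so only the final one survives. The variable names are just for illustration.

```python
import re

m = re.match(r"([abc])+", "abc")

# group(0) is the whole match; group(1) is the LAST repetition of the only group.
whole = m.group(0)
last = m.group(1)

# span(1) reports where that last repetition sits in the input string.
last_span = m.span(1)

# groups() lists the numbered groups starting at 1, so group(0) is never in it.
all_groups = m.groups()

print(whole, last, last_span, all_groups)
```

Since groups() starts counting at group 1, the full match 'abc' does not appear in the tuple even though group(0) returns it.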

Your + is interpreted as repetition of the group. If you want to repeat [abc] inside the group, move the + into the parentheses:

>>> re.match("([abc])", "abc").groups()
('a',)
>>> re.match("([abc]+)", "abc").groups()
('abc',)
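To see both behaviours side by side, here is a small sketch that nests the question's repeated group inside an outer group. The nested pattern is my own illustration, not something from the question.

```python
import re

# Outer group 1 wraps the whole repetition; inner group 2 is repeated by the +.
m = re.match(r"(([abc])+)", "abc")

outer = m.group(1)   # everything the repetition consumed
inner = m.group(2)   # only the last repetition of the inner group
both = m.groups()

print(outer, inner, both)
```

Groups are numbered by the position of their opening parenthesis, which is why the outer group comes first in the tuple.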

Answer 2

Answered by Peter DeGlopper

From the docs:

If a group matches multiple times, only the last match is accessible:

>>> m = re.match(r"(..)+", "a1b2c3")  # Matches 3 times.
>>> m.group(1)                        # Returns only the last match.
'c3'

Your group can only ever match one character, so c is the last match.

You mention that you'd expect to at least see 'abc'. If you want your group to match multiple characters, put the + inside the group:

>>> m = re.match("([abc]+)", "abc")
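If what you actually want is every repetition rather than just the last one, one option is to drop the repeated group and let re.findall collect the pieces. This is a sketch of an alternative, not what the answers above do. Note that findall scans the whole string, whereas re.match is anchored at the start.

```python
import re

# Each single-character match of the class becomes its own list item.
chars = re.findall(r"[abc]", "abc")

# The same idea applied to the two-character example from the docs.
pairs = re.findall(r"..", "a1b2c3")

print(chars, pairs)
```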

Answer 3

Answered by Fabio Palm

This is a more detailed regexp. From its named groups you can read the protocol and the filename; I forgot the file extension.

["](?P<protocol>http(?P<secure>s)?://)(?P<fqdn>[a-zA-Z0-9]*(?P<subdomain>(.)[a-zA-Z0-9]*)*)[/](?P<filename>([a-zA-Z.])*)["]
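As a rough sketch of how named groups like these can be read back, here is a simplified pattern of my own (not the exact regexp above) applied to a made-up URL. The group name host and the sample URL are assumptions for illustration only.

```python
import re

# Simplified, hypothetical URL pattern with named groups.
pattern = re.compile(
    r"(?P<protocol>http(?P<secure>s)?)://"
    r"(?P<host>[a-zA-Z0-9.]+)/"
    r"(?P<filename>[a-zA-Z.]*)"
)

m = pattern.match("http://example.com/a.txt")

# groupdict() maps each group name to its text; an optional group that did
# not take part in the match (here: secure) is reported as None.
parts = m.groupdict()

# groups() returns the same values positionally, in order of opening parenthesis.
positional = m.groups()

print(parts)
print(positional)
```

Named groups are still numbered, so m.group("protocol") and m.group(1) refer to the same text here.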

I removed the response because I was.
