Note: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/20202365/

Date: 2020-08-18 19:52:27  Source: igfitidea

The groups() method in regular expressions in Python

Tags: python, regex

Asked by Omid

I am learning about regex in Python and I have problems understanding the function groups().

>>> import re
>>> m = re.match("([abc])+", "abc")

Here I have defined the character class [abc], which, as far as I know, means any one of the characters a to c. It is defined inside a group, and the + sign means we want at least one such group. So I execute the following lines and the result is understandable:

>>> m.group()
'abc'
>>> m.group(0)
'abc'

I get why this happens. The index of the main group is 0 and 'abc' matches the class we have defined. So far so good, but I don't get why the following lines get executed the way they do:

>>> m.group(1)
'c'
>>> m.groups()
('c',)

What is group(1)? I have only defined one group here, so why does the groups() function contain only the character 'c'? Isn't it supposed to return a tuple containing all the groups? I'd expect it to at least contain 'abc'.

Accepted answer by alko

For details on re, consult the docs. In your case:

group(0) stands for the entire matched string, hence abc, which is made up of three successive matches of the group: a, b and c

group(i) stands for the i-th group and, citing the documentation:

If a group matches multiple times, only the last match is accessible

hence group(1) stands for the last match, c

Your + is interpreted as repetition of the group. If you want to repeat [abc] inside the group, move the + inside the parentheses:

>>> re.match("([abc])", "abc").groups()
('a',)
>>> re.match("([abc]+)", "abc").groups()
('abc',)
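As a small sketch of the point above (the variable names are only illustrative): the number of entries in groups() is fixed by the parentheses in the pattern, not by how many times the group repeated, and group(0) is never part of that tuple.

```python
import re

pattern = re.compile(r"([abc])+")
m = pattern.match("abc")

# Pattern.groups counts the capturing parentheses in the pattern itself.
print(pattern.groups)    # 1

# groups() has one slot per capturing group, however often it repeated.
print(m.groups())        # ('c',)

# group(0) is the whole match and is not included in groups().
print(m.group(0))        # abc
```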

Answer by Peter DeGlopper

From the docs:

If a group matches multiple times, only the last match is accessible:

>>> m = re.match(r"(..)+", "a1b2c3")  # Matches 3 times.
>>> m.group(1)                        # Returns only the last match.
'c3'

Your group can only ever match one character, so c is the last match.
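To make the "last match only" rule concrete, here is a minimal sketch reusing the docs example string. It looks at the span of the group, and then, as one possible workaround rather than the only one, collects every repetition with re.findall on the unrepeated unit.

```python
import re

text = "a1b2c3"
m = re.match(r"(..)+", text)

# The whole match covers the full string, but the span of group 1
# covers only its final repetition.
print(m.span(0))   # (0, 6)
print(m.span(1))   # (4, 6)

# To collect every repetition, match the two-character unit on its own.
pairs = re.findall(r"..", text)
print(pairs)       # ['a1', 'b2', 'c3']
```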

You mention that you'd expect to at least see 'abc'. If you want your group to match multiple characters, put the + inside the group:

>>> m = re.match("([abc]+)", "abc")
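If both the whole run and the last repeated character are of interest, one option (a sketch, not something the answers above propose) is to nest the groups. Groups are numbered by their opening parenthesis, so the outer group is 1 and the inner one is 2.

```python
import re

# Outer group captures the whole run; the inner group is the repeated one.
m = re.match(r"(([abc])+)", "abc")

print(m.group(1))   # abc
print(m.group(2))   # c
print(m.groups())   # ('abc', 'c')
```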

Answer by Fabio Palm

This is a more specific regexp; from its groups you can read off the protocol and the filename (I forgot the file extension).

["](?P<protocol>http(?P<secure>s)?://)(?P<fqdn>[a-zA-Z0-9]*(?P<subdomain>(.)[a-zA-Z0-9]*)*)[/](?P<filename>([a-zA-Z.])*)["]
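A rough sketch of how the named groups in that pattern can be read back with groupdict(). The pattern is copied unchanged from the line above; the sample string and the names URL_RE, sample and info are assumptions made up for illustration. Note that the subdomain group sits inside a repetition, so it again holds only its last match, and the unescaped (.) matches any character, not just a literal dot.

```python
import re

# Pattern copied verbatim from the answer, split across lines for readability.
URL_RE = re.compile(
    r'["](?P<protocol>http(?P<secure>s)?://)'
    r'(?P<fqdn>[a-zA-Z0-9]*(?P<subdomain>(.)[a-zA-Z0-9]*)*)'
    r'[/](?P<filename>([a-zA-Z.])*)["]'
)

# Hypothetical input: a quoted URL, since the pattern expects the quotes.
sample = '"https://www.example.com/index.html"'

m = URL_RE.match(sample)
info = m.groupdict()   # maps each group name to its captured text

print(info["protocol"])    # https://
print(info["secure"])      # s
print(info["fqdn"])        # www.example.com
print(info["subdomain"])   # .com  (last repetition only)
print(info["filename"])    # index.html
```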

I the response removed because I was.
