The groups() method in regular expressions in Python
Original question: http://stackoverflow.com/questions/20202365/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Omid
I am learning about regex in Python and I have problems understanding the function groups().
>>> m = re.match("([abc])+", "abc")
Here I have defined the character class [abc], which, as far as I know, matches any one of the characters a to c. It is defined inside a group, and the + sign means we want at least one repetition of that group. So I execute the following lines and the result is understandable:
>>> m.group()
'abc'
>>> m.group(0)
'abc'
I get why this happens. The index of the main group is 0 and 'abc' matches the class we have defined. So far so good, but I don't get why the following lines get executed the way they do:
>>> m.group(1)
'c'
>>> m.groups()
('c',)
What is group(1)? I have only defined one group here, so why does the result of groups() contain only the character 'c'? Isn't it supposed to return a tuple containing all the groups? I'd suppose it would at least contain 'abc'.
Accepted answer by alko
For re details, consult the docs. In your case:
group(0) stands for the entire matched string, hence abc, which consists of three successive matches of the group: a, b and c
group(i) stands for the i-th group and, citing the documentation:
If a group matches multiple times, only the last match is accessible
hence group(1) stands for the last match, c
Your + is interpreted as repetition of the group. If you want to repeat [abc] inside the group, move the + inside the parentheses:
>>> re.match("([abc])", "abc").groups()
('a',)
>>> re.match("([abc]+)", "abc").groups()
('abc',)
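The difference between the two placements of + can be compared side by side. The following is a minimal sketch rather than part of the original answer; the variable names are chosen for illustration, and the use of re.findall to collect every single-character match is an added suggestion for when each repetition is needed.

```python
import re

text = "abc"

# Quantifier outside the group: the group is matched three times,
# and only the final repetition is kept in group 1.
outside = re.match(r"([abc])+", text)
print(outside.group(0))   # the whole match
print(outside.groups())   # the last repetition only

# Quantifier inside the group: the group matches once and spans every character.
inside = re.match(r"([abc]+)", text)
print(inside.groups())

# To keep each single-character match, scan the string instead of
# relying on one repeated group.
each = re.findall(r"[abc]", text)
print(each)
```

A repeated capturing group is overwritten on every repetition, so scanning with re.findall (or re.finditer) is one common way to see all of the individual matches.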
Answer by Peter DeGlopper
From the docs:
If a group matches multiple times, only the last match is accessible:
>>> m = re.match(r"(..)+", "a1b2c3") # Matches 3 times.
>>> m.group(1) # Returns only the last match.
'c3'
Your group can only ever match one character, so c is the last match.
You mention that you'd expect to see at least 'abc'. If you want your group to match multiple characters, put the + inside the group:
>>> m = re.match("([abc]+)", "abc")
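On the expectation that groups() would contain the whole match: groups() returns only the parenthesised subgroups, numbered from 1, while the whole match is available separately as group(0). A small sketch of that distinction follows; the second group and the input string "abc42" are assumptions added for illustration and do not come from the question.

```python
import re

# Hypothetical pattern with two capturing groups, for illustration only.
m = re.match(r"([abc]+)(\d*)", "abc42")

whole = m.group(0)        # the entire match, never part of groups()
subgroups = m.groups()    # subgroups 1..n only
pair = m.group(1, 2)      # several groups can be requested at once

print(whole)
print(subgroups)
print(pair)
```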
Answer by Fabio Palm
This is a more specific regexp. From its named groups you can read the protocol and the filename; I forgot the file extension.
["](?P<protocol>http(?P<secure>s)?://)(?P<fqdn>[a-zA-Z0-9]*(?P<subdomain>(.)[a-zA-Z0-9]*)*)[/](?P<filename>([a-zA-Z.])*)["]
I the response removed because I was.
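Named groups of the (?P<name>...) form can be read back by name through group() or all at once through groupdict(). The sketch below uses a deliberately simplified pattern, not the one from this answer; the group names, the pattern and the example URL are assumptions made for illustration.

```python
import re

# Simplified, hypothetical URL pattern with three named groups.
url_pattern = re.compile(
    r"(?P<protocol>https?)://"      # http or https
    r"(?P<host>[a-zA-Z0-9.]+)/"     # host name, dots allowed
    r"(?P<filename>[a-zA-Z.]*)"     # file name, possibly with an extension
)

m = url_pattern.match("http://example.com/index.html")

protocol = m.group("protocol")   # look up one group by name
parts = m.groupdict()            # all named groups as a dict
print(protocol)
print(parts)
```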


