Python: repeating multiple characters regex
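Disclaimer: this page is adapted from a Chinese-English translation of a popular StackOverflow question and is licensed under CC BY-SA 4.0. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow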
Original URL: http://stackoverflow.com/questions/3630982/
repeating multiple characters regex
Question by Falmarri
Is there a way using a regex to match a repeating set of characters? For example:
ABCABCABCABCABC
ABC{5}
I know that's wrong. But is there anything that would achieve that effect?
Update:
Can you use nested capture groups? So something like (?<cap>(ABC){5})?
Accepted answer by Brian Campbell
Enclose the regex you want to repeat in parentheses. For instance, if you want 5 repetitions of ABC:
(ABC){5}
Or if you want any number of repetitions (0 or more):
(ABC)*
Or one or more repetitions:
(ABC)+
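A minimal sketch for checking these patterns in Python, assuming Python 3.4+ for re.fullmatch (which succeeds only when the entire string matches, unlike re.search). The test strings are illustrative:

```python
import re

# ABC{5}: the {5} binds only to the preceding "C", so this means "AB" + "CCCCC"
print(bool(re.fullmatch(r"ABC{5}", "ABCCCCC")))            # True
print(bool(re.fullmatch(r"ABC{5}", "ABCABCABCABCABC")))    # False

# (ABC){5}: the parentheses make {5} apply to the whole "ABC" sequence
five_abc = "ABC" * 5
print(bool(re.fullmatch(r"(ABC){5}", five_abc)))           # True

# (ABC)* also accepts the empty string; (ABC)+ needs at least one "ABC"
print(bool(re.fullmatch(r"(ABC)*", "")))                   # True
print(bool(re.fullmatch(r"(ABC)+", "")))                   # False
```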
Edit to respond to update
Parentheses in regular expressions do two things: they group a sequence of items together, so that you can apply an operator to the entire sequence instead of just the last item, and they capture the contents of that group, so that you can extract the substring matched by that subexpression.
You can nest parentheses; groups are numbered by the position of their opening parenthesis, counting from the left starting at 1 (group 0 is the whole match). For instance:
>>> import re
>>> re.search('[0-9]* (ABC(...))', '123 ABCDEF 456').group(0)
'123 ABCDEF'
>>> re.search('[0-9]* (ABC(...))', '123 ABCDEF 456').group(1)
'ABCDEF'
>>> re.search('[0-9]* (ABC(...))', '123 ABCDEF 456').group(2)
'DEF'
If you want to group without capturing, you can use (?:...). This is helpful when parentheses that exist only to apply an operator to a sequence should not change the numbering of your other groups. It can also be slightly faster.
>>> re.search('[0-9]* (?:ABC(...))', '123 ABCDEF 456').group(1)
'DEF'
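Printing Match.groups() for both patterns makes the renumbering visible. A small sketch; the variable names are only illustrative:

```python
import re

text = "123 ABCDEF 456"

# Capturing outer group: (ABC(...)) is group 1, the inner (...) is group 2
capturing = re.search(r"[0-9]* (ABC(...))", text)
print(capturing.groups())       # ('ABCDEF', 'DEF')

# Non-capturing outer group: only the inner (...) captures, so it becomes group 1
non_capturing = re.search(r"[0-9]* (?:ABC(...))", text)
print(non_capturing.groups())   # ('DEF',)
```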
So to answer your update, yes, you can use nested capture groups, or even avoid capturing with the inner group at all:
>>> re.search('((?:ABC){5})(DEF)', 'ABCABCABCABCABCDEF').group(1)
'ABCABCABCABCABC'
>>> re.search('((?:ABC){5})(DEF)', 'ABCABCABCABCABCDEF').group(2)
'DEF'
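One detail for the update: Python's built-in re module spells a named group (?P<name>...), not the (?<name>...) form used in the question, which is syntax from other engines such as .NET and PCRE. A minimal sketch of the question's pattern in Python syntax, with the test string chosen for illustration:

```python
import re

text = "ABCABCABCABCABC"

# Python spelling of the question's (?<cap>(ABC){5}):
# a named outer group "cap" around a numbered inner group (ABC)
m = re.search(r"(?P<cap>(ABC){5})", text)
print(m.group("cap"))  # ABCABCABCABCABC (the whole run)
print(m.group(1))      # same as group "cap": named groups are also numbered
print(m.group(2))      # ABC (the inner group keeps only its last repetition)
print(m.groupdict())   # {'cap': 'ABCABCABCABCABC'}
```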
Answer by Novikov
(ABC){5} should work for you.
Answer by pyfunc
Parentheses "()" are used to group characters and expressions within larger, more complex regular expressions. Quantifiers that immediately follow the group apply to the whole group.
(ABC){5}
Answer by Zafer
ABC{5} matches ABCCCCC, because the quantifier applies only to the C. To match 5 ABCs, you should use (ABC){5}. Parentheses are used to group a set of characters. You can also specify a range for the number of occurrences, such as (ABC){3,5}, which matches ABCABCABC, ABCABCABCABC, and ABCABCABCABCABC.
(ABC){1,} means 1 or more repetition which is exactly the same as (ABC)+.
(ABC){0,} means 0 or more repetition which is exactly the same as (ABC)*.
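A small sketch that checks the {3,5} range and spot-checks the {1,}/+ and {0,}/* equivalences on a few sample strings. matches() is a hypothetical helper introduced only for this example:

```python
import re

def matches(pattern, s):
    """Hypothetical helper: True if the whole string s matches pattern."""
    return re.fullmatch(pattern, s) is not None

# (ABC){3,5} accepts exactly 3, 4 or 5 repetitions of "ABC"
accepted = [n for n in range(0, 8) if matches(r"(ABC){3,5}", "ABC" * n)]
print(accepted)  # [3, 4, 5]

# Spot-check the equivalences on a few sample strings (not a proof)
samples = ["", "ABC", "ABCABC", "ABCAB"]
for s in samples:
    assert matches(r"(ABC){1,}", s) == matches(r"(ABC)+", s)
    assert matches(r"(ABC){0,}", s) == matches(r"(ABC)*", s)
```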
Answer by dash-tom-bang
As to the update to the question:
You can nest capture groups. Capture groups are numbered in the order of their opening parentheses.
(((ABC)*)(DEF)*)
When this regex is applied to ABCABCABCDEFDEFDEF, capture group 0 matches the whole string, 1 is also the whole string, 2 is ABCABCABC, 3 is ABC, and 4 is DEF (because the star is outside groups 3 and 4, each keeps only its last repetition).
If you have variation inside a capture group and a repeat just outside, then things can get a little wonky if you're not expecting it...
(a[bc]*c)*
When fed abbbcccabbc, this regex returns only the last match in capture group 1, in this example just abbc, since each repetition overwrites what the group captured before.
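A sketch reproducing this behavior in Python. The re.findall line is an extra workaround, not part of the original answer: if you need every repetition rather than only the last one, match the repeated unit on its own and collect all matches.

```python
import re

text = "abbbcccabbc"

# Repeated capture group: the whole string matches,
# but only the last repetition survives in group 1
m = re.fullmatch(r"(a[bc]*c)*", text)
print(m.group(0))  # abbbcccabbc
print(m.group(1))  # abbc

# Workaround: match the repeated unit by itself to get every repetition
print(re.findall(r"a[bc]*c", text))  # ['abbbccc', 'abbc']
```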

