Python: matching multiple regex patterns with the alternation operator?

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/14182339/

Time: 2020-08-18 10:41:43  Source: igfitidea

Matching multiple regex patterns with the alternation operator?

python regex regex-alternation

Asked by Julian Laval

I ran into a small problem using Python Regex.

Suppose this is the input:

(zyx)bc

What I'm trying to achieve is to obtain whatever is between parentheses as a single match, and each character outside the parentheses as an individual match. The desired result would be along the lines of:

['zyx','b','c']

The order of matches should be kept.

I've tried obtaining this with Python 3.3, but can't seem to figure out the correct Regex. So far I have:

from re import findall

matches = findall(r'\((.*?)\)|\w', '(zyx)bc')

print(matches) yields the following:

['zyx','','']

Any ideas what I'm doing wrong?

Accepted answer by James Henstridge

From the documentation of re.findall:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
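
The rule quoted above applies to capturing groups only. As a minimal sketch (the variable names here are illustrative, not from the original answer), wrapping the parenthesized branch in a capturing group changes what findall returns, while a non-capturing (?:...) group leaves the whole-match behavior intact:

```python
import re

text = '(zyx)bc'

# Capturing group: findall returns group 1 for every match,
# which is '' whenever the \w branch matched instead.
with_group = re.findall(r'(\(.*?\))|\w', text)

# Non-capturing group: no groups are reported, so findall
# returns the full text of each match.
without_group = re.findall(r'(?:\(.*?\))|\w', text)

print(with_group)     # ['(zyx)', '', '']
print(without_group)  # ['(zyx)', 'b', 'c']
```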

While your regexp matches the string three times, the (.*?) group is empty for the last two matches. If you want the output of the other half of the regexp, you can add a second group:

>>> re.findall(r'\((.*?)\)|(\w)', '(zyx)bc')
[('zyx', ''), ('', 'b'), ('', 'c')]

Alternatively, you could remove all the groups to get a simple list of strings again:

>>> re.findall(r'\(.*?\)|\w', '(zyx)bc')
['(zyx)', 'b', 'c']

You would need to manually remove the parentheses though.
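
One possible way to avoid the manual step, offered here as a sketch rather than as part of the original answer, is to iterate with re.finditer and pick the group per match. The helper name split_tokens is hypothetical:

```python
import re

def split_tokens(text):
    """Return parenthesized contents and single word characters, in order."""
    tokens = []
    for m in re.finditer(r'\((.*?)\)|\w', text):
        # group(1) is None when the \w branch matched,
        # so fall back to the whole match in that case.
        inner = m.group(1)
        tokens.append(inner if inner is not None else m.group(0))
    return tokens

print(split_tokens('(zyx)bc'))  # ['zyx', 'b', 'c']
```

Unlike findall, a match object distinguishes a group that did not participate (None) from one that matched the empty string, so empty parentheses such as '()' still produce an entry.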

Answer by Ned Batchelder

The docs mention treating groups specially, so don't put a group around the parenthesized pattern, and you'll get everything, but you'll need to remove the parens from the matched data yourself:

>>> re.findall(r'\(.+?\)|\w', '(zyx)bc')
['(zyx)', 'b', 'c']

or use more groups, then process the resulting tuples to get the strings you seek:

>>> [''.join(t) for t in re.findall(r'\((.+?)\)|(\w)', '(zyx)bc')]
['zyx', 'b', 'c']

Answer by Ashwini Chaudhary

In [108]: strs="(zyx)bc"

In [109]: re.findall(r"\(\w+\)|\w",strs)
Out[109]: ['(zyx)', 'b', 'c']

In [110]: [x.strip("()") for x in re.findall(r"\(\w+\)|\w",strs)]
Out[110]: ['zyx', 'b', 'c']

Answer by Fredrick Brennan

Let's take a look at our output using re.DEBUG.

branch 
  literal 40 
  subpattern 1 
    min_repeat 0 65535 
      any None 
  literal 41 
or
  in 
    category category_word

Ouch, there's only one subpattern in there, but re.findall only pulls out subpatterns if one exists!

a = re.findall(r'\((.*?)\)|(.)', '(zyx)bc',re.DEBUG); a
[('zyx', ''), ('', 'b'), ('', 'c')]
branch 
  literal 40 
  subpattern 1 
    min_repeat 0 65535 
      any None 
  literal 41 
or
  subpattern 2 
    any None

Better. :)

Now we just have to make this into the format you want.

[i[0] if i[0] != '' else i[1] for i in a]
['zyx', 'b', 'c']

Answer by alan

Other answers have shown you how to get the result you need, but with the extra step of manually removing the parentheses. If you use lookarounds in your regex, you won't need to strip the parentheses manually:

>>> import re
>>> s = '(zyx)bc'
>>> print (re.findall(r'(?<=\()\w+(?=\))|\w', s))
['zyx', 'b', 'c']

Explained:

(?<=\() // lookbehind for a left parenthesis
\w+     // one or more word characters, up to:
(?=\))  // lookahead for a right parenthesis
|       // OR
\w      // any single word character
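
The same breakdown can be kept next to the pattern itself. The following is a sketch assuming the re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments; the name PATTERN is illustrative:

```python
import re

PATTERN = re.compile(r"""
    (?<=\()   # lookbehind: preceded by a left parenthesis
    \w+       # one or more word characters
    (?=\))    # lookahead: followed by a right parenthesis
    |         # OR
    \w        # a single word character
""", re.VERBOSE)

print(PATTERN.findall('(zyx)bc'))  # ['zyx', 'b', 'c']
```

Because the lookarounds consume no characters, the parentheses never become part of a match, and since the pattern has no capturing groups, findall returns plain strings.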