Python regular expression match multiple words anywhere

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/26985228/

Date: 2020-08-19 01:15:32  Source: igfitidea

Tags: python, regex

Asked by JudyJiang

I'm trying to use Python's regular expressions to match a string containing several words. For example, the string is "These are oranges and apples and pears, but not pinapples or .." The words I want to find are 'and', 'or' and 'not', regardless of their order or position.

I tried r'AND | OR | NOT', but it didn't work.

I also tried r'.*?\bAND\b.*?\bOR\b.*?\bNOT\b.*?$', but it still didn't work...

I'm not good at regular expressions. Any hints? Thanks!

Accepted answer by abarnert

You've got a few problems there.

First, matches are case-sensitive unless you use the IGNORECASE/I flag to ignore case. So, 'AND' doesn't match 'and'.
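
A minimal sketch of the case-sensitivity point, using the sentence from the question (the variable names are just for illustration):

```python
import re

s = "These are oranges and apples and pears, but not pinapples or .."

# Without a flag, 'AND' only matches uppercase text, and s has none.
no_flag = re.search(r'AND', s)

# With re.IGNORECASE (alias re.I), 'AND' also matches 'and'.
with_flag = re.search(r'AND', s, flags=re.I)

print(no_flag)            # None
print(with_flag.group())  # and
```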

Also, unless you use the VERBOSE/X flag, those spaces are part of the pattern. So, you're checking for 'AND ', not 'AND'. If you wanted that, you probably wanted spaces on each side, not just those sides (otherwise, 'band leader' is going to match…), and really, you probably wanted \b, not a space (otherwise a sentence starting with 'And another thing' isn't going to match).
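
A small sketch of these points, using the example phrases from the answer plus a made-up string 'and, or, not' for the VERBOSE case (variable names are illustrative):

```python
import re

# Without re.X, the space is part of the pattern: 'AND ' means "and" followed
# by a space, so it matches inside "band leader".
trailing_space = re.findall(r'AND ', 'band leader', flags=re.I)

# \b (a word boundary) rejects matches that sit inside a longer word.
boundary_in_word = re.findall(r'\bAND\b', 'band leader', flags=re.I)

# Requiring a space on both sides misses a word at the very start of the string...
spaces_at_start = re.findall(r' AND ', 'And another thing', flags=re.I)
# ...whereas \b also matches at the start (and end) of the string.
boundary_at_start = re.findall(r'\bAND\b', 'And another thing', flags=re.I)

# With re.X (VERBOSE), unescaped whitespace in the pattern is ignored,
# so 'AND | OR | NOT' behaves like 'AND|OR|NOT'.
verbose = re.findall(r'AND | OR | NOT', 'and, or, not', flags=re.I | re.X)

print(trailing_space)     # ['and ']
print(boundary_in_word)   # []
print(spaces_at_start)    # []
print(boundary_at_start)  # ['And']
print(verbose)            # ['and', 'or', 'not']
```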

Finally, if you think you need .* before and after your pattern and $ and ^ around it, there's a good chance you wanted to use search, findall, or finditer, rather than match.
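
A minimal sketch of how these functions differ on the question's string. It uses a compiled, case-insensitive word-boundary pattern written with a non-capturing group (?:...), which is equivalent to, but not literally, the answer's pattern:

```python
import re

s = "These are oranges and apples and pears, but not pinapples or .."
pattern = re.compile(r'\b(?:and|or|not)\b', flags=re.I)

# match() only tries at the very start of the string, which begins with "These".
print(pattern.match(s))           # None

# search() scans the whole string and returns the first match.
print(pattern.search(s).group())  # and

# findall() returns every non-overlapping match as a list of strings.
print(pattern.findall(s))         # ['and', 'and', 'not', 'or']

# finditer() yields match objects, which also carry positions.
positions = [(m.group(), m.start()) for m in pattern.finditer(s)]
print(positions)  # [('and', 18), ('and', 29), ('not', 44), ('or', 58)]
```

No .* padding or ^/$ anchors are needed, because search, findall and finditer already look for matches anywhere in the string.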

So:

>>> import re
>>> s = "These are oranges and apples and pears, but not pinapples or .."
>>> r = re.compile(r'\bAND\b | \bOR\b | \bNOT\b', flags=re.I | re.X)
>>> r.findall(s)
['and', 'and', 'not', 'or']
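
Since the question starts from a list of words, one way to avoid writing the alternation by hand is to build it from the list. This is a sketch, not part of the answer above: the helper name words_pattern is hypothetical, and the final check reflects one assumed reading of "regardless of order", namely that every word must occur at least once.

```python
import re

def words_pattern(words):
    """Hypothetical helper: compile a case-insensitive pattern matching any of `words` as whole words."""
    # re.escape guards against words that contain regex metacharacters such as '.' or '+'.
    alternatives = '|'.join(re.escape(w) for w in words)
    # The non-capturing group (?:...) applies \b to every alternative and
    # keeps findall() returning whole matches rather than group contents.
    return re.compile(r'\b(?:' + alternatives + r')\b', flags=re.I)

s = "These are oranges and apples and pears, but not pinapples or .."
r = words_pattern(['and', 'or', 'not'])
print(r.findall(s))  # ['and', 'and', 'not', 'or']

# Assumed goal: check that every word appears at least once, in any order.
required = {'and', 'or', 'not'}
found = {m.lower() for m in r.findall(s)}
all_present = required <= found
print(all_present)   # True
```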

Answer by Vedaad Shakib

Try this:

>>> import re
>>> re.findall(r"\band\b|\bor\b|\bnot\b", "These are oranges and apples and pears, but not pinapples or ..")
['and', 'and', 'not', 'or']

a|b means match either a or b

\b represents a word boundary

re.findall(pattern, string) returns a list of all non-overlapping matches of pattern in string
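
To see what \b contributes, here is a minimal comparison of the same alternation with and without word boundaries on the question's string (the variable names are just for illustration):

```python
import re

s = "These are oranges and apples and pears, but not pinapples or .."

# Without \b, 'or' also matches the first two letters of "oranges".
without_boundaries = re.findall(r"and|or|not", s)

# With \b on both sides of each alternative, only standalone words match.
with_boundaries = re.findall(r"\band\b|\bor\b|\bnot\b", s)

print(without_boundaries)  # ['or', 'and', 'and', 'not', 'or']
print(with_boundaries)     # ['and', 'and', 'not', 'or']
```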