Python regular expression match multiple words anywhere
Original question: http://stackoverflow.com/questions/26985228/
Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Asked by JudyJiang
I'm trying to use Python's regular expressions to match a string that contains several words. For example, the string is "These are oranges and apples and pears, but not pinapples or .." and the list of words I want to find is 'and', 'or' and 'not', regardless of their order or position.
I tried r'AND | OR | NOT' but it didn't work.
I also tried r'.*?\bAND\b.*?\bOR\b.*?\bNOT\b.*?$' and that still didn't work...
I'm not good at regular expressions. Any hints? Thanks!
Accepted answer by abarnert
You've got a few problems there.
First, matches are case-sensitive unless you use the IGNORECASE/I flag to ignore case. So 'AND' doesn't match 'and'.
Also, unless you use the VERBOSE/X flag, those spaces are part of the pattern. So you're checking for 'AND ' (with a trailing space), not 'AND'. If you wanted that, you probably wanted a space on each side rather than on only one (otherwise 'band leader' is going to match), and really you probably wanted \b rather than a space (otherwise a sentence starting with 'And another thing' isn't going to match).
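To make the first two points concrete, here is a small sketch that runs the asker's pattern against the sample string. The variable names (strict, loose, and so on) are only illustrative and are not part of the original answer.

```python
import re

s = "These are oranges and apples and pears, but not pinapples or .."

# Case-sensitive, and the spaces are literal: the alternatives are
# 'AND ', ' OR ' and ' NOT', none of which occur in s.
strict = re.findall(r'AND | OR | NOT', s)
print(strict)   # []

# Ignoring case finds the words, but each match drags a literal space along.
loose = re.findall(r'AND | OR | NOT', s, flags=re.I)
print(loose)    # ['and ', 'and ', ' not', ' or ']

# A one-sided space lets 'AND ' match inside another word...
inside_word = re.search(r'AND ', 'band leader', flags=re.I)
print(inside_word.group())   # 'and '

# ...while requiring spaces on both sides misses a word at the start.
at_start = re.search(r' AND ', 'And another thing', flags=re.I)
print(at_start)   # None

# \b matches at word boundaries without consuming any characters.
bounded = re.findall(r'\band\b', 'And another thing, band leader', flags=re.I)
print(bounded)    # ['And']
```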
Finally, if you think you need .* before and after your pattern, and ^ and $ around it, there's a good chance you wanted to use search, findall, or finditer rather than match.
So:
>>> import re
>>> s = "These are oranges and apples and pears, but not pinapples or .."
>>> r = re.compile(r'\bAND\b | \bOR\b | \bNOT\b', flags=re.I | re.X)
>>> r.findall(s)
['and', 'and', 'not', 'or']
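As a rough illustration of why match is usually the wrong tool here, the sketch below applies one compiled pattern through match, search, findall and finditer. The grouped form \b(?:and|or|not)\b is my own shorthand for the same alternation, and the variable names are hypothetical.

```python
import re

s = "These are oranges and apples and pears, but not pinapples or .."
pattern = re.compile(r'\b(?:and|or|not)\b', flags=re.I)

# match() only tries at the very start of the string, which is "These".
at_start = pattern.match(s)
print(at_start)                  # None

# search() scans forward and returns the first match anywhere.
first = pattern.search(s)
print(first.group())             # 'and'

# findall() returns every non-overlapping match as a list of strings.
all_words = pattern.findall(s)
print(all_words)                 # ['and', 'and', 'not', 'or']

# finditer() yields match objects, so positions are available too.
spans = [(m.group(), m.start(), m.end()) for m in pattern.finditer(s)]
for word, start, end in spans:
    print(word, start, end)
```

If only the words matter, findall is enough; finditer is the one to reach for when the offsets are needed as well.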


Answered by Vedaad Shakib
Try this:
>>> import re
>>> re.findall(r"\band\b|\bor\b|\bnot\b", "These are oranges and apples and pears, but not pinapples or ..")
['and', 'and', 'not', 'or']
a|b means match either a or b
\b represents a word boundary
re.findall(pattern, string) returns a list of all non-overlapping matches of pattern in string
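If the words come from a list rather than being typed into the pattern by hand, the same alternation can be assembled programmatically. This is only a sketch: find_words and contains_all are hypothetical helper names, and the case-insensitive matching is an assumption carried over from the accepted answer.

```python
import re

def find_words(words, text):
    """Return every whole-word occurrence of any of `words` in `text`."""
    # re.escape guards against words containing regex metacharacters.
    alternation = "|".join(re.escape(w) for w in words)
    pattern = re.compile(r"\b(?:" + alternation + r")\b", flags=re.IGNORECASE)
    return pattern.findall(text)

def contains_all(words, text):
    """True if each word in `words` occurs at least once, in any order."""
    found = {w.lower() for w in find_words(words, text)}
    return all(w.lower() in found for w in words)

s = "These are oranges and apples and pears, but not pinapples or .."
print(find_words(["and", "or", "not"], s))      # ['and', 'and', 'not', 'or']
print(contains_all(["and", "or", "not"], s))    # True
print(contains_all(["and", "or", "not"], "apples and pears"))  # False
```

The contains_all helper covers the case where the goal is to check that all of the words appear, regardless of order, rather than to list the matches.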

