How do you use a regex in a list comprehension in Python?
Original question: http://stackoverflow.com/questions/14819164/
Notice: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow, not to this page.
Asked by Adam
I'm trying to locate all index positions of a string in a list of words and I want the values returned as a list. I would like to find the string if it is on its own, or if it is preceded or followed by punctuation, but not if it is a substring of a larger word.
The following code captures only "cow" and misses both "test;cow" and "cow."
myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
myString = 'cow'
indices = [i for i, x in enumerate(myList) if x == myString]
print indices
>> [5]
I have tried changing the code to use a regular expression:
import re
myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
myString = 'cow'
indices = [i for i, x in enumerate(myList) if x == re.match('\W*myString\W*', myList)]
print indices
But this gives an error: expected string or buffer
If anyone knows what I'm doing wrong I'd be very happy to hear. I have a feeling it's something to do with the fact I'm trying to use a regular expression in there when it's expecting a string. Is there a solution?
The output I'm looking for should read:
>> [0, 4, 5]
Thanks
Accepted answer by Rohit Jain
You don't need to compare the result of re.match back to x, and the match should be run on x rather than on the whole list.
Also, you need to use re.search instead of re.match, since your regex pattern '\W*myString\W*' will not match the first element. That's because re.match only matches at the start of the string, and test; is not matched by \W*. Actually, you only need to test the characters immediately before and after the word, not the complete string.
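The difference between the two functions can be checked directly. The sketch below is illustrative only: the shortened list `words` and the names `match_hits` and `search_hits` are assumptions made for the demo, not part of the original code. It also suggests why switching to re.search alone is not enough with the \W* pattern, since 'acow' then matches as well.

```python
import re

# Shortened, hypothetical sample list for the demo.
words = ['test;cow', 'cow.', 'acow']
pattern = r'\W*cow\W*'

# re.match only tries to match at the very start of each string.
match_hits = [w for w in words if re.match(pattern, w)]

# re.search scans the whole string for the first position that matches.
search_hits = [w for w in words if re.search(pattern, w)]

print(match_hits)   # ['cow.']
print(search_hits)  # ['test;cow', 'cow.', 'acow']
```

Because \W* can match zero characters, re.search finds "cow" inside "acow", which is the case the question wants to exclude.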
So, you can instead use word boundaries (\b) around the string:
pattern = r'\b' + re.escape(myString) + r'\b'
indices = [i for i, x in enumerate(myList) if re.search(pattern, x)]
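For reference, here is the word-boundary approach as a complete script using the list from the question. It is a minimal sketch that uses print() as a function so that it also runs on Python 3; nothing else is changed from the two lines above.

```python
import re

myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
myString = 'cow'

# \b matches the boundary between a word character and a non-word
# character (or the start/end of the string), so 'acow' is rejected.
pattern = r'\b' + re.escape(myString) + r'\b'
indices = [i for i, x in enumerate(myList) if re.search(pattern, x)]

print(indices)  # [0, 4, 5]
```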
Answer by georg
There are a few problems with your code. First, you need to match the expression against the list element (x), not against the whole list (myList). Second, in order to insert a variable into the expression, you have to use + (string concatenation); writing myString inside the quotes just matches the literal text "myString". And finally, use raw literals (r'\W') so that the backslashes in the expression are interpreted properly:
import re
myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
myString = 'cow'
indices = [i for i, x in enumerate(myList) if re.match(r'\W*' + myString + r'\W*', x)]
print indices
If there is a chance that myString contains special regex characters (like a backslash or a dot), you'll also need to apply re.escape to it:
regex = r'\W*' + re.escape(myString) + r'\W*'
indices = [i for i, x in enumerate(myList) if re.match(regex, x)]
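To see what re.escape changes, consider a search term that contains a dot. The example below is only a sketch: the list `my_list` and the search term 'c.w' are made up for illustration and do not come from the question.

```python
import re

# Hypothetical data: the search term contains a regex metacharacter.
my_list = ['cow', 'c.w', 'c.w!']
my_string = 'c.w'

# Without escaping, '.' matches any character, so 'cow' matches too.
unescaped = [i for i, x in enumerate(my_list)
             if re.match(r'\W*' + my_string + r'\W*', x)]

# With re.escape, the dot is treated as a literal dot.
escaped = [i for i, x in enumerate(my_list)
           if re.match(r'\W*' + re.escape(my_string) + r'\W*', x)]

print(unescaped)  # [0, 1, 2]
print(escaped)    # [1, 2]
```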
As pointed out in the comments, the following might be a better option:
regex = r'\b' + re.escape(myString) + r'\b'
indices = [i for i, x in enumerate(myList) if re.search(regex, x)]
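A side-by-side run hints at why the word-boundary version may be preferable. This is a sketch under one assumption: the extra element 'cows' is appended to the question's list purely to show a case the \W* pattern lets through; the variable names `with_match` and `with_boundaries` are likewise illustrative.

```python
import re

# The question's list plus one hypothetical extra word, 'cows'.
myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow', 'cows']
myString = 'cow'

# re.match with \W*: anchored at the start, but not at the end.
loose = r'\W*' + re.escape(myString) + r'\W*'
with_match = [i for i, x in enumerate(myList) if re.match(loose, x)]

# re.search with \b: the word may appear anywhere, but must stand alone.
strict = r'\b' + re.escape(myString) + r'\b'
with_boundaries = [i for i, x in enumerate(myList) if re.search(strict, x)]

print(with_match)       # [4, 5, 7]
print(with_boundaries)  # [0, 4, 5]
```

The \W* version misses 'test;cow' and accepts 'cows', while the \b version returns the output the question asks for.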

