Python regex for finding all words in a string
Original question: http://stackoverflow.com/questions/37543724/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by TNT
Hello, I am new to regex and I'm starting out with Python. I'm stuck on extracting all the words from an English sentence. So far I have:
import re
shop = "hello seattle what have you got"
regex = r'(\w*) '
list1 = re.findall(regex, shop)
print(list1)
This gives the output:
['hello', 'seattle', 'what', 'have', 'you']
If I replace the regex with
regex = r'(\w*)\W*'
then the output is:
['hello', 'seattle', 'what', 'have', 'you', 'got', '']
whereas I want this output:
['hello', 'seattle', 'what', 'have', 'you', 'got']
Please point out where I am going wrong.
Answered by Pranav C Balan
Use the word boundary \b:
import re
shop = "hello seattle what have you got"
regex = r'\b\w+\b'
list1 = re.findall(regex, shop)
print(list1)
Output: ['hello', 'seattle', 'what', 'have', 'you', 'got']
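As a hedged aside on why the two patterns in the question misbehave, the sketch below runs them side by side with the fixed pattern. The variable names (with_space, star, plus) are illustrative and not from the original post; the explanation in the comments is my reading of how re.findall treats these patterns.

```python
import re

shop = "hello seattle what have you got"

# First pattern from the question: every word must be followed by a space,
# so the final word "got" (followed by the end of the string) is skipped.
with_space = re.findall(r'(\w*) ', shop)

# Second pattern from the question: \w* and \W* can both match nothing,
# so one extra empty match is found at the very end of the string.
star = re.findall(r'(\w*)\W*', shop)

# Using + requires at least one word character, which rules out empty matches.
plus = re.findall(r'\b\w+\b', shop)

print(with_space)  # ['hello', 'seattle', 'what', 'have', 'you']
print(star)        # ['hello', 'seattle', 'what', 'have', 'you', 'got', '']
print(plus)        # ['hello', 'seattle', 'what', 'have', 'you', 'got']
```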
Or simply \w+ is enough:
import re
shop = "hello seattle what have you got"
regex = r'\w+'
list1 = re.findall(regex, shop)
print(list1)
Output: ['hello', 'seattle', 'what', 'have', 'you', 'got']
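A small check, assuming a made-up sentence with punctuation (the text variable is not from the original post), suggests why the explicit \b anchors are optional here: \w+ is greedy, so each match already spans a whole run of word characters (letters, digits, and the underscore) and therefore starts and ends at a word boundary.

```python
import re

# Hypothetical input with punctuation, used only for this comparison.
text = "hello, seattle! what have you got?"

with_boundaries = re.findall(r'\b\w+\b', text)
without_boundaries = re.findall(r'\w+', text)

# Both patterns return the same maximal runs of word characters.
print(with_boundaries == without_boundaries)  # True
print(without_boundaries)  # ['hello', 'seattle', 'what', 'have', 'you', 'got']
```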