Python: remove words of length less than 4 from a string
Original question: http://stackoverflow.com/questions/24332025/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Remove words of length less than 4 from string
Asked by blackmamba
I am trying to remove words of length less than 4 from a string.
I use this regex:
re.sub(r' \w{1,3} ', ' ', c)
Although this removes some words, it fails when two or three words shorter than 4 characters appear next to each other. For example:
I am in a bank.
It gives me:
I in bank.
How can I resolve this?
Accepted answer by Martijn Pieters
Don't include the spaces; use \b word boundary anchors instead:
re.sub(r'\b\w{1,3}\b', '', c)
This removes words of up to 3 characters entirely:
>>> import re
>>> re.sub(r'\b\w{1,3}\b', '', 'The quick brown fox jumps over the lazy dog')
' quick brown  jumps over  lazy '
>>> re.sub(r'\b\w{1,3}\b', '', 'I am in a bank.')
'    bank.'
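Because only the words themselves are removed, the spaces that surrounded them stay behind. A minimal sketch of one way to tidy that up, assuming you want runs of whitespace collapsed into single spaces (the helper name remove_short_words and its min_len parameter are hypothetical, not part of the answer):

```python
import re

def remove_short_words(text, min_len=4):
    """Drop words shorter than min_len, then normalise whitespace.

    Assumes min_len >= 2; smaller values would produce an invalid {1,0} quantifier.
    """
    # \b\w{1,n}\b matches whole words of at most n word characters.
    pattern = r'\b\w{1,%d}\b' % (min_len - 1)
    stripped = re.sub(pattern, '', text)
    # split() with no arguments splits on any whitespace run and drops empty pieces.
    return ' '.join(stripped.split())

print(remove_short_words('I am in a bank.'))
print(remove_short_words('The quick brown fox jumps over the lazy dog'))
```

The first call prints bank. and the second prints quick brown jumps over lazy, with no leading, trailing, or doubled spaces.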
Answer by Vidhya G
If you want an alternative to regex:
new_string = ' '.join([w for w in old_string.split() if len(w)>3])
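One difference worth knowing: split() separates on whitespace only, so punctuation attached to a word counts toward its length, whereas \w in the regex approach ignores punctuation. A small sketch of the difference, using a made-up sample string:

```python
import re

sample = 'Go to it, now!'

# split() keeps punctuation attached: 'now!' has length 4 and survives.
by_split = ' '.join(w for w in sample.split() if len(w) > 3)

# \w excludes punctuation: 'now' is only 3 word characters and is removed,
# leaving the punctuation marks behind.
by_regex = ' '.join(re.sub(r'\b\w{1,3}\b', '', sample).split())

print(by_split)  # now!
print(by_regex)  # , !
```

Which behaviour is preferable depends on whether punctuation should count as part of a word in your data.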
Answer by Sizik
Martijn has already answered, but I just wanted to explain why your regex doesn't work. The regex string ' \w{1,3} ' matches a space, followed by 1-3 word characters, followed by another space. The 'I' doesn't get matched because it doesn't have a space in front of it. The 'am' gets replaced, and then the regex engine resumes at the next unmatched character: the 'i' in 'in'. It doesn't see the space before 'in', because that space was already consumed by the previous match (the space in the output was placed there by the substitution). So the next match it finds is 'a', which produces your output string.
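The non-overlapping behaviour described here can be checked directly. The sketch below lists the matches the original pattern finds, then shows one possible alternative (not from the answers above) that uses a lookahead so the trailing space is checked but not consumed. The \b version in the accepted answer remains the simpler fix; the variable names are only for illustration.

```python
import re

text = 'I am in a bank.'

# Matches never overlap: ' am ' consumes the space that ' in ' would need,
# so ' in ' is skipped and ' a ' is the next match.
matches = [m.group() for m in re.finditer(r' \w{1,3} ', text)]
print(matches)  # [' am ', ' a ']

# A lookahead (?= ) requires a following space without consuming it,
# so the next match can still start on that space.
fixed = re.sub(r' \w{1,3}(?= )', '', text)
print(fixed)  # I bank.
```

Note that the leading 'I' still survives in the lookahead version because nothing precedes it, which is one more reason to prefer \b.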