Python: remove words of length less than 4 from a string

Question

Asked by blackmamba

I am trying to remove words of length less than 4 from a string.

I use this regex:

 re.sub(' \w{1,3} ', ' ', c)

Although this removes some words, it fails when two or three words shorter than 4 characters appear in a row. For example:

 I am in a bank.

It gives me:

 I in bank.

How can I resolve this?

Answer 1

Accepted answer by Martijn Pieters

Don't include the spaces; use \b word boundary anchors instead:

re.sub(r'\b\w{1,3}\b', '', c)

This removes words of up to 3 characters entirely:

>>> import re
>>> re.sub(r'\b\w{1,3}\b', '', 'The quick brown fox jumps over the lazy dog')
' quick brown  jumps over  lazy '
>>> re.sub(r'\b\w{1,3}\b', '', 'I am in a bank.')
'    bank.'
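
As the outputs above show, the substitution removes the short words but leaves their surrounding spaces behind. Here is a minimal sketch of one way to tidy that up, assuming collapsed whitespace is what you want. The function name remove_short_words and its max_len parameter are made up for this illustration:

```python
import re

def remove_short_words(text, max_len=3):
    """Drop words of up to max_len characters, then collapse leftover whitespace."""
    # \b matches at word boundaries without consuming the neighbouring spaces.
    stripped = re.sub(r'\b\w{1,%d}\b' % max_len, '', text)
    # split() with no arguments splits on runs of whitespace and drops empty pieces.
    return ' '.join(stripped.split())

print(remove_short_words('I am in a bank.'))
print(remove_short_words('The quick brown fox jumps over the lazy dog'))
```

Punctuation is not a word character, so it stays in the result even when the word next to it is removed.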

Answer 2

Answered by Vidhya G

If you want an alternative to regex:

new_string = ' '.join([w for w in old_string.split() if len(w)>3])
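
One difference from the regex approach is worth checking before choosing between them: split() keeps punctuation attached to the word, so len(w) counts it. The sketch below compares the two on a made-up sentence that ends in a short word. The helper names filter_by_split and filter_by_regex are illustrative only:

```python
import re

def filter_by_split(old_string):
    # Keeps whitespace-separated tokens longer than 3 characters;
    # punctuation attached to a token counts toward its length.
    return ' '.join([w for w in old_string.split() if len(w) > 3])

def filter_by_regex(old_string):
    # Removes runs of 1-3 word characters delimited by word boundaries,
    # then collapses the leftover whitespace.
    return ' '.join(re.sub(r'\b\w{1,3}\b', '', old_string).split())

sentence = 'I saw the lazy dog.'
print(filter_by_split(sentence))  # "dog." has 4 characters, so it is kept
print(filter_by_regex(sentence))  # "dog" is removed, the period remains
```

Which behaviour is right depends on whether trailing punctuation should count as part of a word in your data.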

Answer 3

Answered by Sizik

Martijn has already answered, but I wanted to explain why your regex doesn't work. The pattern ' \w{1,3} ' matches a space, followed by 1-3 word characters, followed by another space. The "I" doesn't get matched because it has no space in front of it. The "am", together with the spaces around it, gets replaced, and the regex engine then resumes at the next unmatched character: the "i" in "in". It can't use the space before "in", because that space was already consumed as the trailing space of the previous match; the space you see there in the output was put there by the substitution. So the next match it finds is "a", which produces your output string.
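
A short sketch to make this visible, using re.finditer to list what the question's pattern actually matches. The pattern is written as a raw string here, and the variable names are only for illustration:

```python
import re

text = 'I am in a bank.'
pattern = r' \w{1,3} '  # the pattern from the question, as a raw string

# Each match consumes its leading and trailing space, so the next
# search starts right after that trailing space.
matches = [(m.span(), m.group()) for m in re.finditer(pattern, text)]
print(matches)  # [((1, 5), ' am '), ((7, 10), ' a ')]

result = re.sub(pattern, ' ', text)
print(result)  # I in bank.
```

The first match ends at index 5, which is the "i" of "in", so " in " can never match: its leading space at index 4 already belongs to the first match.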
