Remove all occurrences of words in a Python list from a string

Question

Asked by Ogre

I'm trying to match and remove all words in a list from a string using a compiled regex, but I'm struggling to avoid matching occurrences inside other words.

Current:

 import re

 REMOVE_LIST = ["a", "an", "as", "at", ...]

 remove = '|'.join(REMOVE_LIST)
 regex = re.compile(r'('+remove+')', flags=re.IGNORECASE)
 out = regex.sub("", text)

In: "The quick brown fox jumped over an ant"

Out: "quick brown fox jumped over t"

Expected: "quick brown fox jumped over"
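To see where the stray "t" comes from, here is a small sketch that isolates the problem with a hypothetical one-word list (just "an"), comparing the unanchored pattern with one wrapped in word boundaries. Without \b, the regex happily matches "an" at the start of "ant".

```python
import re

# Hypothetical one-word list, chosen only to isolate the substring problem.
text = "over an ant"
unbounded = re.compile(r"(?:an)", flags=re.IGNORECASE)
bounded = re.compile(r"\b(?:an)\b", flags=re.IGNORECASE)

print(unbounded.findall(text))        # ['an', 'an']  (second hit is inside "ant")
print(bounded.findall(text))          # ['an']        (only the standalone word)
print(repr(unbounded.sub("", text)))  # 'over  t'
```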

I've tried changing the string to compile to the following but to no avail:

 regex = re.compile(r'\b('+remove+')\b', flags=re.IGNORECASE)

Any suggestions, or am I missing something glaringly obvious?

Answer 1

Accepted answer by NPE

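One problem is that only the first \b is inside a raw string. The r prefix applies only to the literal it is attached to, and ')\b' is a separate, non-raw literal, so the second \b is interpreted as the backspace character (ASCII 8) rather than as a word boundary.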
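The effect is easy to confirm in the interpreter. This sketch uses a hypothetical single word "an" in place of the joined list and prints the pattern strings produced by the question's concatenation and by the fixed one.

```python
import re

# In a normal (non-raw) string literal, "\b" is the backspace character.
print("\b" == "\x08")         # True
print(len("\b"), len(r"\b"))  # 1 2

# Each literal keeps its own raw-ness; concatenation does not spread it.
remove = "an"  # hypothetical stand-in for '|'.join(REMOVE_LIST)
broken = r'\b(' + remove + ')\b'
fixed = r'\b(' + remove + r')\b'
print(repr(broken))  # '\\b(an)\x08'  (ends in a literal backspace)
print(repr(fixed))   # '\\b(an)\\b'

print(re.findall(broken, "over an ant"))  # []  (no backspace in the text)
print(re.findall(fixed, "over an ant"))   # ['an']
```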

To fix, change

regex = re.compile(r'\b('+remove+')\b', flags=re.IGNORECASE)

to

regex = re.compile(r'\b('+remove+r')\b', flags=re.IGNORECASE)
                                 ^ THIS
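A fuller version of the fix might look like the sketch below. The question's list is elided, so REMOVE_LIST here is a hypothetical stand-in (the entry "the" is an assumption, included so that "The" is dropped as in the expected output). Two additions go beyond the answer: re.escape protects against list entries containing regex metacharacters, and the leftover runs of whitespace are collapsed after the substitution.

```python
import re

# Hypothetical stand-in for the question's elided REMOVE_LIST;
# "the" is an assumption, not taken from the question.
REMOVE_LIST = ["a", "an", "as", "at", "the"]

# Escape each entry, and use a non-capturing group so that both \b
# anchors apply to every alternative.
remove = "|".join(re.escape(word) for word in REMOVE_LIST)
regex = re.compile(r"\b(?:" + remove + r")\b", flags=re.IGNORECASE)

def remove_words(text):
    """Delete whole-word matches, then collapse the leftover whitespace."""
    out = regex.sub("", text)
    return " ".join(out.split())

print(remove_words("The quick brown fox jumped over an ant"))
# quick brown fox jumped over ant
```

Note that "ant" survives: it is a separate word that is not in this hypothetical list, and the word boundaries stop "an" from matching inside it.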

Answer 2

Answer by jurgenreza

Here is a suggestion that doesn't use a regex, which you may want to consider:

>>> sentence = 'word1 word2 word3 word1 word2 word4'
>>> remove_list = ['word1', 'word2']
>>> word_list = sentence.split()
>>> ' '.join([i for i in word_list if i not in remove_list])
'word3 word4'
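Unlike the regex version, the comparison above is case-sensitive, so "The" would not match "the". A possible variant, sketched here with a hypothetical remove_words helper, lower-cases both sides and uses a set for constant-time membership tests. Because it splits on whitespace, a word with attached punctuation (for example "an,") still will not match.

```python
def remove_words(sentence, remove_list):
    """Drop whitespace-separated words that appear in remove_list, ignoring case."""
    remove_set = {word.lower() for word in remove_list}
    return " ".join(w for w in sentence.split() if w.lower() not in remove_set)

print(remove_words("The quick brown fox jumped over an ant", ["a", "an", "the"]))
# quick brown fox jumped over ant
print(remove_words("word1 word2 word3 word1 word2 word4", ["word1", "word2"]))
# word3 word4
```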