How to remove specific substrings from a set of strings in Python?

Disclaimer: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverFlow question: http://stackoverflow.com/questions/37372603/

Date: 2020-08-19 19:18:50  Source: igfitidea

python python-3.x

Asked by controlfreak

I have a set of strings, set1, and every string in set1 contains one of two specific substrings that I don't need and want to remove.
Sample input: set1={'Apple.good','Orange.good','Pear.bad','Pear.good','Banana.bad','Potato.bad'}
So basically I want the .good and .bad substrings removed from all the strings.
What I tried:

for x in set1:
    x.replace('.good','')
    x.replace('.bad','')

But this doesn't seem to work at all. The output is completely unchanged and identical to the input. I also tried iterating with for x in list(set1) instead of the original loop, but that doesn't change anything.

Answered by Reut Sharabani

Strings are immutable. string.replace (Python 2.x) or str.replace (Python 3.x) creates a new string. This is stated in the documentation:

Return a copy of string s with all occurrences of substring old replaced by new. ...

This means you have to re-assign the set or re-populate it (re-assigning is easier with a set comprehension):

new_set = {x.replace('.good', '').replace('.bad', '') for x in set1}
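
The re-populating alternative could look roughly like the sketch below. It assumes you need to keep the same set object (for example because other code holds a reference to it); the names alias and cleaned are made up for this illustration and are not part of the original answer.

```python
set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}
alias = set1  # hypothetical second reference to the same set object

# Build the cleaned strings first; changing a set's size while iterating
# over it raises RuntimeError, so do not add/remove inside the loop.
cleaned = {x.replace('.good', '').replace('.bad', '') for x in set1}

# Re-populate the existing set in place instead of rebinding the name
set1.clear()
set1.update(cleaned)

print(sorted(alias))
# ['Apple', 'Banana', 'Orange', 'Pear', 'Potato']
```

Because the set is modified in place, alias sees the cleaned strings too. If nothing else refers to the set, simply re-assigning as shown above is shorter.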

Answered by Alex Hall

>>> x = 'Pear.good'
>>> y = x.replace('.good','')
>>> y
'Pear'
>>> x
'Pear.good'

.replace doesn't change the string; it returns a copy of the string with the replacement applied. You can't change the string directly because strings are immutable.

You need to take the return values from x.replace and put them in a new set.
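
As a minimal sketch of that idea, here is the loop from the question adjusted to keep the return values. The name new_set is an assumption chosen for illustration.

```python
set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}

new_set = set()
for x in set1:
    # Each replace call returns a new string, so chain the calls
    # and store the final result instead of discarding it.
    new_set.add(x.replace('.good', '').replace('.bad', ''))

print(sorted(new_set))
# ['Apple', 'Banana', 'Orange', 'Pear', 'Potato']
print(len(set1), len(new_set))
# 6 5
```

Note that 'Pear.good' and 'Pear.bad' both become 'Pear', so the new set has one element fewer than the original.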

Answered by gueeest

All you need is a bit of black magic!

>>> a = ["cherry.bad","pear.good", "apple.good"]
>>> a = list(map(lambda x: x.replace('.good','').replace('.bad',''),a))
>>> a
['cherry', 'pear', 'apple']

Answered by Vivek

You could do this:

import re

set1={'Apple.good','Orange.good','Pear.bad','Pear.good','Banana.bad','Potato.bad'}

for x in set1:
    # re.sub returns a new string, so rebind x to keep the result
    x = re.sub(r'\.good$', '', x)  # strip a trailing '.good'
    x = re.sub(r'\.bad$', '', x)   # strip a trailing '.bad'
    print(x)  # note: set1 itself is left unchanged

Answered by user140259

I ran a test (with different data from your example), and the set comprehension returns the results neither in order nor complete, because a set is unordered and drops duplicates:

>>> ind = ['p5','p1','p8','p4','p2','p8']
>>> newind = {x.replace('p','') for x in ind}
>>> newind
{'1', '2', '8', '5', '4'}

I verified that a list comprehension works:

>>> ind = ['p5','p1','p8','p4','p2','p8']
>>> newind = [x.replace('p','') for x in ind]
>>> newind
['5', '1', '8', '4', '2', '8']

or

>>> newind = []
>>> ind = ['p5','p1','p8','p4','p2','p8']
>>> for x in ind:
...     newind.append(x.replace('p',''))
>>> newind
['5', '1', '8', '4', '2', '8']

Answered by cs95

When there are multiple substrings to remove, one simple and effective option is to use re.sub with a compiled pattern built by joining all the substrings to remove with the regex OR (|) operator.

import re

to_remove = ['.good', '.bad']
strings = ['Apple.good','Orange.good','Pear.bad']

p = re.compile('|'.join(map(re.escape, to_remove))) # escape to handle metachars
[p.sub('', s) for s in strings]
# ['Apple', 'Orange', 'Pear']
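
To apply the same idea to the set from the question, the pattern can be wrapped in a small helper. This is only a sketch; the function name strip_substrings is hypothetical and not part of the original answer.

```python
import re

def strip_substrings(strings, to_remove):
    """Return a new set with every substring in to_remove deleted from each string."""
    # re.escape makes '.' match a literal dot rather than any character
    pattern = re.compile('|'.join(map(re.escape, to_remove)))
    return {pattern.sub('', s) for s in strings}

set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}
cleaned = strip_substrings(set1, ['.good', '.bad'])
print(sorted(cleaned))
# ['Apple', 'Banana', 'Orange', 'Pear', 'Potato']
```

Without re.escape, the pattern '.good' would also match text such as 'Xgood', since an unescaped dot matches any character.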

Answered by rsc05

If you have a list

I was working with a list of strings. If you want to remove every line that contains a certain substring, you can do this:

import re

def RemoveInList(sub, LinSplitUnOr):
    # Keep only the lines in which the regex pattern `sub` does not match
    return [x for x in LinSplitUnOr if not re.search(sub, x)]

where sub is a pattern that you do not want to appear in the list of lines LinSplitUnOr.

for example

A=['Apple.good','Orange.good','Pear.bad','Pear.good','Banana.bad','Potato.bad']
sub = 'good'
A=RemoveInList(sub,A)

Then A will be:

['Pear.bad', 'Banana.bad', 'Potato.bad']