How to remove specific substrings from a set of strings in Python?
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me), including the original URL and author information: Stack Overflow
Original URL: http://stackoverflow.com/questions/37372603/
Question by controlfreak
I have a set of strings, set1, and all the strings in set1 contain two specific substrings that I don't need and want to remove.
Sample Input:
set1={'Apple.good','Orange.good','Pear.bad','Pear.good','Banana.bad','Potato.bad'}
So basically I want the .good and .bad substrings removed from all the strings.
What I tried:
for x in set1:
    x.replace('.good','')
    x.replace('.bad','')
But this doesn't seem to work at all. There is absolutely no change in the output; it is the same as the input. I tried using for x in list(set1) instead of the original loop, but that doesn't change anything.
Answer by Reut Sharabani
Strings are immutable. string.replace (Python 2.x) or str.replace (Python 3.x) creates a new string. This is stated in the documentation:
Return a copy of string s with all occurrences of substring old replaced by new. ...
This means you have to re-allocate the set or re-populate it (re-allocating is easier with a set comprehension):
new_set = {x.replace('.good', '').replace('.bad', '') for x in set1}
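The "re-populate" option mentioned above can be sketched as follows, assuming you want to keep the same set object (for example, because other code holds a reference to it). The names `alias` and `cleaned` are chosen for illustration only.

```python
set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}
alias = set1  # a second name bound to the same set object

# Build the cleaned strings first; a set must not change size while it is being iterated.
cleaned = {x.replace('.good', '').replace('.bad', '') for x in set1}

# Re-populate the original set in place, so every reference sees the new contents.
set1.clear()
set1.update(cleaned)

print(sorted(alias))  # ['Apple', 'Banana', 'Orange', 'Pear', 'Potato']
```

Note that the result has five elements rather than six: 'Pear.bad' and 'Pear.good' both become 'Pear', and a set keeps only one copy.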
Answer by Alex Hall
>>> x = 'Pear.good'
>>> y = x.replace('.good','')
>>> y
'Pear'
>>> x
'Pear.good'
.replace doesn't change the string; it returns a copy of the string with the replacement. You can't change the string directly because strings are immutable.
You need to take the return values from x.replace and put them in a new set.
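Written out as an explicit loop, the approach described above might look like this; `new_set` and `y` are illustrative names.

```python
set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}

new_set = set()
for x in set1:
    # replace() returns a new string; keep it instead of discarding it
    y = x.replace('.good', '').replace('.bad', '')
    new_set.add(y)

print(sorted(new_set))  # ['Apple', 'Banana', 'Orange', 'Pear', 'Potato']
print(sorted(set1))     # the original set still holds the unmodified strings
```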
Answer by gueeest
All you need is a bit of black magic!
>>> a = ["cherry.bad","pear.good", "apple.good"]
>>> a = list(map(lambda x: x.replace('.good','').replace('.bad',''),a))
>>> a
['cherry', 'pear', 'apple']
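The same `map` call also works on the asker's set. In Python 3, `map` returns a lazy iterator, which is why it needs a wrapper at all; wrapping it in `set()` instead of `list()` keeps the result a set. A minimal sketch:

```python
set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}

# map() yields the cleaned strings lazily; set() collects them (and merges duplicates)
result = set(map(lambda x: x.replace('.good', '').replace('.bad', ''), set1))

print(sorted(result))  # ['Apple', 'Banana', 'Orange', 'Pear', 'Potato']
```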
Answer by Vivek
You could do this:
import re

set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}
for x in set1:
    # re.sub returns a new string; rebinding x does not modify the set itself
    x = re.sub(r'\.good$', '', x)  # strip a trailing ".good"
    x = re.sub(r'\.bad$', '', x)   # strip a trailing ".bad"
    print(x)
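The two `re.sub` calls can be folded into one anchored pattern with alternation, and the results collected into a new set instead of only being printed. This is a sketch; the combined pattern `r'\.(?:good|bad)$'` is my own, not taken from the answer above. Because of the `$` anchor, only a trailing `.good`/`.bad` is removed, whereas `str.replace` removes every occurrence.

```python
import re

set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}

# One pattern: a literal dot, then "good" or "bad", at the end of the string
suffix = re.compile(r'\.(?:good|bad)$')
new_set = {suffix.sub('', x) for x in set1}
print(sorted(new_set))  # ['Apple', 'Banana', 'Orange', 'Pear', 'Potato']

# The anchor matters: only the trailing suffix is stripped
print(suffix.sub('', 'a.good.bad'))                            # a.good
print('a.good.bad'.replace('.good', '').replace('.bad', ''))  # a
```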
Answer by user140259
I tested this (with my own data, not your example), and the result is neither in the original order nor complete, because a set is unordered and merges duplicates:
>>> ind = ['p5','p1','p8','p4','p2','p8']
>>> newind = {x.replace('p','') for x in ind}
>>> newind
{'1', '2', '8', '5', '4'}
I verified that this works instead:
>>> ind = ['p5','p1','p8','p4','p2','p8']
>>> newind = [x.replace('p','') for x in ind]
>>> newind
['5', '1', '8', '4', '2', '8']
or
>>> newind = []
>>> ind = ['p5','p1','p8','p4','p2','p8']
>>> for x in ind:
...     newind.append(x.replace('p',''))
...
>>> newind
['5', '1', '8', '4', '2', '8']
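The difference described in this answer comes from the container type, not from `replace`: a set comprehension builds an unordered set and merges duplicates ('p8' appears twice), while a list comprehension keeps the original order and every element. A small check using the same data:

```python
ind = ['p5', 'p1', 'p8', 'p4', 'p2', 'p8']

as_set = {x.replace('p', '') for x in ind}   # unordered, duplicates merged
as_list = [x.replace('p', '') for x in ind]  # original order, duplicates kept

print(len(as_set), len(as_list))  # 5 6
print(as_list)                    # ['5', '1', '8', '4', '2', '8']
```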
Answer by cs95
When there are multiple substrings to remove, one simple and effective option is to use re.sub with a compiled pattern that joins all the substrings-to-remove using the regex OR pipe (|).
import re
to_remove = ['.good', '.bad']
strings = ['Apple.good','Orange.good','Pear.bad']
p = re.compile('|'.join(map(re.escape, to_remove))) # escape to handle metachars
[p.sub('', s) for s in strings]
# ['Apple', 'Orange', 'Pear']
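To see why the answer passes each substring through `re.escape`: in a regex, `.` matches any character, so an unescaped `.good` would also remove text such as `Xgood`. A quick comparison on a made-up input string (`s` is illustrative only):

```python
import re

to_remove = ['.good', '.bad']
s = 'appleXgood.good'

escaped = re.compile('|'.join(map(re.escape, to_remove)))  # matches a literal "."
unescaped = re.compile('|'.join(to_remove))                # "." means "any character"

print(escaped.sub('', s))    # appleXgood
print(unescaped.sub('', s))  # apple
```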
Answer by rsc05
If you have a list
I was working with a list of strings. If you want to remove all lines that contain a certain substring, you can do this:
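import re

def RemoveInList(sub, LinSplitUnOr):
    # positions of every line in which the regex `sub` matches
    indices = {i for i, x in enumerate(LinSplitUnOr) if re.search(sub, x)}
    # keep only the lines whose position was not flagged
    A = [x for j, x in enumerate(LinSplitUnOr) if j not in indices]
    return A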
where sub is a pattern that you do not want to appear in the list of lines LinSplitUnOr.
For example:
A=['Apple.good','Orange.good','Pear.bad','Pear.good','Banana.bad','Potato.bad']
sub = 'good'
A=RemoveInList(sub,A)
Then A will be
['Pear.bad', 'Banana.bad', 'Potato.bad']
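Unlike the other answers, this one drops whole elements instead of stripping the substring. The helper above makes two passes and treats `sub` as a regular expression; if you only need to drop elements containing a plain substring, a single list comprehension with the `in` operator is a simpler sketch (`remove_containing` is a hypothetical name, not from the answer):

```python
def remove_containing(sub, lines):
    """Return a new list without the elements that contain `sub` (plain substring, no regex)."""
    return [line for line in lines if sub not in line]

A = ['Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad']
A = remove_containing('good', A)
print(A)  # ['Pear.bad', 'Banana.bad', 'Potato.bad']
```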