pandas: How to remove a string value from a column in a pandas DataFrame
Original URL: http://stackoverflow.com/questions/33413249/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to remove string value from column in pandas dataframe
Asked by sequence_hard
I am trying to write some code that splits a string in a dataframe column at each comma (so it becomes a list) and removes a certain string from that list if it is present. After removing the unwanted string, I want to join the list elements again with commas. My dataframe looks like this:
df:
Column1 Column2
0 a a,b,c
1 y b,n,m
2 d n,n,m
3 d b,b,x
So basically my goal is to remove all b values from Column2 so that I get:
df:
Column1 Column2
0 a a,c
1 y n,m
2 d n,n,m
3 d x
The code I have written is the following:
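# split each comma-separated string into a list
df['Column2'] = df['Column2'].apply(lambda x: x.split(','))
def exclude_b(df):
    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for index, liste in df['Column2'].items():
        if 'b' in liste:
            liste.remove('b')
            return liste
        else:
            return liste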
The first line splits each value in the column at the commas, producing a list. With the function, I then tried to iterate over all the lists and remove the b if it is present; if it is not present, the list should be returned as it is. If I print 'liste' at the end, it only returns the first row of Column2, but not the others. What am I doing wrong? And would there be a way to implement my if condition in a lambda function?
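The behaviour described above is consistent with the return statements sitting inside the for loop: both branches return during the first iteration, so the function never reaches the second row. Note also that list.remove deletes only the first matching element, so a row such as b,b,x would still contain one b. A minimal sketch of a lambda-based alternative follows; the sample data is reconstructed from the question, and the approach (split, filter, join) is one possible way to do it rather than the only one.

```python
import pandas as pd

# Sample data reconstructed from the question (an assumption about the real frame).
df = pd.DataFrame({
    "Column1": ["a", "y", "d", "d"],
    "Column2": ["a,b,c", "b,n,m", "n,n,m", "b,b,x"],
})

# Split each string at the commas, drop every element equal to "b",
# and join the remaining elements back together with commas.
df["Column2"] = df["Column2"].apply(
    lambda s: ",".join(item for item in s.split(",") if item != "b")
)

print(df)
```

Because the comparison is made per element, values that merely contain the letter b (for example a hypothetical entry abc) are left untouched.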
Answered by Nader Hisham
You can simply apply the regex 'b,?', which matches every b together with the comma that follows it, if one exists, and replace each match with an empty string:
df['Column2'] = df.Column2.str.replace('b,?', '', regex=True)
Out[238]:
Column1 Column2
0 a a,c
1 y n,m
2 d n,n,m
3 d x
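The pattern 'b,?' works for the sample data, but it may be worth checking two edge cases: a b at the end of a value leaves a trailing comma, and a b inside a longer element is removed as well. The sketch below compares the pattern from the answer with a stricter one that only matches a whole element equal to b. The extra rows a,b and abc,b and the stricter pattern are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# The first four values come from the question; the last two are
# hypothetical rows added to exercise the edge cases.
s = pd.Series(["a,b,c", "b,n,m", "n,n,m", "b,b,x", "a,b", "abc,b"])

# Pattern from the answer: removes every "b" plus an optional following comma.
naive = s.str.replace("b,?", "", regex=True)

# Stricter sketch: match "b" only when it is a complete element, i.e. preceded
# by the start of the string or a comma and followed by a comma or the end.
# A leading comma can remain when the first element is removed, so strip it.
strict = (
    s.str.replace(r"(?:^|,)b(?=,|$)", "", regex=True)
     .str.lstrip(",")
)

print(pd.DataFrame({"original": s, "naive": naive, "strict": strict}))
```

On the two added rows the naive pattern yields a, and ac, while the stricter pattern yields a and abc; on the four rows from the question both give the same result.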