Conditionally calculated column for a Pandas DataFrame
Original question: http://stackoverflow.com/questions/40134313/
Note: this content comes from StackOverFlow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address and author information, and attribute it to the original authors (not me): StackOverFlow
Asked by Edward J. Stembler
I have a calculated column in a Pandas DataFrame which needs to be assigned based upon a condition. For example:
if(data['column_a'] == 0):
    data['column_c'] = 0
else:
    data['column_c'] = data['column_b']
However, that raises an error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I have a feeling this has something to do with the fact that it must be done in a matrix style. Changing the code to a ternary statement doesn't work either:
data['column_c'] = 0 if data['column_a'] == 0 else data['column_b']
Does anyone know the proper way to achieve this? Using apply with a lambda? I could iterate via a loop, but I'd rather do this the preferred Pandas way.
Answer by EdChum
You can do:
data['column_c'] = data['column_a'].where(data['column_a'] == 0, data['column_b'])
This is vectorised. Your attempts failed because the comparison produces an array of boolean values, one per row, and if doesn't understand how to treat such an array as a single condition, hence the error.
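To make the failure concrete, here is a minimal sketch (the sample values are invented for illustration and are not from the question). The comparison yields one boolean per row, while if needs a single True or False:

```python
import pandas as pd

# Hypothetical sample data, not from the original question
data = pd.DataFrame({'column_a': [0, 1, 0], 'column_b': [10, 20, 30]})

mask = data['column_a'] == 0   # element-wise comparison: a boolean Series
print(mask.tolist())           # one True/False per row

error_message = None
try:
    if mask:                   # `if` asks for a single truth value
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

The boolean Series is exactly what where() expects as its condition, which is why the vectorised form works where the if statement does not.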
Example:
In [81]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df
Out[81]:
          a         b         c
0 -1.065074 -1.294718  0.165750
1 -0.041167  0.962203  0.741852
2  0.714889  0.056171  1.197534
3  0.741988  0.836636 -0.660314
4  0.074554 -1.246847  0.183654
In [82]:
df['d'] = df['b'].where(df['b'] < 0, df['c'])
df
Out[82]:
          a         b         c         d
0 -1.065074 -1.294718  0.165750 -1.294718
1 -0.041167  0.962203  0.741852  0.741852
2  0.714889  0.056171  1.197534  1.197534
3  0.741988  0.836636 -0.660314 -0.660314
4  0.074554 -1.246847  0.183654 -1.246847
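Applied to the column names from the question, the same pattern might look like the sketch below. The sample values are invented, and numpy.where is shown only as a commonly used equivalent; it is not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the column names from the question
data = pd.DataFrame({'column_a': [0, 2, 0, 5], 'column_b': [10, 20, 30, 40]})

# Keep column_a where it equals 0 (that is, the value 0), otherwise take column_b
data['column_c'] = data['column_a'].where(data['column_a'] == 0, data['column_b'])

# Same result with numpy.where(condition, value_if_true, value_if_false);
# column_c_np is an illustrative name, not from the question
data['column_c_np'] = np.where(data['column_a'] == 0, 0, data['column_b'])

print(data)
```

Series.where keeps the original value where the condition holds and substitutes the second argument elsewhere, so it reads "backwards" compared with numpy.where, which takes both outcomes explicitly.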
Answer by Hrishikesh Goyal
Use where() and notnull():
data['column_c'] = data['column_b'].where(data['column_a'].notnull(), 0)
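Note that notnull() tests for missing values, not for zero. The sketch below uses invented data to show both conditions side by side; the != 0 variant is an assumption about what the question asks for, not part of this answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: column_a contains both a zero and a missing value
data = pd.DataFrame({'column_a': [0.0, np.nan, 3.0], 'column_b': [10, 20, 30]})

# As in the answer: keep column_b where column_a is present, else 0
data['c_notnull'] = data['column_b'].where(data['column_a'].notnull(), 0)

# Matching the question's condition: keep column_b where column_a is non-zero, else 0
# (c_notnull and c_nonzero are illustrative column names)
data['c_nonzero'] = data['column_b'].where(data['column_a'] != 0, 0)

print(data)
```

The two columns differ on the rows where column_a is 0 or NaN, so which condition to use depends on whether zeros or missing values should trigger the fallback.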