Conditionally calculated column for a Pandas DataFrame

Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/40134313/

Date: 2020-09-14 02:14:49  Source: igfitidea

Tags: python, pandas, dataframe

Asked by Edward J. Stembler

I have a calculated column in a Pandas DataFrame which needs to be assigned based upon a condition. For example:

if(data['column_a'] == 0):
    data['column_c'] = 0
else:
    data['column_c'] = data['column_b']

However, that returns an error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I have a feeling this has something to do with the fact that it must be done in a matrix style. Changing the code to a ternary statement doesn't work either:

data['column_c'] = 0 if data['column_a'] == 0 else data['column_b']

Does anyone know the proper way to achieve this? Using apply with a lambda? I could iterate via a loop, but I'd rather do this the preferred Pandas way.

Answered by EdChum

You can do:

data['column_c'] = data['column_a'].where(data['column_a'] == 0, data['column_b'])

This is vectorised. Your attempts failed because if expects a single truth value and doesn't know how to treat an array of boolean values, hence the error.
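
To make the failure mode concrete, here is a minimal sketch. The small frame and its values are made up for illustration and are not from the question. It shows that the comparison yields a boolean Series, that coercing that Series to a single truth value raises the ValueError quoted above, and that where applies the condition row by row instead.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
data = pd.DataFrame({"column_a": [0, 1, 0, 2], "column_b": [10, 20, 30, 40]})

# The comparison is element-wise: it returns a boolean Series, not a single bool.
is_zero = data["column_a"] == 0

# `if` needs one truth value; asking a multi-element Series for it raises ValueError.
try:
    bool(is_zero)
    raised = False
except ValueError:
    raised = True

# Series.where keeps the original value where the condition is True
# (here column_a, which is 0 on those rows) and takes the other value elsewhere.
data["column_c"] = data["column_a"].where(is_zero, data["column_b"])
print(data)
```

Rows where column_a is 0 end up with 0 in column_c, and every other row receives the value from column_b.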

Example:

In [81]:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df

Out[81]:
          a         b         c
0 -1.065074 -1.294718  0.165750
1 -0.041167  0.962203  0.741852
2  0.714889  0.056171  1.197534
3  0.741988  0.836636 -0.660314
4  0.074554 -1.246847  0.183654

In [82]:
df['d'] = df['b'].where(df['b'] < 0, df['c'])
df

Out[82]:
          a         b         c         d
0 -1.065074 -1.294718  0.165750 -1.294718
1 -0.041167  0.962203  0.741852  0.741852
2  0.714889  0.056171  1.197534  1.197534
3  0.741988  0.836636 -0.660314 -0.660314
4  0.074554 -1.246847  0.183654 -1.246847
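
Because the example above uses random numbers, it may help to see the same selection on fixed values. The sketch below uses made-up data and also shows two equivalent spellings, Series.mask and numpy.where. These alternatives are added here for comparison and are not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data standing in for the random frame above.
df = pd.DataFrame({"b": [-1.5, 0.9, 0.1, -0.3], "c": [0.2, 0.7, 1.2, -0.6]})

# Keep b where b < 0, otherwise take c (same pattern as In [82]).
d_where = df["b"].where(df["b"] < 0, df["c"])

# mask is the inverse of where: replace b where the condition is True.
d_mask = df["b"].mask(df["b"] >= 0, df["c"])

# numpy.where(condition, value_if_true, value_if_false) returns a plain array,
# so it is wrapped in a Series to keep the original index.
d_np = pd.Series(np.where(df["b"] < 0, df["b"], df["c"]), index=df.index)

df["d"] = d_where
print(df)
```

All three expressions select the same values; which one reads best is largely a matter of taste.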

Answered by Hrishikesh Goyal

Use where() and notnull():

data['column_c'] = data['column_b'].where(data['column_a'].notnull(), 0)
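
A small sketch of how this variant behaves, using made-up data that includes a missing value. Note that notnull() tests for missing values rather than for zero, so this appears to address a slightly different condition from the column_a == 0 check in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column_a contains a zero and a missing value (NaN).
data = pd.DataFrame({"column_a": [0.0, np.nan, 3.0], "column_b": [10, 20, 30]})

# Keep column_b where column_a is present; use 0 where column_a is missing.
data["column_c"] = data["column_b"].where(data["column_a"].notnull(), 0)
print(data)
```

Here the row whose column_a is NaN gets 0, while the row whose column_a is 0 still receives its column_b value.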