Conditional calculated column in a Pandas DataFrame

Question

Asked by Edward J. Stembler

I have a calculated column in a Pandas DataFrame which needs to be assigned based upon a condition. For example:

if(data['column_a'] == 0):
    data['column_c'] = 0
else:
    data['column_c'] = data['column_b']

However, that returns an error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I have a feeling this has something to do with the fact that it must be done in a matrix style. Changing the code to a ternary statement doesn't work either:

data['column_c'] = 0 if data['column_a'] == 0 else data['column_b']

Does anyone know the proper way to achieve this? Using apply with a lambda? I could iterate via a loop, but I'd rather do this the preferred Pandas way.

Answer 1

Answered by EdChum

You can do:

data['column_c'] = data['column_a'].where(data['column_a'] == 0, data['column_b'])

This is vectorised. Your attempts failed because an if statement needs a single truth value and doesn't know how to treat an array of boolean values, hence the error.
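
To make the failure concrete, here is a minimal sketch. The sample values are made up for illustration; only the column names come from the question. It shows that the comparison yields a boolean Series, that forcing it into a single truth value raises the ValueError, and that where() handles the same condition element-wise.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data = pd.DataFrame({"column_a": [0, 1, 0, 2], "column_b": [10, 20, 30, 40]})

# The comparison is element-wise, so it yields one boolean per row.
mask = data["column_a"] == 0
print(mask.tolist())  # [True, False, True, False]

# An if statement implicitly calls bool() on its condition.
try:
    bool(mask)
    raised = False
except ValueError:
    raised = True  # pandas refuses to collapse several booleans into one
print(raised)  # True

# Vectorised form: keep column_a where it is 0, otherwise take column_b.
data["column_c"] = data["column_a"].where(mask, data["column_b"])
print(data["column_c"].tolist())  # [0, 20, 0, 40]
```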

Example:

In [81]:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df

Out[81]:
          a         b         c
0 -1.065074 -1.294718  0.165750
1 -0.041167  0.962203  0.741852
2  0.714889  0.056171  1.197534
3  0.741988  0.836636 -0.660314
4  0.074554 -1.246847  0.183654

In [82]:
df['d'] = df['b'].where(df['b'] < 0, df['c'])
df

Out[82]:
          a         b         c         d
0 -1.065074 -1.294718  0.165750 -1.294718
1 -0.041167  0.962203  0.741852  0.741852
2  0.714889  0.056171  1.197534  1.197534
3  0.741988  0.836636 -0.660314 -0.660314
4  0.074554 -1.246847  0.183654 -1.246847
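
For comparison, the same if/else can probably be written two other ways. The sketch below is not part of the original answer: it uses numpy.where and Series.mask on made-up values, with the column names from the question, and both give the same result as the where() form above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data = pd.DataFrame({"column_a": [0, 3, 0, 7], "column_b": [1.5, 2.5, 3.5, 4.5]})

is_zero = data["column_a"] == 0

# np.where(condition, value_if_true, value_if_false) reads like the if/else.
via_numpy = np.where(is_zero, 0, data["column_b"])

# Series.mask is the inverse of where: it replaces values where the condition is True.
via_mask = data["column_b"].mask(is_zero, 0)

data["column_c"] = via_numpy
print(data["column_c"].tolist())  # [0.0, 2.5, 0.0, 4.5]
print(via_mask.tolist())          # [0.0, 2.5, 0.0, 4.5]
```

np.where returns a plain NumPy array, while mask and where return a Series that keeps the original index, which may matter if the result is used before being assigned back to the DataFrame.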

Answer 2

Answered by Hrishikesh Goyal

Use where() and notnull(). Note that this variant sets column_c to 0 where column_a is missing (NaN), not where it equals 0:

data['column_c'] = data['column_b'].where(data['column_a'].notnull(), 0)
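
As a rough illustration of how the notnull() condition differs from the equality test in the question, the sketch below runs both on made-up data containing one zero and one NaN in column_a. The variable names by_notnull and by_zero are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one zero and one missing value in column_a.
data = pd.DataFrame({
    "column_a": [0.0, np.nan, 2.0],
    "column_b": [10.0, 20.0, 30.0],
})

# Condition from this answer: keep column_b where column_a is present, else 0.
by_notnull = data["column_b"].where(data["column_a"].notnull(), 0)

# Condition from the question: keep column_b where column_a is not 0, else 0.
by_zero = data["column_b"].where(data["column_a"] != 0, 0)

print(by_notnull.tolist())  # [10.0, 0.0, 30.0]
print(by_zero.tolist())     # [0.0, 20.0, 30.0]
```

The two conditions zero out different rows, so the choice depends on whether the rows to blank are the ones with missing values or the ones equal to 0.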
