Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/37947641/

Time: 2020-08-19 20:07:52  Source: igfitidea

Pandas: How to sum columns based on conditional of other column values?

Tags: python, pandas, dataframe, conditional

Asked by ShanZhengYang

I have the following pandas DataFrame.

import pandas as pd
df = pd.read_csv('filename.csv')

print(df)

     dog      A         B           C
0     dog1    0.787575  0.159330    0.053095
1     dog10   0.770698  0.169487    0.059815
2     dog11   0.792689  0.152043    0.055268
3     dog12   0.785066  0.160361    0.054573
4     dog13   0.795455  0.150464    0.054081
5     dog14   0.794873  0.150700    0.054426
..    ....
8     dog19   0.811585  0.140207    0.048208
9     dog2    0.797202  0.152033    0.050765
10    dog20   0.801607  0.145137    0.053256
11    dog21   0.792689  0.152043    0.055268
    ....

I create a new column by summing columns "A", "B", and "C" as follows:

df['total_ABC'] = df[["A", "B", "C"]].sum(axis=1)

Now I would like to do this based on a condition: if "A" < 0.78, the new column df['smallA_sum'] should hold df[["A", "B", "C"]].sum(axis=1). Otherwise, the value should be zero.

How does one create conditional statements like this?

My thought would be to use

df['smallA_sum'] = df1.apply(lambda row: (row['A']+row['B']+row['C']) if row['A'] < 0.78))

However, this doesn't work and I'm not able to specify axis.

How do you create a column based on the values of other columns?

You could also do something similar per dog: for rows where df['dog'] == 'dog2', create a column dog2_sum, i.e.

 df['dog2_sum'] = df1.apply(lambda row: (row['A']+row['B']+row['C']) if df['dog'] == 'dog2'))

but my approach is incorrect.

Answered by EdChum

The following should work. Here we mask the df where the condition is met; this sets NaN in the rows where the condition isn't met, so we call fillna on the new column:

In [67]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
df

Out[67]:
          A         B         C
0  0.197334  0.707852 -0.443475
1 -1.063765 -0.914877  1.585882
2  0.899477  1.064308  1.426789
3 -0.556486 -0.150080 -0.149494
4 -0.035858  0.777523 -0.453747

In [73]:
df['total'] = df.loc[df['A'] > 0,['A','B']].sum(axis=1)
df['total'] = df['total'].fillna(0)
df

Out[73]:
          A         B         C     total
0  0.197334  0.707852 -0.443475  0.905186
1 -1.063765 -0.914877  1.585882  0.000000
2  0.899477  1.064308  1.426789  1.963785
3 -0.556486 -0.150080 -0.149494  0.000000
4 -0.035858  0.777523 -0.453747  0.000000
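
To connect the masking approach back to the question, here is a minimal sketch that uses the asker's condition ("A" < 0.78) and sums all three columns. The four rows are copied from the DataFrame printed in the question; building them inline rather than reading filename.csv is an assumption made only so the example runs on its own.

```python
import pandas as pd

# Four rows copied from the question's printed DataFrame.
df = pd.DataFrame({
    "dog": ["dog1", "dog10", "dog11", "dog12"],
    "A": [0.787575, 0.770698, 0.792689, 0.785066],
    "B": [0.159330, 0.169487, 0.152043, 0.160361],
    "C": [0.053095, 0.059815, 0.055268, 0.054573],
})

# Sum A, B and C only for the rows where A < 0.78. On assignment, rows that
# are missing from the right-hand side are aligned by index and become NaN.
df["smallA_sum"] = df.loc[df["A"] < 0.78, ["A", "B", "C"]].sum(axis=1)

# Replace the NaN placeholders with zero, assigning the result back
# instead of modifying the column in place.
df["smallA_sum"] = df["smallA_sum"].fillna(0)

print(df)
```

In this sample only the dog10 row satisfies the condition, so it is the only row with a non-zero smallA_sum.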

Another approach is to call where on the sum result; where takes a second argument, the value to use in rows where the condition isn't met:

In [75]:
df['total'] = df[['A','B']].sum(axis=1).where(df['A'] > 0, 0)
df

Out[75]:
          A         B         C     total
0  0.197334  0.707852 -0.443475  0.905186
1 -1.063765 -0.914877  1.585882  0.000000
2  0.899477  1.064308  1.426789  1.963785
3 -0.556486 -0.150080 -0.149494  0.000000
4 -0.035858  0.777523 -0.453747  0.000000
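
The same where pattern should also cover the second case in the question, a dog2_sum column. The sketch below is not part of the original answer. It uses four rows copied from the question's output and, for comparison, a row-wise apply version of the asker's attempt. The column name dog2_sum_apply is hypothetical and exists only to hold that comparison. The original attempt likely failed because a lambda's conditional expression needs an else branch, and because apply needs axis=1 to receive rows.

```python
import pandas as pd

# Four rows copied from the question's printed DataFrame.
df = pd.DataFrame({
    "dog": ["dog19", "dog2", "dog20", "dog21"],
    "A": [0.811585, 0.797202, 0.801607, 0.792689],
    "B": [0.140207, 0.152033, 0.145137, 0.152043],
    "C": [0.048208, 0.050765, 0.053256, 0.055268],
})

# Vectorised: compute every row sum, then keep it only where dog == 'dog2'.
row_sum = df[["A", "B", "C"]].sum(axis=1)
df["dog2_sum"] = row_sum.where(df["dog"] == "dog2", 0)

# Row-wise equivalent of the question's attempt: the conditional expression
# has an else branch, and axis=1 passes each row to the lambda.
# (dog2_sum_apply is a hypothetical name used only for this comparison.)
df["dog2_sum_apply"] = df.apply(
    lambda row: row["A"] + row["B"] + row["C"] if row["dog"] == "dog2" else 0,
    axis=1,
)

print(df)
```

Both columns should agree up to floating-point rounding. The where version avoids a Python-level loop over rows, which usually matters on larger frames.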