pandas: How to sum negative and positive values separately when using groupby in pandas?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/33780794/

Time: 2020-09-14 00:14:57  Source: igfitidea

python pandas group-by sum dataframe

Asked by ALH

How can I sum positive and negative values separately in pandas and put them in, say, positive and negative columns?

I have a dataframe like the one below:

import numpy as np
import pandas

df = pandas.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                       'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                       'C': np.random.randn(8), 'D': np.random.randn(8)})

The output is as below:

df
     A      B         C         D
0  foo    one  0.374156  0.319699
1  bar    one -0.356339 -0.629649
2  foo    two -0.390243 -1.387909
3  bar  three -0.783435 -0.959699
4  foo    two -1.268622 -0.250871
5  bar    two -2.302525 -1.295991
6  foo    one -0.968840  1.247675
7  foo  three  0.482845  1.004697

I used the code below to get the sum of the negatives:

df['negative'] = df.groupby('A')['C'].apply(lambda x: x[x<0].sum()).reset_index()

But the problem is that when I try to add it to the dataframe as a column called negative, it gives an error:

ValueError: Wrong number of items passed 2, placement implies 1

I understand what it says: groupby has returned more than one column, which cannot be assigned to df['negative'], but I don't know how to solve this part of the problem. I also need a positive column.

The desired outcome would be:

    A      Positive   Negative
0  foo     0.374156  -0.319699
1  bar     0.356339  -0.629649

What is the right solution to the problem?

Answered by Nader Hisham

In [14]:
df.groupby(df['A'])['C'].agg([('negative' , lambda x : x[x < 0].sum()) , ('positive' , lambda x : x[x > 0].sum())])
Out[14]:
     negative   positive
A       
bar -1.418788   2.603452
foo -0.504695   2.880512
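
A vectorized variant of the same idea, as a minimal sketch rather than part of the original answer: clip column C at zero in each direction, then sum per group. The small fixed frame below is made up for illustration so that the result is reproducible; it is not the random data from the question.

```python
import pandas as pd

# Hypothetical fixed data (the question uses random values).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo'],
    'C': [1.0, -2.0, -3.0, 4.0, 5.0],
})

# clip(upper=0) keeps the negative part of each value (positives become 0);
# clip(lower=0) keeps the positive part (negatives become 0).
parts = pd.DataFrame({
    'negative': df['C'].clip(upper=0),
    'positive': df['C'].clip(lower=0),
})

# Sum both columns per value of A.
result = parts.groupby(df['A']).sum()
print(result)
```

Because no Python lambda runs per group, this form tends to scale better on large frames, at the cost of building one temporary frame.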

Answered by behzad.nouri

You may groupby on A and df['C'] > 0, and unstack the result:

>>> right = df.groupby(['A', df['C'] > 0])['C'].sum().unstack()
>>> right = right.rename(columns={True:'positive', False:'negative'})
>>> right
C    negative  positive
A                      
bar   -3.4423       NaN
foo   -2.6277     0.857

The NaN value is because all the A == bar rows have a negative value for C.
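
If a zero is preferable to NaN for a group that has no positive (or no negative) values, one option is the fill_value argument of unstack. A minimal sketch on made-up data in which every bar row is negative:

```python
import pandas as pd

# Hypothetical fixed data: both 'bar' rows have a negative C.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'C': [1.5, -2.0, -0.5, -1.0],
})

# Group on A and on the sign of C; (group, sign) pairs that never occur
# are filled with 0 instead of NaN when unstacking.
right = df.groupby(['A', df['C'] > 0])['C'].sum().unstack(fill_value=0)
right = right.rename(columns={True: 'positive', False: 'negative'})
print(right)
```

Whether 0 or NaN is the better marker depends on whether "no positive values" and "positive values summing to zero" need to be told apart later.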

If you want to add these to the original frame, matched on the values of the groupby key, i.e. A, it requires a left join:

>>> df.join(right, on='A', how='left')
     A      B       C       D  negative  positive
0  foo    one  0.3742  0.3197   -2.6277     0.857
1  bar    one -0.3563 -0.6296   -3.4423       NaN
2  foo    two -0.3902 -1.3879   -2.6277     0.857
3  bar  three -0.7834 -0.9597   -3.4423       NaN
4  foo    two -1.2686 -0.2509   -2.6277     0.857
5  bar    two -2.3025 -1.2960   -3.4423       NaN
6  foo    one -0.9688  1.2477   -2.6277     0.857
7  foo  three  0.4828  1.0047   -2.6277     0.857
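
An alternative that skips the join, offered as a sketch and not as part of the original answer: groupby(...).transform('sum') returns one value per original row, so the result can be assigned directly as a new column. This also avoids the ValueError from the question, since a single aligned Series is assigned instead of a two-column frame. The data below is made up.

```python
import pandas as pd

# Hypothetical fixed data: both 'bar' rows have a negative C.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'C': [1.5, -2.0, -0.5, -1.0],
})

# transform('sum') broadcasts each group's sum back onto that group's rows,
# so the assignment lines up with the original index.
df['negative'] = df['C'].clip(upper=0).groupby(df['A']).transform('sum')
df['positive'] = df['C'].clip(lower=0).groupby(df['A']).transform('sum')
print(df)
```

Note that with this approach a group without positive values gets 0.0 in the positive column, not NaN as in the unstack version.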