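Disclaimer: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow.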
Original URL: http://stackoverflow.com/questions/33780794/
How to sum negative and positive values separately when using groupby in pandas?
Question by ALH
How do I sum positive and negative values separately in pandas and put them, let's say, in 'positive' and 'negative' columns?
I have a DataFrame like this:
import numpy as np
import pandas
# C and D are random, so the numbers change on every run
df = pandas.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                       'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                       'C': np.random.randn(8), 'D': np.random.randn(8)})
The output is:
df
A B C D
0 foo one 0.374156 0.319699
1 bar one -0.356339 -0.629649
2 foo two -0.390243 -1.387909
3 bar three -0.783435 -0.959699
4 foo two -1.268622 -0.250871
5 bar two -2.302525 -1.295991
6 foo one -0.968840 1.247675
7 foo three 0.482845 1.004697
I used the code below to get the sum of the negative values:
df['negative'] = df.groupby('A')['C'].apply(lambda x: x[x < 0].sum()).reset_index()
But when I try to assign it to a DataFrame column called 'negative', I get this error:
ValueError: Wrong number of items passed 2, placement implies 1
I understand what the error says: the groupby has returned more than one column, which cannot be assigned to df['negative']. But I don't know how to solve this part of the problem. I need a positive column too.
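To see why the assignment fails, here is a minimal sketch that uses only columns A and C, with the values copied from the printed frame above. The apply(...).reset_index() chain returns one row per group with two columns (A and C), so it cannot be stored in a single column of the eight-row df. If the intent is to put each group's total on every row, as the df['negative'] = ... assignment suggests, GroupBy.transform is one option (it is not part of the question's code): it returns a Series aligned with the original index.

```python
import pandas as pd

# Columns A and C copied from the printed frame in the question
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'C': [0.374156, -0.356339, -0.390243, -0.783435,
          -1.268622, -2.302525, -0.968840, 0.482845],
})

# What the question's code builds: one row per group, two columns (A and C)
per_group = df.groupby('A')['C'].apply(lambda x: x[x < 0].sum()).reset_index()
print(per_group.shape)  # (2, 2): 2 groups x 2 columns, so it can't fill one column

# Keep only one sign, replacing the other values with 0
neg = df['C'].where(df['C'] < 0, 0)
pos = df['C'].where(df['C'] > 0, 0)

# transform('sum') repeats each group's total on every row of that group
df['negative'] = neg.groupby(df['A']).transform('sum')
df['positive'] = pos.groupby(df['A']).transform('sum')
print(df)
```

Because the non-matching values are replaced with 0 before summing, a group with no positive values (bar here) gets 0 in the positive column rather than NaN.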
The desired outcome would be:
A Positive Negative
0 foo 0.374156 -0.319699
1 bar 0.356339 -0.629649
What is the right solution to the problem?
Answer by Nader Hisham
In [14]:
df.groupby(df['A'])['C'].agg([('negative', lambda x: x[x < 0].sum()),
                              ('positive', lambda x: x[x > 0].sum())])
Out[14]:
negative positive
A
bar -1.418788 2.603452
foo -0.504695 2.880512
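The same per-group table can also be built without lambdas. The sketch below is an alternative to the agg call above, not the answerer's own code: Series.clip(upper=0) keeps the negative values and turns positives into 0, clip(lower=0) does the opposite, and a plain groupby sum then produces both columns at once. Columns A and C are copied from the question's printed frame, so the numbers will not match the random data behind this answer's output.

```python
import pandas as pd

# Columns A and C copied from the printed frame in the question
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'C': [0.374156, -0.356339, -0.390243, -0.783435,
          -1.268622, -2.302525, -0.968840, 0.482845],
})

result = (
    df.assign(negative=df['C'].clip(upper=0),   # positive values become 0
              positive=df['C'].clip(lower=0))   # negative values become 0
      .groupby('A')[['negative', 'positive']]
      .sum()
)
print(result)
```

Clipping is vectorized, so it avoids calling a Python lambda once per group; a group with no positive values simply sums to 0.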
Answer by behzad.nouri
You can groupby on A and on df['C'] > 0, and unstack the result:
>>> right = df.groupby(['A', df['C'] > 0])['C'].sum().unstack()
>>> right = right.rename(columns={True:'positive', False:'negative'})
>>> right
C negative positive
A
bar -3.4423 NaN
foo -2.6277 0.857
The NaN value appears because all the A == bar rows have negative values for C.
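If you would rather see 0 than NaN for a group with no values on one side, unstack accepts a fill_value argument. A minimal sketch on columns A and C copied from the question's printed frame; the boolean key is renamed to is_positive (a hypothetical name) only so the column level gets a clearer label than C.

```python
import pandas as pd

# Columns A and C copied from the printed frame in the question
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'C': [0.374156, -0.356339, -0.390243, -0.783435,
          -1.268622, -2.302525, -0.968840, 0.482845],
})

is_positive = (df['C'] > 0).rename('is_positive')  # hypothetical label for the sign key

right = (
    df.groupby(['A', is_positive])['C']
      .sum()
      .unstack(fill_value=0)  # missing (group, sign) pairs become 0 instead of NaN
      .rename(columns={True: 'positive', False: 'negative'})
)
print(right)
```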
If you want to add these to the original frame, matched on the values of the groupby key (i.e. A), you need a left join:
>>> df.join(right, on='A', how='left')
A B C D negative positive
0 foo one 0.3742 0.3197 -2.6277 0.857
1 bar one -0.3563 -0.6296 -3.4423 NaN
2 foo two -0.3902 -1.3879 -2.6277 0.857
3 bar three -0.7834 -0.9597 -3.4423 NaN
4 foo two -1.2686 -0.2509 -2.6277 0.857
5 bar two -2.3025 -1.2960 -3.4423 NaN
6 foo one -0.9688 1.2477 -2.6277 0.857
7 foo three 0.4828 1.0047 -2.6277 0.857