Python Pandas groupby cumulative sum
Source: http://stackoverflow.com/questions/22650833/
License: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and cite the source URL.
Pandas groupby cumulative sum
Asked by kc2819
I would like to add a cumulative sum column to my Pandas dataframe so that:
name | day | no
-----|-----------|----
Hyman | Monday | 10
Hyman | Tuesday | 20
Hyman | Tuesday | 10
Hyman | Wednesday | 50
Jill | Monday | 40
Jill | Wednesday | 110
becomes:
Hyman | Monday | 10 | 10
Hyman | Tuesday | 30 | 40
Hyman | Wednesday | 50 | 90
Jill | Monday | 40 | 40
Jill | Wednesday | 110 | 150
I tried various combinations of df.groupby and df.agg(lambda x: cumsum(x)), to no avail.
Answered by CT Zhu
This should do it; you need groupby() twice:
df.groupby(['name', 'day']).sum() \
.groupby(level=0).cumsum().reset_index()
Explanation:
print(df)
name day no
0 Hyman Monday 10
1 Hyman Tuesday 20
2 Hyman Tuesday 10
3 Hyman Wednesday 50
4 Jill Monday 40
5 Jill Wednesday 110
# sum per name/day
print( df.groupby(['name', 'day']).sum() )
                  no
name  day
Hyman Monday      10
      Tuesday     30
      Wednesday   50
Jill  Monday      40
      Wednesday  110
# cumulative sum per name/day
print( df.groupby(['name', 'day']).sum() \
.groupby(level=0).cumsum() )
                  no
name  day
Hyman Monday      10
      Tuesday     40
      Wednesday   90
Jill  Monday      40
      Wednesday  150
The dataframe resulting from the first sum is indexed by 'name' and by 'day'. You can see this by printing:
df.groupby(['name', 'day']).sum().index
When computing the cumulative sum, you want to do so by 'name', corresponding to the first index (level 0).
Finally, use reset_index to turn the index levels back into columns, so that the names are repeated on every row:
df.groupby(['name', 'day']).sum().groupby(level=0).cumsum().reset_index()
name day no
0 Hyman Monday 10
1 Hyman Tuesday 40
2 Hyman Wednesday 90
3 Jill Monday 40
4 Jill Wednesday 150
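For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The frame is rebuilt from the question's sample data; the variable names daily and running are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["Hyman", "Hyman", "Hyman", "Hyman", "Jill", "Jill"],
    "day": ["Monday", "Tuesday", "Tuesday", "Wednesday", "Monday", "Wednesday"],
    "no": [10, 20, 10, 50, 40, 110],
})

# Step 1: collapse to one row per (name, day), summing duplicate days.
daily = df.groupby(["name", "day"]).sum()

# Step 2: running total within each name (index level 0),
# then move the index levels back into ordinary columns.
running = daily.groupby(level=0).cumsum().reset_index()

print(running)
```

Note that the resulting no column holds the running total only; the per-day sum from step 1 is overwritten.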
Answered by Dmitry Andreev
This works in pandas 0.16.2:
In[23]: print(df)
name day no
0 Hyman Monday 10
1 Hyman Tuesday 20
2 Hyman Tuesday 10
3 Hyman Wednesday 50
4 Jill Monday 40
5 Jill Wednesday 110
In[24]: df['no_cumulative'] = df.groupby(['name'])['no'].apply(lambda x: x.cumsum())
In[25]: print(df)
name day no no_cumulative
0 Hyman Monday 10 10
1 Hyman Tuesday 20 30
2 Hyman Tuesday 10 40
3 Hyman Wednesday 50 90
4 Jill Monday 40 40
5 Jill Wednesday 110 150
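A caveat for newer pandas releases: apply on a grouped column may prepend the group key to the result's index, in which case the result no longer lines up with df on assignment. The sketch below is one possible workaround, not part of the original answer. It assumes a pandas version much newer than 0.16.2 and passes group_keys=False so the result keeps the original row index.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["Hyman", "Hyman", "Hyman", "Hyman", "Jill", "Jill"],
    "day": ["Monday", "Tuesday", "Tuesday", "Wednesday", "Monday", "Wednesday"],
    "no": [10, 20, 10, 50, 40, 110],
})

# group_keys=False asks pandas not to add the group key to the result index,
# so the Series returned by apply aligns with df row by row.
df["no_cumulative"] = (
    df.groupby("name", group_keys=False)["no"].apply(lambda x: x.cumsum())
)

print(df)
```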
Answered by sushmit
You should use:
df['cum_no'] = df.no.cumsum()
http://pandas.pydata.org/pandas-docs/version/0.19.2/generated/pandas.DataFrame.cumsum.html
Another way of doing it:
import pandas as pd
df = pd.DataFrame({'C1' : ['a','a','a','b','b'],
'C2' : [1,2,3,4,5]})
df['cumsum'] = df.groupby(by=['C1'])['C2'].transform(lambda x: x.cumsum())
df
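The two snippets in this answer do different things: a plain cumsum runs over the whole column, while the groupby/transform version restarts at each group. A small sketch on the same C1/C2 frame shows the difference. The column names cumsum_all and cumsum_by_C1 are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"C1": ["a", "a", "a", "b", "b"],
                   "C2": [1, 2, 3, 4, 5]})

# Running total over the whole column, ignoring C1.
df["cumsum_all"] = df["C2"].cumsum()

# Running total that restarts for each value of C1.
df["cumsum_by_C1"] = df.groupby("C1")["C2"].transform("cumsum")

print(df)
```

For the question as asked (a total per name), the grouped form is presumably the one that applies.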
Answered by Christoph
Instead of df.groupby(by=['name','day']).sum().groupby(level=[0]).cumsum() (see above), you could also do df.set_index(['name', 'day']).groupby(level=0, as_index=False).cumsum()
df.groupby(by=['name','day']).sum() is actually just moving both columns to a MultiIndex, provided each name/day pair occurs only once; duplicate pairs are additionally summed by groupby().sum() but kept as separate rows by set_index.
as_index=False means you do not need to call reset_index afterwards.
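The two approaches can give different results on the question's data, because Hyman has two Tuesday rows. The sketch below compares them side by side. To keep the comparison simple it leaves out as_index=False and calls reset_index explicitly; the names via_sum and via_set_index are only illustrative.

```python
import pandas as pd

# Sample data copied from the question (note the two Hyman/Tuesday rows).
df = pd.DataFrame({
    "name": ["Hyman", "Hyman", "Hyman", "Hyman", "Jill", "Jill"],
    "day": ["Monday", "Tuesday", "Tuesday", "Wednesday", "Monday", "Wednesday"],
    "no": [10, 20, 10, 50, 40, 110],
})

# groupby + sum first merges duplicate (name, day) rows, then accumulates.
via_sum = (
    df.groupby(["name", "day"]).sum()
      .groupby(level=0).cumsum()
      .reset_index()
)

# set_index keeps every original row, so the duplicates stay separate.
via_set_index = (
    df.set_index(["name", "day"])
      .groupby(level=0).cumsum()
      .reset_index()
)

print(via_sum)
print(via_set_index)
```

Which one is appropriate depends on whether one row per day or one row per original record is wanted.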
Answered by vjayky
Modification to @Dmitry's answer. This is simpler and works in pandas 0.19.0:
print(df)
name day no
0 Hyman Monday 10
1 Hyman Tuesday 20
2 Hyman Tuesday 10
3 Hyman Wednesday 50
4 Jill Monday 40
5 Jill Wednesday 110
df['no_csum'] = df.groupby(['name'])['no'].cumsum()
print(df)
name day no no_csum
0 Hyman Monday 10 10
1 Hyman Tuesday 20 30
2 Hyman Tuesday 10 40
3 Hyman Wednesday 50 90
4 Jill Monday 40 40
5 Jill Wednesday 110 150
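The output above keeps both Tuesday rows for Hyman, whereas the table in the question has one row per name and day, showing both the daily sum and the running total. One way to get that shape is to aggregate first and then apply the same grouped cumsum. This is a sketch combining the ideas from the answers above, not something any single answer states; the names daily and no_csum are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["Hyman", "Hyman", "Hyman", "Hyman", "Jill", "Jill"],
    "day": ["Monday", "Tuesday", "Tuesday", "Wednesday", "Monday", "Wednesday"],
    "no": [10, 20, 10, 50, 40, 110],
})

# One row per (name, day); sort=False keeps the order of first appearance
# and as_index=False leaves name and day as ordinary columns.
daily = df.groupby(["name", "day"], as_index=False, sort=False)["no"].sum()

# Running total of the daily sums within each name.
daily["no_csum"] = daily.groupby("name")["no"].cumsum()

print(daily)
```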


