Python Pandas groupby cumulative sum

Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/22650833/

Time: 2020-08-19 01:21:47  Source: igfitidea

Pandas groupby cumulative sum

python pandas

Asked by kc2819

I would like to add a cumulative sum column to my Pandas dataframe so that:

name  | day       | no
------|-----------|----
Hyman | Monday    | 10
Hyman | Tuesday   | 20
Hyman | Tuesday   | 10
Hyman | Wednesday | 50
Jill  | Monday    | 40
Jill  | Wednesday | 110

becomes:

Hyman | Monday     | 10  | 10
Hyman | Tuesday    | 30  | 40
Hyman | Wednesday  | 50  | 90
Jill  | Monday     | 40  | 40
Jill  | Wednesday  | 110 | 150

I tried various combos of df.groupby and df.agg(lambda x: cumsum(x)) to no avail.

Answered by CT Zhu

This should do it; you need groupby() twice:

df.groupby(['name', 'day']).sum() \
  .groupby(level=0).cumsum().reset_index()

Explanation:

print(df)
    name        day   no
0  Hyman     Monday   10
1  Hyman    Tuesday   20
2  Hyman    Tuesday   10
3  Hyman  Wednesday   50
4   Jill     Monday   40
5   Jill  Wednesday  110

# sum per name/day
print( df.groupby(['name', 'day']).sum() )
                  no
name  day           
Hyman Monday      10
      Tuesday     30
      Wednesday   50
Jill  Monday      40
      Wednesday  110

# cumulative sum per name, taken over the per-day sums
print( df.groupby(['name', 'day']).sum() \
         .groupby(level=0).cumsum() )
                  no
name  day           
Hyman Monday      10
      Tuesday     40
      Wednesday   90
Jill  Monday      40
      Wednesday  150

The dataframe resulting from the first sum is indexed by 'name' and by 'day'. You can see it by printing

df.groupby(['name', 'day']).sum().index 

When computing the cumulative sum, you want to do so by 'name', corresponding to the first index (level 0).
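To make the level numbering concrete, here is a small sketch that rebuilds the sample frame and checks that grouping on level 0 is the same as grouping on the level named 'name'. The variable names `summed`, `by_position`, and `by_label` are illustrative and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Hyman", "Hyman", "Hyman", "Hyman", "Jill", "Jill"],
    "day": ["Monday", "Tuesday", "Tuesday", "Wednesday", "Monday", "Wednesday"],
    "no": [10, 20, 10, 50, 40, 110],
})

# First groupby: one row per (name, day) pair, indexed by a two-level MultiIndex.
summed = df.groupby(["name", "day"]).sum()
print(summed.index.names)

# Second groupby: level 0 is the 'name' level, so both spellings group the same way.
by_position = summed.groupby(level=0).cumsum()
by_label = summed.groupby(level="name").cumsum()
print(by_position.equals(by_label))
```

Referring to the level by name can be easier to read than the position when the index has more than two levels.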

Finally, use reset_index to have the names repeated.

df.groupby(['name', 'day']).sum().groupby(level=0).cumsum().reset_index()

    name        day   no
0  Hyman     Monday   10
1  Hyman    Tuesday   40
2  Hyman  Wednesday   90
3   Jill     Monday   40
4   Jill  Wednesday  150
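One caveat worth sketching: groupby sorts the group keys by default, and day names sort alphabetically. In the sample data Monday, Tuesday, Wednesday happen to sort into calendar order, but that is a coincidence. The frame below is hypothetical (the Thursday rows are not in the question) and assumes the rows already arrive in the order you want to accumulate in; passing sort=False keeps that order.

```python
import pandas as pd

# Hypothetical data: rows are in calendar order, but "Thursday" sorts before "Wednesday".
df = pd.DataFrame({
    "name": ["Hyman", "Hyman", "Hyman", "Jill", "Jill"],
    "day": ["Wednesday", "Thursday", "Thursday", "Wednesday", "Thursday"],
    "no": [50, 20, 10, 110, 40],
})

# Default (sort=True): keys are sorted, so Thursday is accumulated before Wednesday.
sorted_keys = (
    df.groupby(["name", "day"]).sum()
      .groupby(level=0).cumsum()
      .reset_index()
)

# sort=False: groups keep their order of first appearance in df.
original_order = (
    df.groupby(["name", "day"], sort=False).sum()
      .groupby(level=0).cumsum()
      .reset_index()
)

print(sorted_keys)
print(original_order)
```

The final total per name is the same either way; only the intermediate running totals differ.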

Answered by Dmitry Andreev

This works in pandas 0.16.2

In[23]: print(df)
    name        day   no
0  Hyman     Monday   10
1  Hyman    Tuesday   20
2  Hyman    Tuesday   10
3  Hyman  Wednesday   50
4   Jill     Monday   40
5   Jill  Wednesday  110
In[24]: df['no_cumulative'] = df.groupby(['name'])['no'].apply(lambda x: x.cumsum())
In[25]: print(df)
    name        day   no  no_cumulative
0  Hyman     Monday   10             10
1  Hyman    Tuesday   20             30
2  Hyman    Tuesday   10             40
3  Hyman  Wednesday   50             90
4   Jill     Monday   40             40
5   Jill  Wednesday  110            150
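The answer is tied to pandas 0.16.2. On later releases the result of apply() may carry the group label as an extra index level, in which case the direct column assignment may no longer line up. Assuming that is what you run into, a minimal sketch of a version-tolerant spelling is to pass group_keys=False, which keeps the original row index on the result:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Hyman", "Hyman", "Hyman", "Hyman", "Jill", "Jill"],
    "day": ["Monday", "Tuesday", "Tuesday", "Wednesday", "Monday", "Wednesday"],
    "no": [10, 20, 10, 50, 40, 110],
})

# group_keys=False asks apply() not to prepend the group label to the result
# index, so the returned Series aligns with df's own row index on assignment.
df["no_cumulative"] = (
    df.groupby("name", group_keys=False)["no"].apply(lambda x: x.cumsum())
)
print(df)
```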

Answered by sushmit

You should use

df['cum_no'] = df.no.cumsum()

http://pandas.pydata.org/pandas-docs/version/0.19.2/generated/pandas.DataFrame.cumsum.html

Another way of doing it

import pandas as pd
df = pd.DataFrame({'C1' : ['a','a','a','b','b'],
           'C2' : [1,2,3,4,5]})
df['cumsum'] = df.groupby(by=['C1'])['C2'].transform(lambda x: x.cumsum())
df
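The two snippets in this answer do different things, which a side-by-side sketch makes visible: a plain cumsum() keeps one running total over the whole column, while the grouped version restarts for each value of C1. The column name `running_total` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"C1": ["a", "a", "a", "b", "b"],
                   "C2": [1, 2, 3, 4, 5]})

# Plain cumsum: a single running total over the entire column, ignoring C1.
df["running_total"] = df["C2"].cumsum()

# Grouped cumsum: the running total restarts whenever C1 changes group.
df["cumsum"] = df.groupby(by=["C1"])["C2"].transform(lambda x: x.cumsum())

print(df)
```

For group 'b' the plain version continues from 6 (giving 10 and 15), whereas the grouped version starts over (giving 4 and 9). Only the grouped form matches what the question asks for.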

Answered by Christoph

Instead of df.groupby(by=['name','day']).sum().groupby(level=[0]).cumsum() (see above) you could also do df.set_index(['name', 'day']).groupby(level=0, as_index=False).cumsum()

  • df.groupby(by=['name','day']).sum() mostly just moves both columns into a MultiIndex (it also adds up rows that share the same name and day, which set_index does not do)
  • as_index=False means you do not need to call reset_index afterwards
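The two approaches can be compared on the sample data with the sketch below. It deliberately uses an explicit reset_index() on both sides rather than as_index=False, so it does not rely on how as_index interacts with cumsum in a given pandas version. The names `aggregated` and `row_wise` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Hyman", "Hyman", "Hyman", "Hyman", "Jill", "Jill"],
    "day": ["Monday", "Tuesday", "Tuesday", "Wednesday", "Monday", "Wednesday"],
    "no": [10, 20, 10, 50, 40, 110],
})

# groupby + sum collapses the two Hyman/Tuesday rows into one before accumulating.
aggregated = (
    df.groupby(["name", "day"]).sum()
      .groupby(level=0).cumsum()
      .reset_index()
)

# set_index only moves the columns into the index; every original row is kept.
row_wise = (
    df.set_index(["name", "day"])
      .groupby(level=0).cumsum()
      .reset_index()
)

print(len(aggregated), len(row_wise))
print(row_wise)
```

When each name/day pair appears only once the two results coincide; with duplicates, as here, the set_index variant yields one running total per original row.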

Answered by vjayky

Modification to @Dmitry's answer. This is simpler and works in pandas 0.19.0:

print(df)
    name        day   no
0  Hyman     Monday   10
1  Hyman    Tuesday   20
2  Hyman    Tuesday   10
3  Hyman  Wednesday   50
4   Jill     Monday   40
5   Jill  Wednesday  110

df['no_csum'] = df.groupby(['name'])['no'].cumsum()

print(df)
    name        day   no  no_csum
0  Hyman     Monday   10       10
1  Hyman    Tuesday   20       30
2  Hyman    Tuesday   10       40
3  Hyman  Wednesday   50       90
4   Jill     Monday   40       40
5   Jill  Wednesday  110      150
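The rows of each group do not have to be contiguous for this to work. As a sketch, the frame below holds the same values as the sample but with the two names interleaved (this ordering is hypothetical, not from the question). The grouped cumsum keeps the original row index, so the assignment still puts each running total on the right row.

```python
import pandas as pd

# Same values as the sample data, with Hyman and Jill rows interleaved.
df = pd.DataFrame({
    "name": ["Hyman", "Jill", "Hyman", "Jill", "Hyman", "Hyman"],
    "day": ["Monday", "Monday", "Tuesday", "Wednesday", "Tuesday", "Wednesday"],
    "no": [10, 40, 20, 110, 10, 50],
})

# The running total follows the current row order within each name.
df["no_csum"] = df.groupby(["name"])["no"].cumsum()
print(df)
```

Because the total accumulates in row order, sort the frame first if the rows are not already in the order you want.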