Note: this page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/32847800/

Time: 2020-09-13 23:56:54  Source: igfitidea

How can I use cumsum within a group in Pandas?

Tags: python, pandas, group-by, dataframe, cumsum

Asked by Baron Yugovich

I have

import pandas as pd

df = pd.DataFrame.from_dict({'id': ['A', 'B', 'A', 'C', 'D', 'B', 'C'], 'val': [1, 2, -3, 1, 5, 6, -2], 'stuff': ['12', '23232', '13', '1234', '3235', '3236', '732323']})

  id   stuff  val
0  A      12    1
1  B   23232    2
2  A      13   -3
3  C    1234    1
4  D    3235    5
5  B    3236    6
6  C  732323   -2

I'd like to get a running sum of val for each id, so the desired output looks like this:

  id   stuff  val  cumsum
0  A      12    1       1
1  B   23232    2       2
2  A      13   -3      -2
3  C    1234    1       1
4  D    3235    5       5
5  B    3236    6       8
6  C  732323   -2      -1

This is what I tried:

df['cumsum'] = df.groupby('id').cumsum(['val'])

and

df['cumsum'] = df.groupby('id').cumsum(['val'])

This is the error I got:

ValueError: Wrong number of items passed 0, placement implies 1

Answered by EdChum

You can call transform and pass the cumsum function to add that column to your df:

In [156]:
df['cumsum'] = df.groupby('id')['val'].transform(pd.Series.cumsum)
df

Out[156]:
  id   stuff  val  cumsum
0  A      12    1       1
1  B   23232    2       2
2  A      13   -3      -2
3  C    1234    1       1
4  D    3235    5       5
5  B    3236    6       8
6  C  732323   -2      -1
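
As a side note, transform also accepts the method name as a string. The sketch below rebuilds the question's frame and compares the two spellings; the names via_function and via_name are illustrative only and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['A', 'B', 'A', 'C', 'D', 'B', 'C'],
    'val': [1, 2, -3, 1, 5, 6, -2],
    'stuff': ['12', '23232', '13', '1234', '3235', '3236', '732323'],
})

# Pass the function object, as in the answer above.
via_function = df.groupby('id')['val'].transform(pd.Series.cumsum)

# Pass the method name as a string instead (illustrative alternative).
via_name = df.groupby('id')['val'].transform('cumsum')

# Both results keep the original row index, so they can be compared directly.
print(via_function.equals(via_name))
```

Either form returns a Series with the same index as df, which is why it can be assigned straight into a new column.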

With respect to your error: first, you call cumsum on the whole groupby object rather than on the val column selected from it; second, you pass the column name as a list argument, which cumsum does not treat as a column selection.
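
To make the difference concrete, here is a minimal sketch of what each call returns. It assumes a reduced frame with only the id and val columns; the stuff column is left out on purpose so the example does not depend on how a particular pandas version handles string columns. The names whole and single are illustrative.

```python
import pandas as pd

# Reduced frame (an assumption for this sketch): key column plus one numeric column.
df = pd.DataFrame({
    'id': ['A', 'B', 'A', 'C', 'D', 'B', 'C'],
    'val': [1, 2, -3, 1, 5, 6, -2],
})

# cumsum on the whole groupby object returns a DataFrame of the non-key columns.
whole = df.groupby('id').cumsum()

# Selecting 'val' first returns a Series, which fits a single new column.
single = df.groupby('id')['val'].cumsum()

print(type(whole).__name__, list(whole.columns))
print(type(single).__name__)
```

Assigning a DataFrame to a single column label is the kind of shape mismatch the error message complains about, whereas the Series form has exactly one value per row.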

So this works:

In [159]:
df.groupby('id')['val'].cumsum()

Out[159]:
0    1
1    2
2   -2
3    1
4    5
5    8
6   -1
dtype: int64
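
Putting it together, a minimal end-to-end sketch that assigns the grouped cumsum directly as the new column and checks it against the desired output from the question. The expected list is copied from the question's table; the variable name expected is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['A', 'B', 'A', 'C', 'D', 'B', 'C'],
    'val': [1, 2, -3, 1, 5, 6, -2],
    'stuff': ['12', '23232', '13', '1234', '3235', '3236', '732323'],
})

# The grouped cumsum keeps the original row index, so it aligns row by row
# even though the rows of each id are interleaved.
df['cumsum'] = df.groupby('id')['val'].cumsum()

# Values from the desired-output table in the question.
expected = [1, 2, -2, 1, 5, 8, -1]
print(df['cumsum'].tolist() == expected)
```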