Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/13996302/

Date: 2020-08-18 10:08:03  Source: igfitidea

Python - rolling functions for GroupBy object

Tags: python, pandas, pandas-groupby, rolling-computation, rolling-sum

Asked by

I have a time series object grouped of the type <pandas.core.groupby.SeriesGroupBy object at 0x03F1A9F0>. grouped.sum() gives the desired result, but I cannot get rolling_sum to work with the groupby object. Is there any way to apply rolling functions to groupby objects? For example:

import pandas as pd

x = range(0, 6)
ids = ['a', 'a', 'a', 'b', 'b', 'b']
df = pd.DataFrame(list(zip(ids, x)), columns=['id', 'x'])
df.groupby('id').sum()
#      x
# id
# a    3
# b   12

However, I would like to have something like:

  id  x
0  a  0
1  a  1
2  a  3
3  b  3
4  b  7
5  b  12

Accepted answer by Garrett

Note: as identified by @kekert, the following pandas pattern (pd.rolling_mean) has been deprecated. See the answers below for current solutions.

In [16]: df.groupby('id')['x'].apply(pd.rolling_mean, 2, min_periods=1)
Out[16]: 
0    0.0
1    0.5
2    1.5
3    3.0
4    3.5
5    4.5

In [17]: df.groupby('id')['x'].cumsum()
Out[17]: 
0     0
1     1
2     3
3     3
4     7
5    12
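Because pd.rolling_mean no longer exists in current pandas, here is a minimal sketch of an equivalent for the first call above, assuming a recent pandas version. It applies Series.rolling inside transform, so the result keeps the original row index; min_periods=1 mirrors the old min_periods=1 argument.

```python
import pandas as pd

# Same toy data as in the question
df = pd.DataFrame({'id': ['a', 'a', 'a', 'b', 'b', 'b'], 'x': range(0, 6)})

# Rolling mean over a 2-row window inside each group; min_periods=1 lets
# the first row of each group use a window containing only itself
rolling_mean = df.groupby('id')['x'].transform(
    lambda s: s.rolling(2, min_periods=1).mean()
)

print(rolling_mean.tolist())  # [0.0, 0.5, 1.5, 3.0, 3.5, 4.5]
```

Using transform rather than apply guarantees one output value per input row, aligned on df's index.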

Answer by Zelazny7

I'm not sure of the mechanics, but this works. Note that the returned value is just an ndarray. I think you could apply any cumulative or "rolling" function in this manner and get the same result.

I have tested it with cumprod, cummax and cummin, and they all returned an ndarray. I think pandas is smart enough to know that these functions return a series, so the function is applied as a transformation rather than an aggregation.

In [35]: df.groupby('id')['x'].cumsum()
Out[35]:
0     0
1     1
2     3
3     3
4     7
5    12
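To illustrate the claim about cumprod, cummax and cummin, here is a small sketch using hypothetical data (not from the question), chosen so that the running maximum and minimum actually differ from x. In current pandas versions these groupby methods return a Series aligned with the original index rather than a bare ndarray.

```python
import pandas as pd

# Hypothetical data chosen so cummax/cummin are not trivially equal to x
df = pd.DataFrame({'id': ['a', 'a', 'a', 'b', 'b', 'b'], 'x': [2, 1, 3, 1, 5, 4]})
g = df.groupby('id')['x']

# Each cumulative method restarts at the first row of every group and
# returns one value per input row, aligned with df's original index
result = df.assign(
    cumsum=g.cumsum(),
    cumprod=g.cumprod(),
    cummax=g.cummax(),
    cummin=g.cummin(),
)
print(result)
```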

Edit: I found it curious that this syntax does return a Series:

In [54]: df.groupby('id')['x'].transform('cumsum')
Out[54]:
0     0
1     1
2     3
3     3
4     7
5    12
Name: x
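As a quick check, assuming a recent pandas version, the direct method call and the string form passed to transform produce the same per-row values, and either result can be stored as a new column because both are indexed like the original frame. The column name x_cumsum is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'a', 'a', 'b', 'b', 'b'], 'x': range(0, 6)})

# Calling the groupby method directly...
via_method = df.groupby('id')['x'].cumsum()
# ...and passing the method name as a string to transform()
via_transform = df.groupby('id')['x'].transform('cumsum')

print(via_method.tolist())     # [0, 1, 3, 3, 7, 12]
print(via_transform.tolist())  # [0, 1, 3, 3, 7, 12]

# Both are aligned with df's index, so assignment needs no reindexing
df['x_cumsum'] = via_transform
print(df)
```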

Answer by Kevin Wang

For the Googlers who come upon this old question:

Regarding @kekert's comment on @Garrett's answer to use the new

df.groupby('id')['x'].rolling(2).mean()

rather than the now-deprecated

df.groupby('id')['x'].apply(pd.rolling_mean, 2, min_periods=1)

Curiously, the new .rolling().mean() approach returns a MultiIndexed Series, indexed first by the groupby column and then by the original index. The old approach, by contrast, returned a Series indexed only by the original df index. That perhaps makes less sense, but it made it very convenient to add the Series as a new column to the original DataFrame.
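The index structure described above can be inspected directly. This sketch assumes a recent pandas version and uses min_periods=1 so that the values match the old pd.rolling_mean output; it only prints the index levels.

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'a', 'a', 'b', 'b', 'b'], 'x': range(0, 6)})

rolled = df.groupby('id')['x'].rolling(2, min_periods=1).mean()

# Two levels: the group key first, then the original row label
print(rolled.index.nlevels)                           # 2
print(list(rolled.index.names))                       # ['id', None]
print(rolled.index.get_level_values('id').tolist())  # ['a', 'a', 'a', 'b', 'b', 'b']
print(rolled.index.get_level_values(1).tolist())     # [0, 1, 2, 3, 4, 5]
```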

So I think I've figured out a solution that uses the new rolling() method and still works the same:

df.groupby('id')['x'].rolling(2, min_periods=1).mean().reset_index(0, drop=True)

which should give you the Series:

0    0.0
1    0.5
2    1.5
3    3.0
4    3.5
5    4.5

which you can add as a new column:

df['x_rolling_mean'] = df.groupby('id')['x'].rolling(2, min_periods=1).mean().reset_index(0, drop=True)
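One detail worth checking is what happens when rows of the same group are not contiguous. In this sketch with hypothetical interleaved data, the rolled Series comes out ordered group by group, but after dropping level 0 it still carries the original row labels, so column assignment (which aligns on labels) puts each value back on its own row.

```python
import pandas as pd

# Hypothetical data: the two groups are interleaved rather than contiguous
df = pd.DataFrame({'id': ['a', 'b', 'a', 'b', 'a', 'b'], 'x': range(0, 6)})

rolled = (
    df.groupby('id')['x']
      .rolling(2, min_periods=1)
      .mean()
      .reset_index(level=0, drop=True)  # drop the group-key level
)
print(rolled.index.tolist())  # [0, 2, 4, 1, 3, 5]

# Assignment aligns on row labels, not on position
df['x_rolling_mean'] = rolled
print(df['x_rolling_mean'].tolist())  # [0.0, 1.0, 1.0, 2.0, 3.0, 4.0]
```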

Answer by Sean McCarthy

Here is another way that generalizes well and uses pandas' expanding method.

It is very efficient, and the same pattern also works for rolling window calculations with fixed windows, such as for time series.

# Import pandas library
import pandas as pd

# Prepare columns
x = range(0, 6)
id = ['a', 'a', 'a', 'b', 'b', 'b']

# Create dataframe from columns above
df = pd.DataFrame({'id':id, 'x':x})

# Calculate a cumulative sum using "expanding": each row sums all rows so far in its group
df['rolling_sum'] = df.groupby('id')['x'].transform(lambda s: s.expanding().sum())

# Output as desired by original poster (expanding().sum() returns floats)
print(df)
  id  x  rolling_sum
0  a  0          0.0
1  a  1          1.0
2  a  2          3.0
3  b  3          3.0
4  b  4          7.0
5  b  5         12.0
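For time series, a minimal sketch of the same transform pattern with rolling() in place of expanding() is shown below. The dates and the '2D' offset window are hypothetical; offset windows require a DatetimeIndex, so the frame is indexed by date first. The dates are kept increasing both overall and within each id, which avoids monotonicity errors. A row-count window is included for comparison: the two differ wherever there is a gap between dates.

```python
import pandas as pd

# Hypothetical daily observations: dates increase overall and within each id
df = pd.DataFrame({
    'id': ['a', 'a', 'a', 'b', 'b', 'b'],
    'date': pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-04',
                            '2020-01-05', '2020-01-07', '2020-01-08']),
    'x': range(0, 6),
})

# Offset-based windows such as '2D' need a DatetimeIndex
ts = df.set_index('date')

# Time-based window: rows dated in (current date - 2 days, current date]
ts['sum_2d'] = ts.groupby('id')['x'].transform(lambda s: s.rolling('2D').sum())

# Row-count window: the current row plus the previous row in the same group
ts['sum_2rows'] = ts.groupby('id')['x'].transform(
    lambda s: s.rolling(2, min_periods=1).sum()
)
print(ts)
```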