Pandas monthly rolling operation

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/43556344/

Date: 2020-09-14 03:27:15  Source: igfitidea

Tags: python, pandas

Asked by Filip Kilibarda

I ended up figuring it out while writing out this question so I'll just post anyway and answer my own question in case someone else needs a little help.

Problem

Suppose we have a DataFrame, df, containing this data.

import pandas as pd
from io import StringIO

data = StringIO(
"""\
date          spendings  category
2014-03-25    10         A
2014-04-05    20         A
2014-04-15    10         A
2014-04-25    10         B
2014-05-05    10         B
2014-05-15    10         A
2014-05-25    10         A
"""
)

# Raw string for the regex separator avoids an invalid-escape warning.
df = pd.read_csv(data, sep=r"\s+", parse_dates=True, index_col="date")

Goal

For each row, sum the spendings over every row that is within one month of it, ideally using DataFrame.rolling, as it has a very clean syntax.

What I have tried

df = df.rolling("M").sum()

But this throws an exception

ValueError: <MonthEnd> is a non-fixed frequency

version: pandas==0.19.2

Accepted answer by Filip Kilibarda

Use the "D" offset rather than "M", and specifically use "30D" for 30 days, or approximately one month.

df = df.rolling("30D").sum()
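
As a minimal sketch of what the 30-day window produces on the sample data, the snippet below rebuilds only the spendings column as a Series (the variable names are illustrative, and selecting the numeric column is an assumption made here because a text column such as category may not aggregate in newer pandas versions).

```python
import pandas as pd

# Rebuild the sample spendings from the question as a Series with a date index.
dates = pd.to_datetime(
    [
        "2014-03-25", "2014-04-05", "2014-04-15", "2014-04-25",
        "2014-05-05", "2014-05-15", "2014-05-25",
    ]
)
spendings = pd.Series([10, 20, 10, 10, 10, 10, 10], index=dates, name="spendings")

# Offset-based windows need a sorted datetime-like index; by default the
# window covers the interval (t - 30 days, t], so the left edge is excluded.
rolled = spendings.rolling("30D").sum()
print(rolled)
# Values: 10, 30, 40, 40, 30, 30, 30 (as floats)
```

Note that 2014-04-25 sums to 40 here, not 50: 2014-03-25 lies 31 days earlier, so it falls outside a 30-day window even though it is within one calendar month.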

Initially, I intuitively jumped to using "M" as I figured it stands for one month, but now it's clear why that doesn't work.

Answer by Mike

As for why you cannot use aliases like "AS" or "Y": the "Y" offset does not mean "one year of time". It actually refers to YearEnd (http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases), which marks a calendar boundary rather than a fixed length of time, so the rolling function does not get a fixed window (e.g. you would get a 365-day window if your index falls on Jan 1, and a 1-day window if it falls on Dec 31).
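
The point about non-fixed windows can be illustrated with a small sketch. It only measures how far MonthEnd and YearEnd reach from a few arbitrarily chosen start dates (the dates and variable names are assumptions for illustration) and compares that with a fixed Timedelta.

```python
import pandas as pd

# A Timedelta such as "30D" always spans the same amount of time.
fixed = pd.Timedelta("30D")

# Calendar offsets span a different amount of time depending on the start date.
month_spans = {
    day: (pd.Timestamp(day) + pd.offsets.MonthEnd()) - pd.Timestamp(day)
    for day in ["2014-02-01", "2014-03-01"]
}
year_spans = {
    day: (pd.Timestamp(day) + pd.offsets.YearEnd()) - pd.Timestamp(day)
    for day in ["2014-01-01", "2014-12-30"]
}

print(fixed)        # 30 days
print(month_spans)  # 27 days from Feb 1, 30 days from Mar 1
print(year_spans)   # 364 days from Jan 1, 1 day from Dec 30
```

Because the span depends on where the date sits in the calendar, there is no single window length that rolling could use for such an offset.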

The proposed solution (offset by 30D) works if you do not need strict calendar months. Alternatively, you can iterate over your date index and slice with an offset to get more precise control over your sum.

If you have to do it in one line (separated for readability):

df['Sum'] = [
    df.loc[
        edt - pd.tseries.offsets.DateOffset(months=1):edt, 'spendings'
    ].sum() for edt in df.index
]
            spendings category  Sum
date
2014-03-25         10        A   10
2014-04-05         20        A   30
2014-04-15         10        A   40
2014-04-25         10        B   50
2014-05-05         10        B   50
2014-05-15         10        A   40
2014-05-25         10        A   40
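
For longer indexes, slicing once per row can get slow. One possible alternative, sketched below, computes the same calendar-month sums with a cumulative sum and searchsorted. The helper name trailing_month_sum is hypothetical (it is not part of pandas), and the sketch assumes a sorted DatetimeIndex and no missing values.

```python
import numpy as np
import pandas as pd


def trailing_month_sum(values: pd.Series, months: int = 1) -> pd.Series:
    """Hypothetical helper: sum over [t - months, t] for every timestamp t.

    Assumes `values` has a sorted DatetimeIndex and contains no NaNs.
    """
    idx = values.index
    # Start of each window: the same day-of-month, `months` calendar months back.
    starts = pd.DatetimeIndex([ts - pd.DateOffset(months=months) for ts in idx])
    # Position of the first row inside each window (left edge included).
    left = idx.searchsorted(starts, side="left")
    # Position just past the last row at each timestamp (right edge included).
    right = idx.searchsorted(idx, side="right")
    # Prefix sums: csum[k] is the total of the first k rows.
    csum = np.concatenate(([0], values.to_numpy().cumsum()))
    return pd.Series(csum[right] - csum[left], index=idx, name="Sum")


dates = pd.to_datetime(
    [
        "2014-03-25", "2014-04-05", "2014-04-15", "2014-04-25",
        "2014-05-05", "2014-05-15", "2014-05-25",
    ]
)
spendings = pd.Series([10, 20, 10, 10, 10, 10, 10], index=dates, name="spendings")

result = trailing_month_sum(spendings)
print(result)
# Values: 10, 30, 40, 50, 50, 40, 40
```

On the sample data this matches the Sum column above, with both window edges included as in the label-based slice.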