Pandas monthly rolling operation

Question

Asked by Filip Kilibarda

I ended up figuring it out while writing this question, so I'll post it anyway and answer my own question in case someone else needs a little help.

Problem

Suppose we have a DataFrame, df, containing this data:

import pandas as pd
from io import StringIO

data = StringIO(
"""\
date          spendings  category
2014-03-25    10         A
2014-04-05    20         A
2014-04-15    10         A
2014-04-25    10         B
2014-05-05    10         B
2014-05-15    10         A
2014-05-25    10         A
"""
)

# Raw string so that "\s" is passed to the regex engine unchanged
df = pd.read_csv(data, sep=r"\s+", parse_dates=True, index_col="date")

Goal

For each row, sum the spendings over every row that is within one month of it, ideally using DataFrame.rolling, as it has a very clean syntax.

What I have tried

df = df.rolling("M").sum()

But this throws an exception:

ValueError: <MonthEnd> is a non-fixed frequency

version: pandas==0.19.2

Answer 1

Accepted answer by Filip Kilibarda

Use the "D" offset rather than "M", and specifically use "30D" for 30 days, or approximately one month.

df = df.rolling("30D").sum()

Initially, I intuitively jumped to using "M" as I figured it stands for one month, but now it's clear why that doesn't work.
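As a rough sketch of what the "30D" window computes on the sample data: the frame is rebuilt inline so the snippet is self-contained, and selecting the spendings column before rolling is an assumption on my part, since newer pandas versions may refuse to sum the non-numeric category column.

```python
import pandas as pd

# Sample data from the question, rebuilt inline
dates = pd.DatetimeIndex(
    [
        "2014-03-25", "2014-04-05", "2014-04-15", "2014-04-25",
        "2014-05-05", "2014-05-15", "2014-05-25",
    ],
    name="date",
)
df = pd.DataFrame(
    {"spendings": [10, 20, 10, 10, 10, 10, 10], "category": list("AAABBAA")},
    index=dates,
)

# Sum spendings over the 30 days ending at each row's date.
# Only the numeric column is selected (assumption, see above).
rolled = df["spendings"].rolling("30D").sum()
print(rolled)
```

Note that the window only looks backwards from each row, and with the default settings its left edge is open, so a row exactly 30 days earlier is not counted. The result therefore differs from a calendar-month sum.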

Answer 2

Answer by Mike

To address why you cannot use aliases such as "AS" or "Y": in this case, the "Y" offset is not "a year"; it actually refers to YearEnd (http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases), and therefore the rolling function does not get a fixed window (e.g. you get a 365-day window if your index falls on Jan 1, and a 1-day window if it falls on Dec 31).
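The same point can be illustrated with MonthEnd, the offset named in the question's error message. This is only a small sketch: the three start dates are arbitrary examples chosen here, not values from the question.

```python
import pandas as pd

# Arbitrary example dates (not from the question)
starts = [
    pd.Timestamp("2014-01-15"),
    pd.Timestamp("2014-01-31"),
    pd.Timestamp("2014-02-28"),
]

# An anchored offset rolls forward to the next month end, so the distance
# it covers depends on where you start.
month_end = pd.offsets.MonthEnd()
month_end_steps = [(ts + month_end) - ts for ts in starts]

# A fixed duration always covers the same span of time.
fixed = pd.Timedelta(days=30)
fixed_steps = [(ts + fixed) - ts for ts in starts]

print([step.days for step in month_end_steps])
print([step.days for step in fixed_steps])
```

The month-end steps come out with different lengths for each start date, while the 30-day steps are all identical, which is the property a time-based rolling window relies on.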

The proposed solution (offset by 30D) works if you do not need strict calendar months. Alternatively, you can iterate over your date index and slice with an offset to get more precise control over your sum.

If you have to do it in one line (separated for readability):

df['Sum'] = [
    df.loc[
        edt - pd.tseries.offsets.DateOffset(months=1):edt, 'spendings'
    ].sum() for edt in df.index
]

            spendings category  Sum
date
2014-03-25         10        A   10
2014-04-05         20        A   30
2014-04-15         10        A   40
2014-04-25         10        B   50
2014-05-05         10        B   50
2014-05-15         10        A   40
2014-05-25         10        A   40
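
For longer frames, the per-row slicing above can become slow. One possible alternative, which is not part of the original answer, is to locate each window start with searchsorted and take differences of a cumulative sum. The sketch below assumes the index is sorted in ascending order; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt inline
dates = pd.DatetimeIndex(
    [
        "2014-03-25", "2014-04-05", "2014-04-15", "2014-04-25",
        "2014-05-05", "2014-05-15", "2014-05-25",
    ],
    name="date",
)
df = pd.DataFrame(
    {"spendings": [10, 20, 10, 10, 10, 10, 10], "category": list("AAABBAA")},
    index=dates,
)

# Position of the first row at or after (date - 1 calendar month);
# side="left" keeps the window start inclusive, like the .loc slice.
window_starts = df.index.searchsorted(df.index - pd.DateOffset(months=1), side="left")
# Position just past the last row sharing the current date.
window_ends = df.index.searchsorted(df.index, side="right")

# Prefix sums with a leading 0 so that csum[b] - csum[a] == values[a:b].sum()
csum = np.concatenate(([0], df["spendings"].to_numpy().cumsum()))
df["Sum"] = csum[window_ends] - csum[window_starts]
print(df)
```

On the sample data this should reproduce the Sum column shown above, since both ends of the window are inclusive in the same way as the label-based slice.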
