Pandas monthly rolling operation
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/43556344/
Asked by Filip Kilibarda
I ended up figuring it out while writing out this question so I'll just post anyway and answer my own question in case someone else needs a little help.
Problem
Suppose we have a DataFrame, df, containing this data.
import pandas as pd
from io import StringIO
data = StringIO(
"""\
date spendings category
2014-03-25 10 A
2014-04-05 20 A
2014-04-15 10 A
2014-04-25 10 B
2014-05-05 10 B
2014-05-15 10 A
2014-05-25 10 A
"""
)
# Parse the whitespace-separated text and use the parsed dates as the index.
df = pd.read_csv(data, sep=r"\s+", parse_dates=True, index_col="date")
Goal
For each row, sum the spendings over every row that is within one month of it, ideally using DataFrame.rolling, as its syntax is very clean.
What I have tried
df = df.rolling("M").sum()
But this throws an exception
ValueError: <MonthEnd> is a non-fixed frequency
version: pandas==0.19.2
Accepted answer by Filip Kilibarda
Use the "D" offset rather than "M", and specifically use "30D" for 30 days, or approximately one month.
df = df.rolling("30D").sum()
Initially, I intuitively jumped to using "M" as I figured it stands for one month, but now it's clear why that doesn't work.
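As a minimal sketch of the 30-day approach above, the snippet below rebuilds the sample data and applies the window. Selecting the spendings column before rolling is an assumption added here: the original call ran on the whole DataFrame under pandas 0.19.2, but recent pandas versions may refuse to sum the string-valued category column. The name rolling_30d is illustrative only.

```python
import pandas as pd
from io import StringIO

data = StringIO(
    """\
date spendings category
2014-03-25 10 A
2014-04-05 20 A
2014-04-15 10 A
2014-04-25 10 B
2014-05-05 10 B
2014-05-15 10 A
2014-05-25 10 A
"""
)
df = pd.read_csv(data, sep=r"\s+", parse_dates=True, index_col="date")

# Assumption: roll over the numeric column only, so the string-valued
# "category" column is never passed to sum().
rolling_30d = df["spendings"].rolling("30D").sum()
print(rolling_30d)
```

With the default window bounds, each window is open on the left, so a row dated exactly 30 days earlier is not counted. On this data the sums come out as 10, 30, 40, 40, 30, 30, 30.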
Answer by Mike
To address why you cannot use aliases like "AS" or "Y": in this case, the "Y" offset is not "a year". It actually refers to YearEnd (http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases), and therefore the rolling function does not get a fixed window (e.g. you get a 365 day window if your index falls on Jan 1, and 1 day if Dec 31).
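The same reasoning applies to "M", which refers to MonthEnd. The sketch below is an illustration added here, not part of the original answer: it steps back by one MonthEnd from two arbitrarily chosen dates to show that the span covered is not constant, then checks that the offset has no fixed length in nanoseconds. The names lengths and has_fixed_length are hypothetical.

```python
import pandas as pd

month_end = pd.offsets.MonthEnd(1)

# Stepping back by one MonthEnd covers a different number of days
# depending on where the starting date sits within its month.
lengths = []
for ts in [pd.Timestamp("2014-01-31"), pd.Timestamp("2014-03-01")]:
    start = ts - month_end
    lengths.append((ts - start).days)
    print(ts.date(), "->", start.date(), (ts - start).days, "days")

# A non-fixed offset cannot be converted to a number of nanoseconds,
# which is what a time-based rolling window needs.
try:
    month_end.nanos
    has_fixed_length = True
except ValueError as exc:
    has_fixed_length = False
    print("ValueError:", exc)
```

Starting from 2014-01-31 the step back spans 31 days, while from 2014-03-01 it spans a single day, so no single window length can be derived from the offset.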
The proposed solution (a 30D offset) works if you do not need strict calendar months. Alternatively, you can iterate over your date index and slice with an offset to get more precise control over your sum.
If you have to do it in one line (separated for readability):
df['Sum'] = [
    df.loc[
        edt - pd.tseries.offsets.DateOffset(months=1):edt, 'spendings'
    ].sum() for edt in df.index
]
            spendings category  Sum
date
2014-03-25         10        A   10
2014-04-05         20        A   30
2014-04-15         10        A   40
2014-04-25         10        B   50
2014-05-05         10        B   50
2014-05-15         10        A   40
2014-05-25         10        A   40
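Note that label slicing with .loc includes both endpoints, so a row dated exactly one calendar month earlier is counted in the sums above (for example, 2014-03-25 contributes to the 2014-04-25 row). The sketch below, under the assumption that you may want to control that boundary, wraps the same idea in a hypothetical helper, calendar_month_sum, with an include_start flag that is not part of the original answer.

```python
import pandas as pd

dates = pd.to_datetime(
    ["2014-03-25", "2014-04-05", "2014-04-15", "2014-04-25",
     "2014-05-05", "2014-05-15", "2014-05-25"]
)
spendings = pd.Series([10, 20, 10, 10, 10, 10, 10], index=dates, name="spendings")


def calendar_month_sum(s, include_start=True):
    """Hypothetical helper: trailing one-calendar-month sum for each row."""
    totals = []
    for end in s.index:
        start = end - pd.DateOffset(months=1)
        if include_start:
            # Same bounds as the .loc slice: both endpoints are included.
            mask = (s.index >= start) & (s.index <= end)
        else:
            # Exclude a row dated exactly one month before `end`.
            mask = (s.index > start) & (s.index <= end)
        totals.append(s[mask].sum())
    return pd.Series(totals, index=s.index, name="Sum")


closed_sum = calendar_month_sum(spendings, include_start=True)
open_sum = calendar_month_sum(spendings, include_start=False)
print(pd.DataFrame({"closed": closed_sum, "open": open_sum}))
```

With include_start=True the helper reproduces the Sum column shown above; with include_start=False the rows from 2014-04-25 onward drop the entry that falls exactly on the window's start date.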