Original URL: http://stackoverflow.com/questions/38415314/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): Stack Overflow
Pandas Rolling Window - datetime64[ns] are not implemented
Asked by David Crook
I'm attempting to use Python/Pandas to build some charts. I have data that is sampled every second. Here is a sample:
Index, Time, Value
31362, 1975-05-07 07:59:18, 36.151612
31363, 1975-05-07 07:59:19, 36.181368
31364, 1975-05-07 07:59:20, 36.197195
31365, 1975-05-07 07:59:21, 36.151413
31366, 1975-05-07 07:59:22, 36.138009
31367, 1975-05-07 07:59:23, 36.142962
31368, 1975-05-07 07:59:24, 36.122680
I need to create a variety of windows to look at the data: 10, 100, 1000, etc. Unfortunately, when I attempt to window the entire DataFrame, I get the error below:
NotImplementedError: ops for Rolling for this dtype datetime64[ns] are not implemented
I checked out these docs as a reference: http://pandas.pydata.org/pandas-docs/stable/computation.html, and they appear to be doing this on date ranges. I did notice that the data type they use differs from mine.
Is there an easy way to do this?
This is ideally what I'm trying to do:
tmp = data.rolling(window=2)
tmp.mean()
I'm using plotly to plot the raw data and then the windowed data on top of it. My goal is to find ideal windows for identifying cleaner trends in the data by removing some of the noise.
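As a sketch of the goal described above (smoothing at several window sizes and overlaying the results), the following computes one rolling mean per window size on the numeric Value column only, which sidesteps the datetime error entirely. It assumes the values are already in a pandas Series indexed by time. The window sizes (2, 3, 5) are chosen only because the sample has seven rows; with the full per-second data you would pass 10, 100, 1000 instead. The helper name `smooth_at` is hypothetical.

```python
import pandas as pd

# Values from the sample above, indexed by their one-second timestamps
values = pd.Series(
    [36.151612, 36.181368, 36.197195, 36.151413, 36.138009, 36.142962, 36.122680],
    index=pd.date_range("1975-05-07 07:59:18", periods=7, freq="s"),
    name="Value",
)


def smooth_at(series, windows):
    """Hypothetical helper: one rolling-mean column per window size (in rows)."""
    return pd.DataFrame(
        {f"mean_{w}": series.rolling(window=w).mean() for w in windows}
    )


smoothed = smooth_at(values, windows=(2, 3, 5))
print(smoothed)
```

Each column of `smoothed` shares the timestamp index of the raw data, so it could be overlaid as its own line on the raw-data chart. The first `w - 1` rows of each column are NaN because a full window is not yet available.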
Thanks!
Additional Notes:
I think I need to convert my data from this format:
pandas.core.series.Series
to this one:
pandas.tseries.index.DatetimeIndex
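A minimal sketch of that conversion follows, assuming the data is read from CSV text like the sample (here via io.StringIO, a stand-in for the real file). pd.to_datetime turns the Time strings into datetime64[ns] values, and set_index makes them a DatetimeIndex. In current pandas releases the class is exposed as pd.DatetimeIndex; the pandas.tseries.index module path above appears to come from older versions.

```python
from io import StringIO

import pandas as pd

text = """Index,Time,Value
31362,1975-05-07 07:59:18,36.151612
31363,1975-05-07 07:59:19,36.181368
31364,1975-05-07 07:59:20,36.197195"""

df = pd.read_csv(StringIO(text), index_col="Index")
print(df["Time"].dtype)                  # still plain strings (object dtype) here

df["Time"] = pd.to_datetime(df["Time"])  # convert to datetime64[ns] values
df = df.set_index("Time")                # the datetimes replace the old index
print(type(df.index))
```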
Answer by piRSquared
Setup
from io import StringIO  # Python 3; the original Python 2 code used: from StringIO import StringIO
import pandas as pd
text = """Index,Time,Value
31362,1975-05-07 07:59:18,36.151612
31363,1975-05-07 07:59:19,36.181368
31364,1975-05-07 07:59:20,36.197195
31365,1975-05-07 07:59:21,36.151413
31366,1975-05-07 07:59:22,36.138009
31367,1975-05-07 07:59:23,36.142962
31368,1975-05-07 07:59:24,36.122680"""
df = pd.read_csv(StringIO(text), index_col=0, parse_dates=[1])
df.rolling(2).mean()
NotImplementedError: ops for Rolling for this dtype datetime64[ns] are not implemented
First off, this is confirmation of @BrenBarn's comment and he should get the credit if he decides to post an answer. BrenBarn, if you decide to answer, I'll delete this post.
Explanation
Pandas has no idea what a rolling mean of date values ought to be. df.rolling(2).mean() is attempting to roll and average over both the Time and Value columns. The error is politely (or impolitely, depending on your perspective) telling you that you're trying something nonsensical.
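The same reasoning suggests another option, sketched below under the assumption that the timestamps do not need to travel with the result: restrict the rolling computation to the numeric columns with DataFrame.select_dtypes, so pandas never tries to average the dates. Unlike moving Time into the index, this keeps the original integer index and discards the Time column.

```python
from io import StringIO

import pandas as pd

text = """Index,Time,Value
31362,1975-05-07 07:59:18,36.151612
31363,1975-05-07 07:59:19,36.181368
31364,1975-05-07 07:59:20,36.197195
31365,1975-05-07 07:59:21,36.151413"""

df = pd.read_csv(StringIO(text), index_col="Index", parse_dates=["Time"])

# Keep only numeric columns (this drops the datetime64[ns] Time column) before rolling
numeric = df.select_dtypes(include="number")
print(numeric.rolling(2).mean())
```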
Solution
Move the Time column to the index and then... well, that's it.
df.set_index('Time').rolling(2).mean()
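Once Time is the index, the window can also be given as a time offset instead of a row count, which appears to be what the date-range examples in the computation docs do. The sketch below compares both on the sample data; the '2s' offset is an illustrative choice. Offset-based windows require a monotonic DatetimeIndex and default to min_periods=1, so the first row is not NaN, unlike rolling(2). For evenly spaced one-second data, '2s' covers the current and previous row, so the remaining values match the row-based result.

```python
from io import StringIO

import pandas as pd

text = """Index,Time,Value
31362,1975-05-07 07:59:18,36.151612
31363,1975-05-07 07:59:19,36.181368
31364,1975-05-07 07:59:20,36.197195
31365,1975-05-07 07:59:21,36.151413
31366,1975-05-07 07:59:22,36.138009
31367,1975-05-07 07:59:23,36.142962
31368,1975-05-07 07:59:24,36.122680"""

df = pd.read_csv(StringIO(text), index_col="Index", parse_dates=["Time"])
indexed = df.set_index("Time")  # Time becomes a DatetimeIndex; only Value remains

by_rows = indexed.rolling(2).mean()     # window of the last 2 rows
by_time = indexed.rolling("2s").mean()  # window of rows in the last 2 seconds

print(by_rows)
print(by_time)
```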