Original question: http://stackoverflow.com/questions/13333159/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
pandas's resample with fill_method: Need to know data from which row was copied?
Asked by pvncad
I am trying to use the resample method to fill the gaps in time series data, but I also want to know which row was used to fill each missing value.
This is my input series.
In [28]: data
Out[28]:
Date
2002-09-09 233.25
2002-09-11 233.05
2002-09-16 230.25
2002-09-18 230.10
2002-09-19 230.05
Name: Price
With resample, I get this:
In [29]: data.resample("D", fill_method='bfill')
Out[29]:
Date
2002-09-09 233.25
2002-09-10 233.05
2002-09-11 233.05
2002-09-12 230.25
2002-09-13 230.25
2002-09-14 230.25
2002-09-15 230.25
2002-09-16 230.25
2002-09-17 230.10
2002-09-18 230.10
2002-09-19 230.05
Freq: D
What I am looking for is this, with an extra column showing the date of the row each value was copied from:
Out[29]:
Date
2002-09-09 233.25 2002-09-09
2002-09-10 233.05 2002-09-11
2002-09-11 233.05 2002-09-11
2002-09-12 230.25 2002-09-16
2002-09-13 230.25 2002-09-16
2002-09-14 230.25 2002-09-16
2002-09-15 230.25 2002-09-16
2002-09-16 230.25 2002-09-16
2002-09-17 230.10 2002-09-18
2002-09-18 230.10 2002-09-18
2002-09-19 230.05 2002-09-19
Any help?
Accepted answer by Garrett
After converting the Series to a DataFrame, copy the index into its own column. (DatetimeIndex.format() is useful here because it returns a string representation of the index rather than Timestamp/datetime objects.)
In [510]: df = pd.DataFrame(data)
In [511]: df['OrigDate'] = df.index.format()
In [513]: df
Out[513]:
Price OrigDate
Date
2002-09-09 233.25 2002-09-09
2002-09-11 233.05 2002-09-11
2002-09-16 230.25 2002-09-16
2002-09-18 230.10 2002-09-18
2002-09-19 230.05 2002-09-19
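For anyone who wants to reproduce this step from scratch, here is a minimal sketch that rebuilds the input series from the question and adds the OrigDate column. One assumption on my part: it uses DatetimeIndex.strftime() in place of DatetimeIndex.format(), because recent pandas releases appear to flag format() as deprecated. The "%Y-%m-%d" pattern is simply my choice to match the output above.

```python
import pandas as pd

# Rebuild the input series shown in the question.
dates = pd.to_datetime(
    ["2002-09-09", "2002-09-11", "2002-09-16", "2002-09-18", "2002-09-19"]
)
data = pd.Series([233.25, 233.05, 230.25, 230.10, 230.05], index=dates, name="Price")
data.index.name = "Date"

# Convert the Series to a one-column DataFrame.
df = data.to_frame()

# Copy the index into its own column as strings.
# Assumption: strftime() stands in for DatetimeIndex.format() here.
df["OrigDate"] = df.index.strftime("%Y-%m-%d")

print(df)
```

Because OrigDate is an ordinary column, any later fill operation carries it along with Price, which is what makes the source row recoverable.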
For resampling without aggregation, there is a helper method asfreq().
In [528]: df.asfreq("D", method='bfill')
Out[528]:
Price OrigDate
2002-09-09 233.25 2002-09-09
2002-09-10 233.05 2002-09-11
2002-09-11 233.05 2002-09-11
2002-09-12 230.25 2002-09-16
2002-09-13 230.25 2002-09-16
2002-09-14 230.25 2002-09-16
2002-09-15 230.25 2002-09-16
2002-09-16 230.25 2002-09-16
2002-09-17 230.10 2002-09-18
2002-09-18 230.10 2002-09-18
2002-09-19 230.05 2002-09-19
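A self-contained sketch of the asfreq() approach, in case it helps. The WasFilled column is a hypothetical addition of mine, not part of the original answer: it flags rows whose index date differs from OrigDate, i.e. rows that were back-filled from a later row. As above, strftime() is assumed as a stand-in for format().

```python
import pandas as pd

dates = pd.to_datetime(
    ["2002-09-09", "2002-09-11", "2002-09-16", "2002-09-18", "2002-09-19"]
)
df = pd.DataFrame({"Price": [233.25, 233.05, 230.25, 230.10, 230.05]}, index=dates)
df.index.name = "Date"

# Remember which date each original row came from.
df["OrigDate"] = df.index.strftime("%Y-%m-%d")

# Upsample to daily frequency; missing days take the values of the next known row.
filled = df.asfreq("D", method="bfill")

# Hypothetical helper column: True where the row was copied from a later date.
index_as_text = filled.index.strftime("%Y-%m-%d").to_numpy()
filled["WasFilled"] = filled["OrigDate"].to_numpy() != index_as_text

print(filled)
```

With the sample data, the daily range from 2002-09-09 to 2002-09-19 has 11 rows, of which 5 are original and 6 are back-filled.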
This is effectively shorthand for the following, where last() is invoked on the intermediate DataFrameGroupBy objects.
In [529]: df.resample("D", how='last', fill_method='bfill')
Out[529]:
Price OrigDate
Date
2002-09-09 233.25 2002-09-09
2002-09-10 233.05 2002-09-11
2002-09-11 233.05 2002-09-11
2002-09-12 230.25 2002-09-16
2002-09-13 230.25 2002-09-16
2002-09-14 230.25 2002-09-16
2002-09-15 230.25 2002-09-16
2002-09-16 230.25 2002-09-16
2002-09-17 230.10 2002-09-18
2002-09-18 230.10 2002-09-18
2002-09-19 230.05 2002-09-19
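A note for newer pandas: the how= and fill_method= keywords of resample() were deprecated in pandas 0.18 and later removed, so the call above will fail on a current install. The sketch below shows what I believe is the equivalent method-chaining form; the data setup and the use of strftime() are my assumptions, as in the earlier steps.

```python
import pandas as pd

dates = pd.to_datetime(
    ["2002-09-09", "2002-09-11", "2002-09-16", "2002-09-18", "2002-09-19"]
)
df = pd.DataFrame({"Price": [233.25, 233.05, 230.25, 230.10, 230.05]}, index=dates)
df.index.name = "Date"
df["OrigDate"] = df.index.strftime("%Y-%m-%d")

# resample("D") groups rows into daily bins, last() takes the last row of each
# bin (missing for empty days), and bfill() fills those gaps from the next row.
resampled = df.resample("D").last().bfill()

print(resampled)
```

Since each daily bin holds at most one original row here, last() performs no real aggregation, which is why asfreq() gives the same result more directly.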

