Original question: http://stackoverflow.com/questions/27607974/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Python pandas dataframe - any way to set frequency programmatically?
Asked by birone
I'm trying to process CSV files like this:
df = pd.read_csv("raw_hl.csv", index_col='time', parse_dates=True)
df.head(2)
                         high       low
time
2014-01-01 17:00:00  1.376235  1.375945
2014-01-01 17:01:00  1.376005  1.375775
2014-01-01 17:02:00  1.375795  1.375445
2014-01-01 17:07:00       NaN       NaN
...
2014-01-01 17:49:00  1.375645  1.375445
type(df.index)
pandas.tseries.index.DatetimeIndex
But these don't automatically have a frequency:
print(df.index.freq)
None
In case they have differing frequencies, it would be handy to be able to set one automatically. The simplest way would be to compare the first two rows:
tdelta = df.index[1] - df.index[0]
tdelta
datetime.timedelta(0, 60)
So far so good, but setting frequency directly to this timedelta fails:
df.index.freq = tdelta
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-25-3f24abacf9de> in <module>()
----> 1 df.index.freq = tdelta
AttributeError: can't set attribute
Is there a way (ideally relatively painless!) to do this?
ANSWER: pandas gives the DataFrame's index an inferred_freq attribute, perhaps to avoid overwriting a user-defined frequency. For this data, df.index.inferred_freq is 'T' (one minute). Note that inferred_freq is read-only: it reports a frequency rather than setting one.
So it just seems to be a matter of using this instead of df.index.freq. Thanks to Jeff, who also provides more details below :)
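A minimal sketch of how the inferred frequency could then be attached to the index. The small frame below is hypothetical data standing in for raw_hl.csv, and using asfreq to attach the frequency is one possible approach, not something the original poster confirmed. It also shows the "compare the first two rows" idea from the question, converted to an offset with to_offset.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Hypothetical one-minute data standing in for raw_hl.csv
idx = pd.DatetimeIndex([
    "2014-01-01 17:00:00",
    "2014-01-01 17:01:00",
    "2014-01-01 17:02:00",
    "2014-01-01 17:03:00",
])
df = pd.DataFrame({"high": [1.376235, 1.376005, 1.375795, 1.375645]}, index=idx)

# An index built from raw timestamps carries no freq
assert df.index.freq is None

# Option 1: let pandas infer the frequency (needs at least 3 timestamps).
# The alias string is 'T' or 'min' depending on the pandas version.
inferred = pd.infer_freq(df.index)

# Option 2: turn the gap between the first two rows into an offset
offset = to_offset(df.index[1] - df.index[0])

# asfreq returns a new frame whose index has the frequency attached
df_regular = df.asfreq(offset)
print(inferred, df_regular.index.freq)
```

Option 2 trusts the first two rows blindly, so it only makes sense when the data is known to be regular. asfreq also inserts NaN rows for any timestamps missing from the original index.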
Answered by Jeff
If you have a regular frequency, it will be reported when you look at df.index.freq
In [20]: df = pd.DataFrame({'A': np.arange(5)}, index=pd.date_range('20130101 09:00:00', freq='3T', periods=5))
In [21]: df
Out[21]:
A
2013-01-01 09:00:00 0
2013-01-01 09:03:00 1
2013-01-01 09:06:00 2
2013-01-01 09:09:00 3
2013-01-01 09:12:00 4
In [22]: df.index.freq
Out[22]: <3 * Minutes>
An irregular frequency will return None
In [23]: df.index = df.index[0:2].tolist() + [pd.Timestamp('20130101 09:05:00')] + df.index[-2:].tolist()
In [24]: df
Out[24]:
A
2013-01-01 09:00:00 0
2013-01-01 09:03:00 1
2013-01-01 09:05:00 2
2013-01-01 09:09:00 3
2013-01-01 09:12:00 4
In [25]: df.index.freq
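To see why nothing is reported for such an index, one option is to look at the gaps between consecutive timestamps. This is only an illustrative sketch: the index is rebuilt by hand from the session above, and the variable names are made up for the example.

```python
import pandas as pd

# The irregular index from the session above, rebuilt by hand
idx = pd.DatetimeIndex([
    "2013-01-01 09:00:00",
    "2013-01-01 09:03:00",
    "2013-01-01 09:05:00",
    "2013-01-01 09:09:00",
    "2013-01-01 09:12:00",
])

# No single step fits every gap, so nothing can be inferred
inferred = pd.infer_freq(idx)

# Gaps between consecutive timestamps, and how often each one occurs
gaps = idx.to_series().diff().dropna()
gap_counts = gaps.value_counts()
most_common_gap = gap_counts.idxmax()
print(inferred, most_common_gap)
```

The most common gap is a reasonable guess for the intended frequency, but it remains a guess.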
You can recover a regular frequency as follows: resample to a finer frequency (one where no two values fall into the same bin), forward fill, then reindex to the desired frequency and end points.
In [31]: df.resample('T').ffill().reindex(pd.date_range(df.index[0],df.index[-1],freq='3T'))
Out[31]:
A
2013-01-01 09:00:00 0
2013-01-01 09:03:00 1
2013-01-01 09:06:00 2
2013-01-01 09:09:00 3
2013-01-01 09:12:00 4
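The same recipe as a self-contained script, split into its two steps. This is a sketch under the assumption of a recent pandas version; it uses the 'min' alias in place of 'T', and the intermediate names are chosen for illustration.

```python
import numpy as np
import pandas as pd

# Irregular index: the third stamp is 09:05 instead of 09:06
irregular = pd.DatetimeIndex([
    "2013-01-01 09:00:00",
    "2013-01-01 09:03:00",
    "2013-01-01 09:05:00",
    "2013-01-01 09:09:00",
    "2013-01-01 09:12:00",
])
df = pd.DataFrame({"A": np.arange(5)}, index=irregular)

# Step 1: resample to one-minute bins and forward fill the empty ones
per_minute = df.resample("min").ffill()

# Step 2: keep every third minute between the original end points
target = pd.date_range(df.index[0], df.index[-1], freq="3min")
fixed = per_minute.reindex(target)

print(fixed)
print(fixed.index.freq)
```

Note that the value originally observed at 09:05 ends up labelled 09:06, so the timestamps are regularized at the cost of shifting that observation by a minute.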

