Notice: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original Stack Overflow post: http://stackoverflow.com/questions/27607974/

Time: 2020-09-13 22:46:58  Source: igfitidea

Python pandas dataframe - any way to set frequency programmatically?

Tags: python, pandas

Asked by birone

I'm trying to process CSV files like this:

df = pd.read_csv("raw_hl.csv", index_col='time', parse_dates=True)
df.head(2)
                    high        low 
time                
2014-01-01 17:00:00 1.376235    1.375945
2014-01-01 17:01:00 1.376005    1.375775
2014-01-01 17:02:00 1.375795    1.375445
2014-01-01 17:07:00 NaN         NaN 
...
2014-01-01 17:49:00 1.375645    1.375445

type(df.index)
pandas.tseries.index.DatetimeIndex

But these don't automatically have a frequency:

print(df.index.freq)
None
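
A minimal sketch of the situation described above. It assumes a small in-memory CSV as a stand-in for raw_hl.csv (the three rows are copied from the sample output; the `csv_text` name is hypothetical) and contrasts the parsed index with one built by `pd.date_range`.

```python
import io

import pandas as pd

# Hypothetical stand-in for raw_hl.csv: three one-minute rows.
csv_text = """time,high,low
2014-01-01 17:00:00,1.376235,1.375945
2014-01-01 17:01:00,1.376005,1.375775
2014-01-01 17:02:00,1.375795,1.375445
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="time", parse_dates=True)

# Parsing timestamps from text does not attach a frequency to the index.
parsed_freq = df.index.freq

# An index generated by date_range keeps the frequency it was built with.
generated = pd.date_range("2014-01-01 17:00:00", periods=3, freq="min")
generated_freq = generated.freq

print(parsed_freq)
print(generated_freq)
```

The values in the two indexes are identical; only the freq metadata differs.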

In case they have differing frequencies, it would be handy to be able to set one automatically. The simplest way would be to compare the first two rows:

tdelta = df.index[1] - df.index[0]
tdelta
datetime.timedelta(0, 60) 
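
The first-two-rows comparison can be sketched as below. This assumes a reasonably recent pandas, where the difference of two index entries is a `pandas.Timedelta`, and uses `pandas.tseries.frequencies.to_offset` to turn that timedelta into an offset object. The index values are illustrative, and the approach only inspects the first two rows, so it says nothing about gaps later in the data.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Illustrative index with one-minute spacing and no freq attached.
index = pd.DatetimeIndex(
    ["2014-01-01 17:00:00", "2014-01-01 17:01:00", "2014-01-01 17:02:00"]
)

# Spacing between the first two rows.
tdelta = index[1] - index[0]
seconds = tdelta.total_seconds()

# Convert the timedelta into a pandas offset (a one-minute tick here).
offset = to_offset(tdelta)

print(seconds)
print(offset)
```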

So far so good, but setting frequency directly to this timedelta fails:

df.index.freq = tdelta
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-25-3f24abacf9de> in <module>()
----> 1 df.index.freq = tdelta

AttributeError: can't set attribute

Is there a way (ideally relatively painless!) to do this?

ANSWER: pandas gives the DataFrame's index an inferred_freq attribute, separate from freq, perhaps to avoid overwriting a user-defined frequency. For this data, df.index.inferred_freq returns 'T'.

So it just seems to be a matter of using this instead of df.index.freq. Thanks to Jeff, who also provides more details below :)
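
A hedged sketch of how inferred_freq might be used to end up with an index that actually carries a frequency. It assumes hypothetical one-minute data and uses `DataFrame.asfreq`, which returns a new frame reindexed onto a regular grid. This is one possible approach, not necessarily what the asker did.

```python
import pandas as pd

# Hypothetical data: regular one-minute stamps rebuilt from a plain list,
# so the index has no freq (similar to an index parsed from CSV).
stamps = pd.date_range("2014-01-01 17:00:00", periods=5, freq="min")
df = pd.DataFrame({"high": range(5)}, index=pd.DatetimeIndex(list(stamps)))

# inferred_freq is read-only; it is None when the spacing is irregular.
inferred = df.index.inferred_freq

if inferred is not None:
    # asfreq reindexes onto a date_range built with that frequency,
    # so the resulting index has freq set.
    regular = df.asfreq(inferred)
else:
    regular = df

print(df.index.freq)
print(regular.index.freq)
```

The exact alias string returned by inferred_freq (for example 'T' versus 'min') depends on the pandas version, so the sketch passes it straight through instead of comparing it to a literal.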

Answered by Jeff

If you have a regular frequency, it will be reported when you look at df.index.freq

In [20]: df = pd.DataFrame({'A' : np.arange(5)},index=pd.date_range('20130101 09:00:00',freq='3T',periods=5))

In [21]: df
Out[21]: 
                     A
2013-01-01 09:00:00  0
2013-01-01 09:03:00  1
2013-01-01 09:06:00  2
2013-01-01 09:09:00  3
2013-01-01 09:12:00  4

In [22]: df.index.freq
Out[22]: <3 * Minutes>

An irregular frequency will return None

In [23]: df.index = df.index[0:2].tolist() + [pd.Timestamp('20130101 09:05:00')] + df.index[-2:].tolist()

In [24]: df
Out[24]: 
                     A
2013-01-01 09:00:00  0
2013-01-01 09:03:00  1
2013-01-01 09:05:00  2
2013-01-01 09:09:00  3
2013-01-01 09:12:00  4

In [25]: df.index.freq
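
The same contrast as a standalone sketch, using the timestamps from the session above. It also checks inferred_freq, which, as far as I can tell, likewise comes back as None once the spacing is irregular.

```python
import pandas as pd

# Regular 3-minute index, as in the session above.
regular = pd.date_range("2013-01-01 09:00:00", periods=5, freq="3min")

# Replace the middle stamp (09:06) with 09:05 to break the regular spacing.
irregular = pd.DatetimeIndex(
    list(regular[:2]) + [pd.Timestamp("2013-01-01 09:05:00")] + list(regular[-2:])
)

regular_freq = regular.freq
irregular_freq = irregular.freq
irregular_inferred = irregular.inferred_freq

print(regular_freq)
print(irregular_freq)
print(irregular_inferred)
```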

You can recover a regular frequency as follows: resample to a finer frequency (one where no values overlap), forward fill, then reindex to the desired frequency and end-points.

In [31]: df.resample('T').ffill().reindex(pd.date_range(df.index[0],df.index[-1],freq='3T'))
Out[31]: 
                     A
2013-01-01 09:00:00  0
2013-01-01 09:03:00  1
2013-01-01 09:06:00  2
2013-01-01 09:09:00  3
2013-01-01 09:12:00  4
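
The recipe above can be wrapped in a small helper. This is a sketch only: the function name `regularize` and its parameters are hypothetical, and it assumes the fine frequency is small enough that no two original rows land in the same bin.

```python
import numpy as np
import pandas as pd


def regularize(frame, fine, target):
    """Resample to `fine`, forward fill, then reindex onto a `target` grid."""
    # Grid spanning the original end-points at the desired frequency.
    grid = pd.date_range(frame.index[0], frame.index[-1], freq=target)
    return frame.resample(fine).ffill().reindex(grid)


# Irregular index from the session above (09:05 instead of 09:06).
stamps = pd.to_datetime(
    [
        "2013-01-01 09:00:00",
        "2013-01-01 09:03:00",
        "2013-01-01 09:05:00",
        "2013-01-01 09:09:00",
        "2013-01-01 09:12:00",
    ]
)
df = pd.DataFrame({"A": np.arange(5)}, index=stamps)

result = regularize(df, "min", "3min")

print(result)
print(result.index.freq)
```

Forward filling means the 09:06 slot takes the value recorded at 09:05, so the result is an approximation of the original series on a regular grid, not a lossless copy.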