Pandas Resampling error: Only valid with DatetimeIndex or PeriodIndex
Source: http://stackoverflow.com/questions/30857680/
This question and its answers come from Stack Overflow and are licensed under CC BY-SA 4.0. You are free to use and share them, but you must attribute them to the original authors.
Asked by Nyxynyx
When using pandas' resample function on a DataFrame to convert tick data to OHLCV, a resampling error is raised.
How can this error be fixed?
import pandas as pd

data = pd.read_csv('tickdata.csv', header=None, names=['Timestamp','Price','Volume']).set_index('Timestamp')
data.head()
# Resample data into 30min bins
# (.ix and the how= argument were removed from pandas; .loc and .ohlc()/.sum() replace them)
ticks = data.loc[:, ['Price', 'Volume']]
bars = ticks.Price.resample('30min').ohlc()
volumes = ticks.Volume.resample('30min').sum()
This gives the error:
TypeError: Only valid with DatetimeIndex or PeriodIndex
Accepted answer by unutbu
Convert the integer timestamps in the index to a DatetimeIndex:
data.index = pd.to_datetime(data.index, unit='s')
This interprets the integers as seconds since the Epoch.
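As a quick check of what unit='s' does, the following sketch converts one of the integer timestamps from this answer and contrasts it with unit='ms'. The variable names (raw, as_seconds, as_millis) are illustrative only and do not come from the original post.

```python
import pandas as pd

# One of the integer timestamps used in this answer's example.
raw = pd.Index([1313331280])

# unit='s' treats the integers as seconds since 1970-01-01 00:00:00 UTC.
as_seconds = pd.to_datetime(raw, unit='s')

# unit='ms' would treat the same integers as milliseconds instead,
# which places the value in January 1970.
as_millis = pd.to_datetime(raw, unit='ms')

print(type(as_seconds).__name__)  # DatetimeIndex
print(as_seconds[0])              # 2011-08-14 14:14:40
print(as_millis[0])
```

If the converted dates land in 1970, the source data is probably not in seconds and a different unit is needed.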
For example, given
import pandas as pd

data = pd.DataFrame(
    {'Timestamp': [1313331280, 1313334917, 1313334917, 1313340309, 1313340309],
     'Price': [10.4]*3 + [10.5]*2,
     'Volume': [0.779, 0.101, 0.316, 0.150, 1.8]})
data = data.set_index(['Timestamp'])
#             Price  Volume
# Timestamp
# 1313331280   10.4   0.779
# 1313334917   10.4   0.101
# 1313334917   10.4   0.316
# 1313340309   10.5   0.150
# 1313340309   10.5   1.800
data.index = pd.to_datetime(data.index, unit='s')
yields
                     Price  Volume
2011-08-14 14:14:40   10.4   0.779
2011-08-14 15:15:17   10.4   0.101
2011-08-14 15:15:17   10.4   0.316
2011-08-14 16:45:09   10.5   0.150
2011-08-14 16:45:09   10.5   1.800
Then
ticks = data.loc[:, ['Price', 'Volume']]  # .ix was removed from pandas; .loc is the label-based equivalent
bars = ticks.Price.resample('30min').ohlc()
volumes = ticks.Volume.resample('30min').sum()
can be computed:
In [368]: bars
Out[368]:
                     open  high   low  close
2011-08-14 14:00:00  10.4  10.4  10.4   10.4
2011-08-14 14:30:00   NaN   NaN   NaN    NaN
2011-08-14 15:00:00  10.4  10.4  10.4   10.4
2011-08-14 15:30:00   NaN   NaN   NaN    NaN
2011-08-14 16:00:00   NaN   NaN   NaN    NaN
2011-08-14 16:30:00  10.5  10.5  10.5   10.5

In [369]: volumes
Out[369]:
2011-08-14 14:00:00    0.779
2011-08-14 14:30:00      NaN
2011-08-14 15:00:00    0.417
2011-08-14 15:30:00      NaN
2011-08-14 16:00:00      NaN
2011-08-14 16:30:00    1.950
Freq: 30T, Name: Volume, dtype: float64
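For reference, the whole sequence can be sketched as a single script that also joins the bars and volumes into one OHLCV frame. This is a minimal sketch, not part of the original answer: the ohlcv name and the single-frame layout are assumptions, and min_count=1 is passed to sum() because recent pandas versions return 0.0 rather than NaN for empty bins when it is omitted.

```python
import pandas as pd

data = pd.DataFrame(
    {'Timestamp': [1313331280, 1313334917, 1313334917, 1313340309, 1313340309],
     'Price': [10.4]*3 + [10.5]*2,
     'Volume': [0.779, 0.101, 0.316, 0.150, 1.8]}).set_index('Timestamp')

# Convert the integer index to a DatetimeIndex so resample() accepts it.
data.index = pd.to_datetime(data.index, unit='s')

resampler = data.resample('30min')
bars = resampler['Price'].ohlc()

# min_count=1 keeps empty 30-minute bins as NaN instead of 0.0.
volumes = resampler['Volume'].sum(min_count=1)

# Combine into one frame; the 'volume' column name is an assumption.
ohlcv = bars.assign(volume=volumes)
print(ohlcv)
```

Dropping min_count=1 may be preferable when an empty bin should count as zero traded volume.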