Original source: http://stackoverflow.com/questions/36222928/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
Pandas OHLC aggregation on OHLC data
Asked by user3439187
I understand that OHLC re-sampling of time series data in Pandas, using one column of data, will work perfectly, for example on the following dataframe:
>>> df
     ctime  openbid
1443654000  1.11700
1443654060  1.11700
...
df['ctime'] = pd.to_datetime(df['ctime'], unit='s')
df = df.set_index('ctime')
df.resample('1H', how='ohlc', axis=0, fill_method='bfill')
>>>
                        open     high      low    close
ctime
2015-09-30 23:00:00  1.11700  1.11700  1.11687  1.11697
2015-10-01 00:00:00  1.11700  1.11712  1.11697  1.11697
...
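For readers on a current pandas release: the `how=` and `fill_method=` keywords shown above are no longer accepted by `resample`. A minimal sketch of the same single-column resampling with the method-chaining API follows. The third data row is a hypothetical addition so that a second hourly bin exists, and `"60min"` is used as the hourly frequency only because the spelling of the hourly alias differs between pandas versions.

```python
import pandas as pd

# Two sample rows from the question, plus one hypothetical row an hour later
df = pd.DataFrame({
    "ctime": [1443654000, 1443654060, 1443657600],
    "openbid": [1.11700, 1.11700, 1.11701],
})
df["ctime"] = pd.to_datetime(df["ctime"], unit="s")
df = df.set_index("ctime")

# ohlc() replaces how='ohlc'; a chained bfill() replaces fill_method='bfill'
hourly = df["openbid"].resample("60min").ohlc().bfill()
print(hourly)
```

Selecting the single column first (`df["openbid"]`) gives flat `open`, `high`, `low`, `close` columns rather than a two-level column header.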
But what do I do if the data is already in an OHLC format? From what I can gather the OHLC method of the API calculates an OHLC slice for every column, hence if my data is in the format:
ctime openbid highbid lowbid closebid
0 1443654000 1.11700 1.11700 1.11687 1.11697
1 1443654060 1.11700 1.11712 1.11697 1.11697
2 1443654120 1.11701 1.11708 1.11699 1.11708
When I try to re-sample I get an OHLC for each of the columns, like so:
                     openbid                             highbid           \
                        open     high      low    close     open     high
ctime
2015-09-30 23:00:00  1.11700  1.11700  1.11700  1.11700  1.11700  1.11712
2015-09-30 23:01:00  1.11701  1.11701  1.11701  1.11701  1.11708  1.11708
...
                                        lowbid                             \
                         low    close     open     high      low    close
ctime
2015-09-30 23:00:00  1.11700  1.11712  1.11687  1.11697  1.11687  1.11697
2015-09-30 23:01:00  1.11708  1.11708  1.11699  1.11699  1.11699  1.11699
...
                    closebid
                        open     high      low    close
ctime
2015-09-30 23:00:00  1.11697  1.11697  1.11697  1.11697
2015-09-30 23:01:00  1.11708  1.11708  1.11708  1.11708
Is there a quick(ish) workaround for this that someone is willing to share, please, without me having to get knee-deep in the pandas manual?
Thanks.
P.S. There is this answer, "Converting OHLC stock data into a different timeframe with python and pandas", but it was written 4 years ago, so I am hoping there has been some progress.
Answered by chrisb
This is similar to the answer you linked, but it is a little cleaner and faster, because it uses the optimized aggregations rather than lambdas.
Note that the resample(...).agg(...) syntax requires pandas version 0.18.0 or later.
In [101]: df.resample('1H').agg({'openbid': 'first',
                                 'highbid': 'max',
                                 'lowbid': 'min',
                                 'closebid': 'last'})
Out[101]:
                      lowbid  highbid  closebid  openbid
ctime
2015-09-30 23:00:00  1.11687  1.11712   1.11708    1.117
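To make the mapping concrete, here is a self-contained sketch that rebuilds the three sample rows from the question and applies the same aggregation dictionary. The variable name `hourly` is only illustrative, and `"60min"` stands in for the hourly frequency because the hourly alias is spelled differently across pandas versions.

```python
import pandas as pd

# The three sample rows from the question
df = pd.DataFrame({
    "ctime": [1443654000, 1443654060, 1443654120],
    "openbid": [1.11700, 1.11700, 1.11701],
    "highbid": [1.11700, 1.11712, 1.11708],
    "lowbid": [1.11687, 1.11697, 1.11699],
    "closebid": [1.11697, 1.11697, 1.11708],
})
df["ctime"] = pd.to_datetime(df["ctime"], unit="s")
df = df.set_index("ctime")

# One rule per column: the bar's open is the first open in the bin,
# its high is the largest high, its low the smallest low,
# and its close is the last close.
hourly = df.resample("60min").agg({
    "openbid": "first",
    "highbid": "max",
    "lowbid": "min",
    "closebid": "last",
})
print(hourly)
```

All three rows fall into the 23:00 bin, so the result is a single bar whose values match the output shown above.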
Answered by Benjamin Crouzier
You need to use an OrderedDict to keep the column order in newer versions of pandas, like so:
import pandas as pd
from collections import OrderedDict

df['ctime'] = pd.to_datetime(df['ctime'], unit='s')
df = df.set_index('ctime')
df = df.resample('5Min').agg(
    OrderedDict([
        ('open', 'first'),
        ('high', 'max'),
        ('low', 'min'),
        ('close', 'last'),
        ('volume', 'sum'),
    ])
)
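One thing this kind of aggregation does not handle by itself is bins that contain no rows (a gap in the feed, a weekend). The sketch below uses hypothetical one-minute bars with a gap to show what comes out: the price columns are NaN for the empty bin while the summed volume is 0. Selecting the columns explicitly afterwards is one way to pin the column order without relying on how the dictionary is ordered; the data and the name `rules` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical 1-minute bars with a gap between 23:01 and 23:11
bars = pd.DataFrame(
    {
        "open": [1.0, 1.2, 1.5],
        "high": [1.3, 1.4, 1.6],
        "low": [0.9, 1.1, 1.4],
        "close": [1.2, 1.3, 1.55],
        "volume": [10, 20, 5],
    },
    index=pd.to_datetime([
        "2015-09-30 23:00",
        "2015-09-30 23:01",
        "2015-09-30 23:11",
    ]),
)

rules = {"open": "first", "high": "max", "low": "min",
         "close": "last", "volume": "sum"}

# Selecting the columns by name fixes their order explicitly
resampled = bars.resample("5min").agg(rules)[list(rules)]

# The 23:05 bin is empty: OHLC values are NaN and volume is 0.
# Dropping rows without an open removes such bins.
traded = resampled.dropna(subset=["open"])
print(resampled)
print(traded)
```

Whether to drop empty bins or forward-fill them depends on how the bars are used downstream.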
Answered by Ben
Given a dataframe with price and amount columns:
import numpy as np
import pandas as pd

def agg_ohlcv(x):
    # x holds the rows that fall into one resampling bin
    arr = x['price'].values
    names = {
        'low': min(arr) if len(arr) > 0 else np.nan,
        'high': max(arr) if len(arr) > 0 else np.nan,
        'open': arr[0] if len(arr) > 0 else np.nan,
        'close': arr[-1] if len(arr) > 0 else np.nan,
        'volume': sum(x['amount'].values) if len(x['amount'].values) > 0 else 0,
    }
    return pd.Series(names)

df = df.resample('1min').apply(agg_ohlcv)
df = df.ffill()  # fill empty bins with the values of the previous bar
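As a possible alternative to a Python-level `apply`, the same kind of bars can be sketched with the built-in `ohlc()` and `sum()` resampler methods. This is not the answer author's method, just a variant under the same assumption of `price` and `amount` columns; the tick values and the names `ticks` and `bars` are made up for illustration.

```python
import pandas as pd

# Hypothetical tick data: three trades with a price and an amount
ticks = pd.DataFrame(
    {"price": [1.0, 3.0, 2.0], "amount": [1, 2, 3]},
    index=pd.to_datetime([
        "2015-09-30 23:00:00",
        "2015-09-30 23:00:10",
        "2015-09-30 23:01:10",
    ]),
)

# ohlc() on the single price column yields open/high/low/close per bin
bars = ticks["price"].resample("1min").ohlc()

# Volume is the per-bin sum of amounts; empty bins sum to 0
bars["volume"] = ticks["amount"].resample("1min").sum()
print(bars)
```

Both resample calls use the same index and frequency, so the volume column lines up with the OHLC rows.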
Answered by Sivakumar D
This one seems to work:
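def ohlcVolume(x):
    if len(x):
        # .iloc selects by position; x["open"][0] relies on a positional fallback that recent pandas deprecates
        ohlc = {"open": x["open"].iloc[0], "high": max(x["high"]), "low": min(x["low"]),
                "close": x["close"].iloc[-1], "volume": sum(x["volume"])}
        return pd.Series(ohlc)

daily = df.resample('1D').apply(ohlcVolume)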
Answered by Datalker
Conversion from OHLC to OHLC worked for me like this:
df.resample('1H').agg({
    'openbid': 'first',
    'highbid': 'max',
    'lowbid': 'min',
    'closebid': 'last'
})