Pandas OHLC aggregation on OHLC data (Python)

Question

Asked by user3439187

I understand that OHLC re-sampling of time series data in Pandas, using one column of data, will work perfectly, for example on the following dataframe:

>>df
ctime       openbid
1443654000  1.11700
1443654060  1.11700
...

df['ctime']  = pd.to_datetime(df['ctime'], unit='s')
df           = df.set_index('ctime')
df.resample('1H',  how='ohlc', axis=0, fill_method='bfill')


>>>
                     open     high     low       close
ctime                                                   
2015-09-30 23:00:00  1.11700  1.11700  1.11687   1.11697
2015-10-01 00:00:00  1.11700  1.11712  1.11697   1.11697
...

But what do I do if the data is already in an OHLC format? From what I can gather the OHLC method of the API calculates an OHLC slice for every column, hence if my data is in the format:

             ctime  openbid  highbid   lowbid  closebid
0       1443654000  1.11700  1.11700  1.11687   1.11697
1       1443654060  1.11700  1.11712  1.11697   1.11697
2       1443654120  1.11701  1.11708  1.11699   1.11708

When I try to re-sample I get an OHLC for each of the columns, like so:

                     openbid                             highbid           \
                        open     high      low    close     open     high   
ctime                                                                       
2015-09-30 23:00:00  1.11700  1.11700  1.11700  1.11700  1.11700  1.11712   
2015-09-30 23:01:00  1.11701  1.11701  1.11701  1.11701  1.11708  1.11708 
...
                                        lowbid                             \
                         low    close     open     high      low    close   
ctime                                                                       
2015-09-30 23:00:00  1.11700  1.11712  1.11687  1.11697  1.11687  1.11697   
2015-09-30 23:01:00  1.11708  1.11708  1.11699  1.11699  1.11699  1.11699  
...

                    closebid                             
                        open     high      low    close  
ctime                                                    
2015-09-30 23:00:00  1.11697  1.11697  1.11697  1.11697  
2015-09-30 23:01:00  1.11708  1.11708  1.11708  1.11708

Is there a quick(ish) workaround for this that someone is willing to share, without me having to get knee-deep in the pandas manual?

Thanks.

P.S. There is this answer, "Converting OHLC stock data into a different timeframe with python and pandas", but it was written 4 years ago, so I am hoping there has been some progress.

Answer 1

Answered by chrisb

This is similar to the answer you linked, but it is a little cleaner and faster, because it uses the optimized built-in aggregations rather than lambdas.

Note that the resample(...).agg(...) syntax requires pandas version 0.18.0 or later.

In [101]: df.resample('1H').agg({'openbid': 'first', 
                                 'highbid': 'max', 
                                 'lowbid': 'min', 
                                 'closebid': 'last'})
Out[101]: 
                      lowbid  highbid  closebid  openbid
ctime                                                   
2015-09-30 23:00:00  1.11687  1.11712   1.11708    1.117
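
For anyone who wants to try this end to end, here is a minimal self-contained sketch using the three sample rows from the question. The variable name `hourly` and the final `dropna` step are my own additions for illustration, and the lowercase `"1h"` alias is used because recent pandas releases warn about the uppercase `"1H"`.

```python
import pandas as pd

# Sample rows copied from the question; ctime is a Unix timestamp in seconds.
df = pd.DataFrame({
    "ctime": [1443654000, 1443654060, 1443654120],
    "openbid": [1.11700, 1.11700, 1.11701],
    "highbid": [1.11700, 1.11712, 1.11708],
    "lowbid": [1.11687, 1.11697, 1.11699],
    "closebid": [1.11697, 1.11697, 1.11708],
})
df["ctime"] = pd.to_datetime(df["ctime"], unit="s")
df = df.set_index("ctime")

# One aggregation per column: first open, highest high, lowest low, last close.
hourly = df.resample("1h").agg({
    "openbid": "first",
    "highbid": "max",
    "lowbid": "min",
    "closebid": "last",
})

# Optional (an assumption about what you want): drop hours that had no rows at all.
hourly = hourly.dropna(how="all")
print(hourly)
```

All three sample rows fall into the same hour, so the result is a single bar, consistent with the output shown above.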

Answer 2

Answered by Benjamin Crouzier

You need to use an OrderedDict to keep the column order in newer versions of pandas, like so (this example assumes the columns are named open, high, low, close and volume):

import pandas as pd
from collections import OrderedDict

df['ctime'] = pd.to_datetime(df['ctime'], unit='s')
df = df.set_index('ctime')
df = df.resample('5Min').agg(
    OrderedDict([
        ('open', 'first'),
        ('high', 'max'),
        ('low', 'min'),
        ('close', 'last'),
        ('volume', 'sum'),
    ])
)
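
If you would rather not depend on the ordering of the mapping at all, one option is to select the columns explicitly after aggregating. The sketch below is only an illustration: the sample data, the `rules` and `columns` names, and the 5-minute bucket size are assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical one-minute bars, ten rows starting at 23:00.
idx = pd.date_range("2015-09-30 23:00", periods=10, freq="1min")
opens = list(range(10, 20))
df = pd.DataFrame({
    "open": opens,
    "high": [o + 2 for o in opens],
    "low": [o - 1 for o in opens],
    "close": [o + 1 for o in opens],
    "volume": [100] * 10,
}, index=idx)

rules = {"open": "first", "high": "max", "low": "min",
         "close": "last", "volume": "sum"}
columns = ["open", "high", "low", "close", "volume"]

# Aggregate into 5-minute bars, then fix the column order explicitly.
bars = df.resample("5min").agg(rules)[columns]
print(bars)
```

The trailing `[columns]` makes the output order independent of how the aggregation mapping happens to be ordered.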

Answer 3

Answered by Ben

Given a dataframe of ticks with price and amount columns, you can build OHLCV bars with a custom aggregation function:

import numpy as np
import pandas as pd

def agg_ohlcv(x):
    # x holds the rows of one resample bucket; it may be empty
    arr = x['price'].values
    names = {
        'low': min(arr) if len(arr) > 0 else np.nan,
        'high': max(arr) if len(arr) > 0 else np.nan,
        'open': arr[0] if len(arr) > 0 else np.nan,
        'close': arr[-1] if len(arr) > 0 else np.nan,
        'volume': sum(x['amount'].values) if len(x['amount'].values) > 0 else 0,
    }
    return pd.Series(names)

df = df.resample('1min').apply(agg_ohlcv)
# Carry the previous bar forward into minutes that had no ticks
df = df.ffill()
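
As a possible alternative to the custom function, the same kind of result can be sketched with the built-in `ohlc()` and `sum()` resample methods. The tick data and the names `ticks`, `bars` and `price_cols` below are made up for illustration, and the forward fill only mirrors the `ffill()` step above.

```python
import pandas as pd

# Hypothetical ticks: two active minutes with an empty minute in between.
idx = pd.to_datetime([
    "2015-09-30 23:00:05", "2015-09-30 23:00:20", "2015-09-30 23:00:50",
    "2015-09-30 23:02:10", "2015-09-30 23:02:40",
])
ticks = pd.DataFrame({
    "price": [10.0, 12.0, 11.0, 9.0, 9.5],
    "amount": [1, 2, 3, 4, 5],
}, index=idx)

# Built-in OHLC on the price column, plain sum on the amount column.
ohlc = ticks["price"].resample("1min").ohlc()
volume = ticks["amount"].resample("1min").sum().rename("volume")
bars = pd.concat([ohlc, volume], axis=1)

# Empty minutes have NaN prices and zero volume; carry the prices forward.
price_cols = ["open", "high", "low", "close"]
bars[price_cols] = bars[price_cols].ffill()
print(bars)
```

Here the empty 23:01 bar keeps a volume of 0 and takes its prices from the 23:00 bar.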

Answer 4

Answered by Sivakumar D

This one seems to work:

def ohlcVolume(x):
    # .iloc picks the first/last row by position; empty buckets return None
    if len(x):
        ohlc = {
            "open": x["open"].iloc[0],
            "high": x["high"].max(),
            "low": x["low"].min(),
            "close": x["close"].iloc[-1],
            "volume": x["volume"].sum(),
        }
        return pd.Series(ohlc)

daily = df.resample('1D').apply(ohlcVolume)

Answer 5

Answered by Datalker

Conversion from OHLC to OHLC worked for me like this:

df.resample('1H').agg({
    'openbid':'first',
    'highbid':'max',
    'lowbid':'min',
    'closebid':'last'
})
