Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute the content to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/21140630/

Resampling trade data into OHLCV with pandas

Tags: python, pandas

Asked by jespern

I have historical trade data in a pandas DataFrame, containing price and volume columns, indexed by a DatetimeIndex.

For example:

>>> print(df.tail())
                             price  volume
2014-01-15 14:29:54+00:00  949.975    0.01
2014-01-15 14:29:59+00:00  941.370    0.01
2014-01-15 14:30:17+00:00  949.975    0.01
2014-01-15 14:30:24+00:00  941.370    0.01
2014-01-15 14:30:36+00:00  949.975    0.01
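
For anyone who wants to reproduce the question, the frame shown above can be rebuilt roughly as follows. This is a minimal sketch: only the five printed rows are known, so the variable name `df` and the construction itself are assumptions, not the asker's actual loading code.

```python
import pandas as pd

# Timestamps copied from the printed tail; utc=True gives the +00:00 offset.
index = pd.to_datetime(
    [
        "2014-01-15 14:29:54",
        "2014-01-15 14:29:59",
        "2014-01-15 14:30:17",
        "2014-01-15 14:30:24",
        "2014-01-15 14:30:36",
    ],
    utc=True,
)

# Price and volume columns, indexed by a timezone-aware DatetimeIndex.
df = pd.DataFrame(
    {
        "price": [949.975, 941.370, 949.975, 941.370, 949.975],
        "volume": [0.01] * 5,
    },
    index=index,
)

print(df.tail())
```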

Now, I can resample this into OHLC data using df.resample(freq, how={'price': 'ohlc'}), which is fine, but I'd also like to include the volume.

When I try df.resample(freq, how={'price': 'ohlc', 'volume': 'sum'}), I get:

ValueError: Shape of passed values is (2,), indices imply (2, 95)

I'm not quite sure what is wrong with my dataset, or why this fails. Could anyone help shed some light on this? Much appreciated.

Answered by TomAugspurger

The problem isn't with the resampling itself. It comes from trying to concat a result whose columns are a MultiIndex (the price OHLC) with one whose columns are a regular Index (the volume sum).

In [17]: df
Out[17]: 
                       price  volume
2014-01-15 14:29:54  949.975    0.01
2014-01-15 14:29:59  941.370    0.01
2014-01-15 14:30:17  949.975    0.01
2014-01-15 14:30:24  941.370    0.01
2014-01-15 14:30:36  949.975    0.01

[5 rows x 2 columns]

In [18]: df.resample('30s', how={'price': 'ohlc'})  # Note the MultiIndex
Out[18]: 
                       price                           
                        open     high      low    close
2014-01-15 14:29:30  949.975  949.975  941.370  941.370
2014-01-15 14:30:00  949.975  949.975  941.370  941.370
2014-01-15 14:30:30  949.975  949.975  949.975  949.975

[3 rows x 4 columns]

In [19]: df.resample('30s', how={'volume': 'sum'})  # Regular Index for columns
Out[19]: 
                     volume
2014-01-15 14:29:30    0.02
2014-01-15 14:30:00    0.02
2014-01-15 14:30:30    0.01

[3 rows x 1 columns]

I guess you could manually create a MultiIndex for (volume, sum) and then concat:

In [34]: vol = df.resample('30s', how={'volume': 'sum'})

In [35]: vol.columns = pd.MultiIndex.from_tuples([('volume', 'sum')])

In [36]: vol
Out[36]: 
                     volume
                        sum
2014-01-15 14:29:30    0.02
2014-01-15 14:30:00    0.02
2014-01-15 14:30:30    0.01

[3 rows x 1 columns]

In [37]: price = df.resample('30s', how={'price': 'ohlc'})

In [38]: pd.concat([price, vol], axis=1)
Out[38]: 
                       price                             volume
                        open     high      low    close     sum
2014-01-15 14:29:30  949.975  949.975  941.370  941.370    0.02
2014-01-15 14:30:00  949.975  949.975  941.370  941.370    0.02
2014-01-15 14:30:30  949.975  949.975  949.975  949.975    0.01

[3 rows x 5 columns]

But it might be better if resample could handle this automatically.
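
The `how=` argument used in the session above was deprecated in pandas 0.18 and removed in later releases. Below is a rough equivalent of the same manual approach using the method-chaining API. It is a sketch rather than the answer's original code: the sample frame is rebuilt from the five rows shown, and the column label `sum` is kept only to mirror the output above.

```python
import pandas as pd

# Rebuild the five sample rows from the answer (naive timestamps, as printed).
index = pd.to_datetime(
    [
        "2014-01-15 14:29:54",
        "2014-01-15 14:29:59",
        "2014-01-15 14:30:17",
        "2014-01-15 14:30:24",
        "2014-01-15 14:30:36",
    ]
)
df = pd.DataFrame(
    {"price": [949.975, 941.370, 949.975, 941.370, 949.975], "volume": [0.01] * 5},
    index=index,
)

# Resample each column on its own.
price = df["price"].resample("30s").ohlc()  # DataFrame: open, high, low, close
vol = df["volume"].resample("30s").sum().to_frame("sum")  # one-column DataFrame

# Passing a dict to concat adds the outer column level ('price' / 'volume'),
# so both pieces end up with two-level columns and line up side by side.
combined = pd.concat({"price": price, "volume": vol}, axis=1)
print(combined)
```

Resampling the columns separately avoids mixing a MultiIndex with a flat Index, because the outer level is only added at the concat step.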

Answered by campervancoder

You can now do this in later versions of pandas. Example, with pandas version 0.22.0: df.resample('30S').mean()
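
Note that `.mean()` returns the per-bin average of each column, not open/high/low/close values. One possible way to get a flat OHLCV frame in a single `agg` call is sketched below. It is not taken from this answer: it assumes open, high, low, and close can be expressed as `first`, `max`, `min`, and `last` of the price within each bin, and the flat column names are chosen here for illustration.

```python
import pandas as pd

# Rebuild the five sample rows used throughout the thread.
index = pd.to_datetime(
    [
        "2014-01-15 14:29:54",
        "2014-01-15 14:29:59",
        "2014-01-15 14:30:17",
        "2014-01-15 14:30:24",
        "2014-01-15 14:30:36",
    ]
)
df = pd.DataFrame(
    {"price": [949.975, 941.370, 949.975, 941.370, 949.975], "volume": [0.01] * 5},
    index=index,
)

# A per-column dict: four aggregations for price, one for volume.
# The result has two-level columns such as ('price', 'first').
bars = df.resample("30s").agg(
    {"price": ["first", "max", "min", "last"], "volume": "sum"}
)

# Flatten to conventional OHLCV names (order follows the dict above).
bars.columns = ["open", "high", "low", "close", "volume"]
print(bars)
```

Bins with no trades would get NaN prices and a volume of 0 under this approach, so they may need to be dropped or filled depending on the use case.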
