pandas: from tick-by-tick data to candlesticks
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/12322869/
From tick by tick data to candlestick
Asked by Femto Trader
I have tick-by-tick data for Forex pairs.
Here is a sample of EURUSD/EURUSD-2012-06.csv:
EUR/USD,20120601 00:00:00.207,1.23618,1.2363
EUR/USD,20120601 00:00:00.209,1.23618,1.23631
EUR/USD,20120601 00:00:00.210,1.23618,1.23631
EUR/USD,20120601 00:00:00.211,1.23623,1.23631
EUR/USD,20120601 00:00:00.240,1.23623,1.23627
EUR/USD,20120601 00:00:00.423,1.23622,1.23627
EUR/USD,20120601 00:00:00.457,1.2362,1.23626
EUR/USD,20120601 00:00:01.537,1.2362,1.23625
EUR/USD,20120601 00:00:03.010,1.2362,1.23624
EUR/USD,20120601 00:00:03.012,1.2362,1.23625
The full tick data can be downloaded here: http://dl.free.fr/k4vVF7aOD
The columns are:
Symbol,Datetime,Bid,Ask
I would like to convert this tick-by-tick data to candlestick data (also called OHLC: Open, High, Low, Close). As an example, let's say I want an M15 timeframe (15 minutes).
I would like to use Python and the Pandas library to achieve this.
I've done a small part of the job: reading the tick-by-tick data file.
Here is the code:
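#!/usr/bin/env python
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# from matplotlib.finance import candlestick  # unused here; matplotlib.finance is no longer shipped with recent matplotlib
from datetime import datetime

def conv_str_to_datetime(x):
    # Parse timestamps such as '20120601 00:00:00.207'
    return datetime.strptime(x, '%Y%m%d %H:%M:%S.%f')

df = pd.read_csv('test_EURUSD/EURUSD-2012-07.csv', names=['Symbol', 'Date_Time', 'Bid', 'Ask'], converters={'Date_Time': conv_str_to_datetime})

# Spread expressed in pips (10**4 scales the price difference to the 4th decimal place)
PipPosition = 4
df['Spread'] = (df['Ask'] - df['Bid']) * 10**PipPosition

print(df)
print("="*10)
print(df.iloc[0])  # first row (df.ix has been removed from pandas)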
But now I don't know how to start the rest of the job.
I want to get data like this:
Symbol,Datetime_open_candle,open_price,high_price,low_price,close_price
Candle prices will be based on the Bid column.
As I see it, the first part of the problem is to get the first Datetime_open_candle compatible with the desired timeframe (let's call this variable dt1) and the last Datetime_open_candle (let's call this variable dt2).
After that I will probably need to keep only the data from dt1 to dt2 (and not the data before dt1 or after dt2).
Knowing dt1, dt2 and the desired timeframe, I can work out the number of candles I will have.
Then I "just" have to find, for each candle, the open/high/low/close prices.
I'm looking for a fairly fast algorithm, vectorized if possible, as tick data can be very big.
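As an illustration of the dt1/dt2 bookkeeping described above, here is a minimal sketch that floors each tick timestamp to the start of its 15-minute candle and counts the candles between the first and last one. The three timestamps are hypothetical values in the same format as the sample file, and the variable names `ticks`, `candle_open` and `n_candles` are assumptions made for this example only.

```python
import pandas as pd

# Hypothetical tick timestamps, in the same format as the sample file.
ticks = pd.to_datetime(
    ["20120601 00:00:00.207", "20120601 00:17:03.012", "20120601 00:44:59.999"],
    format="%Y%m%d %H:%M:%S.%f",
)

timeframe = pd.Timedelta(minutes=15)

# Open time of the candle each tick falls into (vectorized, no Python loop).
candle_open = ticks.floor("15min")

dt1 = candle_open.min()  # open time of the first candle
dt2 = candle_open.max()  # open time of the last candle

# Number of candles from dt1 to dt2 inclusive, counting empty ones.
n_candles = (dt2 - dt1) // timeframe + 1

print(dt1, dt2, n_candles)
```

In practice this bookkeeping may not be needed explicitly, since resampling on a datetime index performs the same bucketing internally.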
Answered by Wouter Overmeire
In [59]: df
Out[59]:
                             Symbol      Bid      Ask
Datetime
2012-06-01 00:00:00.207000  EUR/USD  1.23618  1.23630
2012-06-01 00:00:00.209000  EUR/USD  1.23618  1.23631
2012-06-01 00:00:00.210000  EUR/USD  1.23618  1.23631
2012-06-01 00:00:00.211000  EUR/USD  1.23623  1.23631
2012-06-01 00:00:00.240000  EUR/USD  1.23623  1.23627
2012-06-01 00:00:00.423000  EUR/USD  1.23622  1.23627
2012-06-01 00:00:00.457000  EUR/USD  1.23620  1.23626
2012-06-01 00:00:01.537000  EUR/USD  1.23620  1.23625
2012-06-01 00:00:03.010000  EUR/USD  1.23620  1.23624
2012-06-01 00:00:03.012000  EUR/USD  1.23620  1.23625
In [60]: grouped = df.groupby('Symbol')
In [61]: ask = grouped['Ask'].resample('15Min', how='ohlc')
In [62]: bid = grouped['Bid'].resample('15Min', how='ohlc')
In [63]: pandas.concat([ask, bid], axis=1, keys=['Ask', 'Bid'])
Out[63]:
                                 Ask                                 Bid
                                open     high      low    close     open     high      low    close
Symbol  Datetime
EUR/USD 2012-06-01 00:15:00   1.2363  1.23631  1.23624  1.23625  1.23618  1.23623  1.23618   1.2362
Answered by Sean Stayns
The syntax in Overmeire's answer has since been deprecated.
Instead of this:
ask = grouped['Ask'].resample('15Min', how='ohlc')
bid = grouped['Bid'].resample('15Min', how='ohlc')
Use this:
ask = grouped['Ask'].resample('15Min').ohlc()
bid = grouped['Bid'].resample('15Min').ohlc()
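To put the pieces together, here is a minimal end-to-end sketch using the method-chaining syntax above on the ten sample rows from the question. It is an illustration under stated assumptions rather than a definitive implementation: the rows are embedded as a string (a real file path would be passed to `read_csv` instead), timestamps are parsed in one vectorized `pd.to_datetime` call rather than a per-row converter, and the names `csv_text`, `candles` and `flat` are chosen for this example only.

```python
import io
import pandas as pd

# Sample rows copied from the question; a real CSV path would be used instead.
csv_text = """EUR/USD,20120601 00:00:00.207,1.23618,1.2363
EUR/USD,20120601 00:00:00.209,1.23618,1.23631
EUR/USD,20120601 00:00:00.210,1.23618,1.23631
EUR/USD,20120601 00:00:00.211,1.23623,1.23631
EUR/USD,20120601 00:00:00.240,1.23623,1.23627
EUR/USD,20120601 00:00:00.423,1.23622,1.23627
EUR/USD,20120601 00:00:00.457,1.2362,1.23626
EUR/USD,20120601 00:00:01.537,1.2362,1.23625
EUR/USD,20120601 00:00:03.010,1.2362,1.23624
EUR/USD,20120601 00:00:03.012,1.2362,1.23625
"""

df = pd.read_csv(io.StringIO(csv_text), names=["Symbol", "Datetime", "Bid", "Ask"])

# Vectorized timestamp parsing with an explicit format.
df["Datetime"] = pd.to_datetime(df["Datetime"], format="%Y%m%d %H:%M:%S.%f")
df = df.set_index("Datetime")

# One OHLC row per symbol and 15-minute bucket, built from the Bid column.
candles = df.groupby("Symbol")["Bid"].resample("15min").ohlc()

# Flatten the (Symbol, Datetime) index into ordinary columns.
flat = candles.reset_index()
print(flat)
```

All ten sample ticks fall inside the same 15-minute window, so this produces a single candle. With the default resampling settings in current pandas the candle is labelled with the start of its window; the `label` and `closed` parameters of `resample` change that convention.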

