pandas: Compute a compounded return series in Python
Original question: http://stackoverflow.com/questions/5515021/
Notice: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Compute a compounded return series in Python
Asked by Jason Strimpel
Greetings all, I have two series of data: daily raw stock price returns (positive or negative floats) and trade signals (buy=1, sell=-1, no trade=0).
The raw price returns are simply the log of today's price divided by yesterday's price:
log(p_today / p_yesterday)
An example:
raw_return_series = [ 0.0063 -0.0031 0.0024 ..., -0.0221 0.0097 -0.0015]
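The log-return definition above can be sketched with NumPy as follows. The prices are hypothetical and only illustrate the calculation; they are not taken from the question.

```python
import numpy as np

# Hypothetical closing prices, oldest first (not from the original post).
prices = np.array([100.0, 101.0, 100.5, 102.0])

# log(p_today / p_yesterday) for each consecutive pair of days.
raw_return_series = np.log(prices[1:] / prices[:-1])

# Log returns are additive: their sum is the log of the total price change.
total_log_return = raw_return_series.sum()
```

One return is produced per pair of consecutive prices, so the result is one element shorter than the price array.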
The trade signal series looks like this:
signal_series = [-1. 0. -1. -1. 0. 0. -1. 0. 0. 0.]
To get the daily returns based on the trade signals:
daily_returns = [raw_return_series[i] * signal_series[i+1] for i in range(0, len(signal_series)-1)]
These daily returns might look like this:
[0.0, 0.00316, -0.0024, 0.0, 0.0, 0.0023, 0.0, 0.0, 0.0] # results in daily_returns; notice the 0s
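For reference, the same pairing of return `i` with signal `i+1` could be written with NumPy slicing instead of a list comprehension. This is only a sketch; the input values below are illustrative, not the question's full data.

```python
import numpy as np

# Hypothetical inputs of equal length (values are illustrative only).
raw_return_series = np.array([0.0063, -0.0031, 0.0024, -0.0221, 0.0097])
signal_series = np.array([-1.0, 0.0, -1.0, -1.0, 0.0])

# Same pairing as the list comprehension: return i times signal i+1.
daily_returns = raw_return_series[:-1] * signal_series[1:]
```

As with the list comprehension, the result has one element fewer than the inputs, and any day whose paired signal is 0 yields a zero return.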
I need to use the daily_returns series to compute a compounded returns series. However, given that there are 0 values in the daily_returns series, I need to carry over the last non-zero compound return "through time" to the next non-zero compound return.
For example, I compute the compound returns like this (notice I am going "backwards" through time):
compound_returns = [(((1 + compounded[i + 1]) * (1 + daily_returns[i])) - 1) for i in range(len(compounded) - 2, -1, -1)]
and the resulting list:
[0.0, 0.0, 0.0023, 0.0, 0.0, -0.0024, 0.0031, 0.0] # (notice the 0s)
My goal is to carry over the last non-zero return to accumulate these compound returns. That is, since the return at index i depends on the return at index i+1, the return at index i+1 should be non-zero. Every time the list comprehension encounters a zero in the daily_returns series, it essentially restarts.
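One possible way to get the carry-over behaviour is a running product, sketched below. It assumes the values are treated as simple returns (as the question's own formula does) and compounds forward in time rather than backwards, which is a change from the question's approach. For true log returns, `np.exp(np.cumsum(r)) - 1` would be the equivalent.

```python
import numpy as np

# The daily_returns list from the question, zeros included.
daily_returns = np.array([0.0, 0.00316, -0.0024, 0.0, 0.0, 0.0023, 0.0, 0.0, 0.0])

# Forward compounding: running product of (1 + r), minus 1.
# A zero return multiplies the running product by 1, so the previous
# compounded value is carried through unchanged.
compound_returns = np.cumprod(1.0 + daily_returns) - 1.0
```

Because each element depends on the whole history and not just on its neighbour, a zero no longer resets the series.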
Accepted answer by Mike Pennington
There is a fantastic module called pandas, written by a guy at AQR (a hedge fund), that excels at calculations like this... what you need is a way to handle "missing data"... as someone mentioned above, the basics are using the nan (not a number) capabilities of scipy or numpy; however, even those libraries don't make financial calculations that much easier... if you use pandas, you can mark the data you don't want to consider as nan, and then any future calculations will reject it, while performing normal operations on other data.
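A small sketch of what "marking data as nan" looks like in current pandas; the return values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns; NaN marks days to be ignored.
returns = pd.Series([0.01, np.nan, -0.02, np.nan, 0.03])

# Reductions skip NaN by default (skipna=True).
mean_return = returns.mean()   # average of the three valid values
valid_days = returns.count()   # number of non-NaN entries

# Element-wise arithmetic propagates NaN instead of failing.
growth = 1 + returns
```

Reductions such as `mean` ignore the missing entries, while element-wise operations keep them as NaN so the gaps stay visible in the result.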
I have been using pandas on my trading platform for about 8 months... I wish I had started using it sooner.
Wes (the author) gave a talk at pyCon 2010 about the capabilities of the module... see the slides and video on the pyCon 2010 webpage. In that video, he demonstrates how to get daily returns, run 1000s of linear regressions on a matrix of returns (in a fraction of a second), timestamp / graph data... all done with this module. Combined with psyco, this is a beast of a financial analysis tool.
The other great thing it handles is cross-sectional data... so you could grab daily close prices, their rolling means, etc... then timestamp every calculation, and get all this stored in something similar to a python dictionary (see the pandas.DataFrame class)... then you access slices of the data as simply as:
close_prices['stdev_5d']
See the pandas rolling moments doc for more information on how to calculate the rolling stdev (it's a one-liner).
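As a rough illustration, a 5-day rolling standard deviation might look like this with the current pandas `rolling` API, which is not the interface that existed when the answer was written. The prices and the `close` column name are assumptions made for the example.

```python
import pandas as pd

# Hypothetical daily closes on business days (values are illustrative only).
index = pd.date_range("2011-03-01", periods=8, freq="B")
close_prices = pd.DataFrame(
    {"close": [100.0, 101.5, 101.0, 102.2, 103.0, 102.1, 104.0, 103.5]},
    index=index,
)

# 5-day rolling standard deviation; the first four rows are NaN
# because a full window is not yet available.
close_prices["stdev_5d"] = close_prices["close"].rolling(window=5).std()

stdev_5d = close_prices["stdev_5d"]
```

The new column can then be read back with `close_prices['stdev_5d']`, as in the answer.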
Wes has gone out of his way to speed the module up with cython, although I'll concede that I'm considering upgrading my server (an older Xeon), due to my analysis requirements.
EDIT FOR STRIMP's QUESTION: After you converted your code to use pandas data structures, it's still unclear to me how you're indexing your data in a pandas dataframe and the compounding function's requirements for handling missing data (or for that matter the conditions for a 0.0 return... or if you are using NaN in pandas..). I will demonstrate using my data indexing... a day was picked at random... df is a dataframe with ES Futures quotes in it... indexed per second... missing quotes are filled in with numpy.nan. DataFrame indexes are datetime objects, offset by the pytz module's timezone objects.
>>> df.info
<bound method DataFrame.info of <class 'pandas.core.frame.DataFrame'>
Index: 86400 entries , 2011-03-21 00:00:00-04:00 to 2011-03-21 23:59:59-04:00
etf 18390 non-null values
etfvol 18390 non-null values
fut 29446 non-null values
futvol 23446 non-null values
...
>>> # ET is a pytz object...
>>> et
<DstTzInfo 'US/Eastern' EST-1 day, 19:00:00 STD>
>>> # To get the futures quote at 9:45, eastern time...
>>> df.xs(et.localize(dt.datetime(2011,3,21,9,45,0)))['fut']
1291.75
>>>
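A comparable lookup in current pandas might be sketched as below. It swaps in `.loc` with a timezone-aware `pd.Timestamp` in place of `df.xs` and `et.localize`, and builds a placeholder frame with a single hypothetical quote, so it only mirrors the shape of the data described above.

```python
import numpy as np
import pandas as pd

# Hypothetical per-second index for one trading day in US/Eastern time.
index = pd.date_range("2011-03-21 00:00:00", periods=86400, freq="s", tz="US/Eastern")

# Placeholder frame: every quote missing (NaN) except the one set below.
df = pd.DataFrame({"fut": np.nan}, index=index)

# Timezone-aware key for 9:45 eastern time.
key = pd.Timestamp("2011-03-21 09:45:00", tz="US/Eastern")
df.loc[key, "fut"] = 1291.75

# Look the quote back up by timestamp and column.
quote = df.loc[key, "fut"]
```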
To give a simple example of how to calculate a column of continuous returns (in a pandas.TimeSeries), which reference the quote 10 minutes ago (and filling in for missing ticks), I would do this:
>>> df['fut'].fill(method='pad')/df['fut'].fill(method='pad').shift(600)
No lambda is required in that case, just dividing the column of values by itself 600 seconds ago. That .shift(600) part is because my data is indexed per-second.
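In current pandas the same idea could be sketched with `ffill()` (forward fill) followed by `shift`. The quotes are hypothetical, and a lag of 2 rows stands in for the 600 used in the answer so the example stays small.

```python
import numpy as np
import pandas as pd

# Hypothetical per-second quotes with gaps (NaN = no tick that second).
index = pd.date_range("2011-03-21 09:30:00", periods=6, freq="s")
fut = pd.Series([1290.0, np.nan, 1291.0, np.nan, np.nan, 1292.0], index=index)

# Forward-fill missing ticks, then divide by the value `lag` rows earlier.
lag = 2  # the answer uses 600 (10 minutes of per-second rows)
filled = fut.ffill()
ratio = filled / filled.shift(lag)

# The log of the price ratio gives the continuously compounded return.
log_return = np.log(ratio)
```

The first `lag` rows are NaN because there is no earlier quote to divide by.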
HTH, \mike
Answer by Carl
The cumulative return part of this question is dealt with in Wes McKinney's excellent 'Python for Data Analysis' book on page 339, and uses cumprod() from Pandas to create a rebased/indexed cumulative return from calculated price changes.
Example from book:
import pandas.io.data as web
price = web.get_data_yahoo('AAPL', '2011-01-01')['Adj Close']
returns = price.pct_change()
ret_index = (1 + returns).cumprod()
ret_index[0] = 1 # Set first value to 1
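The same steps can be tried without a data download. The sketch below uses a hypothetical price series in place of the AAPL data and `.iloc[0]` for the positional assignment; both are assumptions made for illustration, not part of the book's example.

```python
import pandas as pd

# Hypothetical adjusted closes standing in for the downloaded AAPL data.
price = pd.Series(
    [100.0, 102.0, 101.0, 104.0],
    index=pd.date_range("2011-01-03", periods=4, freq="B"),
)

returns = price.pct_change()          # first value is NaN
ret_index = (1 + returns).cumprod()   # NaN stays in the first slot
ret_index.iloc[0] = 1                 # rebase so the index starts at 1
```

Each value of `ret_index` is then the price relative to the first day, so the last entry equals the final price divided by the first.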
Answer by Jason Strimpel
Imagine I have a DataMatrix with closing prices, some indicator value, and a trade signal like this:
>>> data_matrix
                      close   dvi     signal
2008-01-02 00:00:00   144.9   0.6504  -1
2008-01-03 00:00:00   144.9   0.6603  -1
2008-01-04 00:00:00   141.3   0.7528  -1
2008-01-07 00:00:00   141.2   0.8226  -1
2008-01-08 00:00:00   138.9   0.8548  -1
2008-01-09 00:00:00   140.4   0.8552  -1
2008-01-10 00:00:00   141.3   0.846   -1
2008-01-11 00:00:00   140.2   0.7988  -1
2008-01-14 00:00:00   141.3   0.6151  -1
2008-01-15 00:00:00   138.2   0.3714   1
I use the signal to create a DataMatrix of returns based on the trade signal:
>>> get_indicator_returns()
                      indicator_returns
2008-01-02 00:00:00   NaN
2008-01-03 00:00:00   0.000483
2008-01-04 00:00:00   0.02451
2008-01-07 00:00:00   0.0008492
2008-01-08 00:00:00   0.01615
2008-01-09 00:00:00   -0.01051
2008-01-10 00:00:00   -0.006554
2008-01-11 00:00:00   0.008069
2008-01-14 00:00:00   -0.008063
2008-01-15 00:00:00   0.02201
What I ended up doing is this:
def get_compounded_indicator_cumulative(self):
    indicator_dm = self.get_indicator_returns()
    dates = indicator_dm.index
    indicator_returns = indicator_dm['indicator_returns']
    compounded = array(zeros(size(indicator_returns)))
    compounded[1] = indicator_returns[1]
    for i in range(2, len(indicator_returns)):
        compounded[i] = (1 + compounded[i-1]) * (1 + indicator_returns[i]) - 1
    data = {
        'compounded_returns': compounded
    }
    return DataMatrix(data, index=dates)
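For comparison, the loop above could probably be replaced by a single vectorised expression in current pandas. The sketch below uses a plain `pd.Series` holding the `indicator_returns` values shown earlier (`DataMatrix` is not used), and it assumes the leading NaN should count as a zero return, which matches the loop leaving `compounded[0]` at 0.

```python
import numpy as np
import pandas as pd

# indicator_returns values copied from the output above.
indicator_returns = pd.Series(
    [np.nan, 0.000483, 0.02451, 0.0008492, 0.01615,
     -0.01051, -0.006554, 0.008069, -0.008063, 0.02201]
)

# Treat the leading NaN as a zero return, then compound forward.
compounded = (1 + indicator_returns.fillna(0)).cumprod() - 1
```

The first element stays at 0, the second equals the first valid return, and each later element follows the same recurrence as the loop.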
For some reason I really struggled with this one...
I'm in the process of converting all my price series to PyTables. Looks promising so far.

