Calculate Daily Returns with Pandas DataFrame
Original question: http://stackoverflow.com/questions/20000726/
Notice: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors on Stack Overflow.
Asked by Michael
Here is my Pandas data frame:
import pandas

prices = pandas.DataFrame([1035.23, 1032.47, 1011.78, 1010.59, 1016.03, 1007.95,
                           1022.75, 1021.52, 1026.11, 1027.04, 1030.58, 1030.42,
                           1036.24, 1015.00, 1015.20])
Here is my daily_return function:
def daily_return(prices):
    return prices[:-1] / prices[1:] - 1
Here is the output of this function:
0 NaN
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 NaN
Why am I getting this output?
Accepted answer by HYRY
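Because arithmetic between pandas objects aligns on the index, each row is divided by the row with the same label rather than by its neighbor. That is why overlapping labels give 0 and the unmatched first and last labels give NaN. To divide by position instead, convert one of the DataFrames to an array: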
prices[:-1].values / prices[1:] - 1
or
prices[:-1] / prices[1:].values - 1
depending on which index you want the result to have.
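To see the alignment at work, here is a minimal sketch. It assumes a short Series holding the first five prices from the question and uses .iloc for explicit positional slicing; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# First five prices from the question, as a Series for brevity.
prices = pd.Series([1035.23, 1032.47, 1011.78, 1010.59, 1016.03])

# Label-aligned division: labels 1..3 exist on both sides, so each value is
# divided by itself (ratio 1, result 0). Labels 0 and 4 exist on one side
# only, so they come out as NaN.
aligned = prices.iloc[:-1] / prices.iloc[1:] - 1

# Stripping the index from one side makes the division positional.
# The result keeps the index of whichever side is still a pandas object.
indexed_by_later_row = prices.iloc[:-1].values / prices.iloc[1:] - 1
indexed_by_earlier_row = prices.iloc[:-1] / prices.iloc[1:].values - 1

print(aligned)
print(indexed_by_later_row)
print(indexed_by_earlier_row)
```

The two positional results hold the same numbers; only their index labels differ (1 to 4 versus 0 to 3).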
Or use the shift() method:
prices.shift(1) / prices - 1
or, for the conventional return (current price over previous price):
prices / prices.shift(1) - 1
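The two shift() expressions are not interchangeable, so a small sketch may help. It again assumes a Series with the first five prices from the question; the names simple_return and inverse_ratio are made up for illustration.

```python
import pandas as pd

prices = pd.Series([1035.23, 1032.47, 1011.78, 1010.59, 1016.03])

# shift(1) moves every value down one row, so each row sees the previous
# row's price. The first row has no predecessor and becomes NaN.
previous = prices.shift(1)

# Current price relative to the previous one: the usual simple return.
simple_return = prices / previous - 1

# Previous price relative to the current one: the formula from the question.
inverse_ratio = previous / prices - 1

print(pd.DataFrame({"simple_return": simple_return,
                    "inverse_ratio": inverse_ratio}))
```

Because shift() keeps the original index, both results line up with prices row by row and no .values conversion is needed.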
Answer by YaOzI
Why not use the very convenient pct_change method that pandas provides out of the box:
import pandas as pd
prices = pd.DataFrame([1035.23, 1032.47, 1011.78, 1010.59, 1016.03, 1007.95,
                       1022.75, 1021.52, 1026.11, 1027.04, 1030.58, 1030.42,
                       1036.24, 1015.00, 1015.20])
daily_return = prices.pct_change(1) # 1 for ONE DAY lookback
monthly_return = prices.pct_change(21) # 21 for ONE MONTH lookback
annual_return = prices.pct_change(252) # 252 for ONE YEAR lookback
Original prices:
print(prices)
0
0 1035.23
1 1032.47
2 1011.78
3 1010.59
4 1016.03
5 1007.95
6 1022.75
7 1021.52
8 1026.11
9 1027.04
10 1030.58
11 1030.42
12 1036.24
13 1015.00
14 1015.20
Daily return as prices.pct_change(1):
print(prices.pct_change(1))
0
0 NaN
1 -0.002666
2 -0.020039
3 -0.001176
4 0.005383
5 -0.007953
6 0.014683
7 -0.001203
8 0.004493
9 0.000906
10 0.003447
11 -0.000155
12 0.005648
13 -0.020497
14 0.000197
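As a rough check, pct_change(periods) should match prices / prices.shift(periods) - 1 on data without missing values. The sketch below assumes the same 15 prices; the names via_pct_change, via_shift and too_long are illustrative only.

```python
import numpy as np
import pandas as pd

prices = pd.DataFrame([1035.23, 1032.47, 1011.78, 1010.59, 1016.03, 1007.95,
                       1022.75, 1021.52, 1026.11, 1027.04, 1030.58, 1030.42,
                       1036.24, 1015.00, 1015.20])

periods = 1  # number of rows to look back

via_pct_change = prices.pct_change(periods)
via_shift = prices / prices.shift(periods) - 1

# The two should agree element-wise, with NaN in the same places.
print(np.allclose(via_pct_change, via_shift, equal_nan=True))

# A lookback longer than the data leaves nothing to compare against,
# so with only 15 rows a 21-row lookback is NaN everywhere.
too_long = prices.pct_change(21)
print(too_long.isna().all().all())
```

The 21 and 252 lookbacks in the answer assume roughly 21 trading days per month and 252 per year, so they only produce values once the price history is at least that long.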
Answer by Yihua Zhou
Just a small addition to @YaOzI's answer, in case anyone reads this. If your price data is a panel-style table covering several stocks:
>>> prices = pandas.DataFrame({
...     "StkCode": ["StockA"] * 5 + ["StockB"] * 5 + ["StockC"] * 5,
...     "Price": [1035.23, 1032.47, 1011.78, 1010.59, 1016.03,
...               1007.95, 1022.75, 1021.52, 1026.11, 1027.04,
...               1030.58, 1030.42, 1036.24, 1015.00, 1015.20],
... })
Which gives you:
Price StkCode
0 1035.23 StockA
1 1032.47 StockA
2 1011.78 StockA
3 1010.59 StockA
4 1016.03 StockA
5 1007.95 StockB
6 1022.75 StockB
7 1021.52 StockB
8 1026.11 StockB
9 1027.04 StockB
10 1030.58 StockC
11 1030.42 StockC
12 1036.24 StockC
13 1015.00 StockC
14 1015.20 StockC
Then you can simply combine .groupby("StkCode") with .pct_change(k). It is also about a thousand times faster than looping over the stocks with an iterator (on my dataset, the processing time dropped from 10 hours to 20 seconds).
>>> prices["Return"] = prices.groupby("StkCode")["Price"].pct_change(1)
This gives you:
Price StkCode Return
0 1035.23 StockA NaN
1 1032.47 StockA -0.002666
2 1011.78 StockA -0.020039
3 1010.59 StockA -0.001176
4 1016.03 StockA 0.005383
5 1007.95 StockB NaN
6 1022.75 StockB 0.014683
7 1021.52 StockB -0.001203
8 1026.11 StockB 0.004493
9 1027.04 StockB 0.000906
10 1030.58 StockC NaN
11 1030.42 StockC -0.000155
12 1036.24 StockC 0.005648
13 1015.00 StockC -0.020497
14 1015.20 StockC 0.000197
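To show why the groupby matters, here is a minimal sketch on a smaller, hypothetical two-stock table. It assumes the rows are already sorted by date within each stock; the "Naive" column is added purely for comparison.

```python
import pandas as pd

prices = pd.DataFrame({
    "StkCode": ["StockA"] * 3 + ["StockB"] * 3,
    "Price": [1035.23, 1032.47, 1011.78, 1007.95, 1022.75, 1021.52],
})

# Without grouping, the first StockB row is compared with the last StockA
# row, which produces a meaningless cross-stock "return".
prices["Naive"] = prices["Price"].pct_change(1)

# With grouping, the lookback restarts for every stock, so the first row
# of each stock is NaN.
prices["Return"] = prices.groupby("StkCode")["Price"].pct_change(1)

print(prices)
```

The two columns differ only on the first row of each stock after the first one, which is exactly where the ungrouped version leaks a price across stocks.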

