pandas: Panel regression in Python
Original URL: http://stackoverflow.com/questions/36682343/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
a Panel regression in Python
Asked by jerreyz
I'm trying to run a panel regression on pandas Dataframes:
Currently I have two DataFrames, each with 52 rows (dates) and 99 columns (one per stock). See the link: Markdown file with data representation
When running:
est=sm.OLS(Stockslist,averages).fit()
est.summary()
I get the ValueError: shapes (52,99) and (52,99) not aligned: 99 (dim 1) != 52 (dim 0)
Can somebody point out what I am doing wrong? The model is simply y(i,t) = x(i,t) + error term, so there is no intercept. However, I would like to add time effects in the future.
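One reading of the error is that `sm.OLS` expects a single dependent variable, so two 52 x 99 blocks are not pooled into 52 * 99 = 5148 observations automatically. The sketch below shows the pooled, no-intercept fit on stacked data. It assumes synthetic stand-ins for the two frames (the names `stocks_wide` and `averages_wide`, the stock labels, and the true slope of 2.0 are made up for illustration) and uses plain NumPy least squares rather than statsmodels:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two 52 x 99 frames in the question.
dates = pd.date_range("2015-04-03", periods=52, freq="7D")
stocks = [f"STOCK{i:02d}" for i in range(99)]
averages_wide = pd.DataFrame(rng.normal(size=(52, 99)), index=dates, columns=stocks)
noise = rng.normal(scale=0.1, size=(52, 99))
stocks_wide = 2.0 * averages_wide + noise  # assumed true slope of 2.0

# Stack each wide frame into one long column indexed by (date, stock).
y = stocks_wide.stack()
x = averages_wide.stack()

# Pooled OLS without an intercept: y = beta * x + error.
X = x.to_numpy().reshape(-1, 1)
beta, *_ = np.linalg.lstsq(X, y.to_numpy(), rcond=None)
slope = float(beta[0])

print(len(y), round(slope, 2))
```

Both stacked series share the same (date, stock) order, so the rows of `X` and `y` line up without an explicit join.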
Kind regards, Jeroen
Answered by Stefan
Try the below. I've copied the stock data from the above link and added random data for the x column. For a panel regression you need a MultiIndex, as mentioned in the comments.
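import numpy as np
import pandas as pd
# df holds the stock data copied from the link, with a 'dates' column
df = pd.DataFrame(df.set_index('dates').stack())  # long format, (date, stock) MultiIndex
df.columns = ['y']
df['x'] = np.random.random(size=len(df.index))  # random regressor for illustration
df.info()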
MultiIndex: 100 entries, (2015-04-03 00:00:00, AB INBEV) to (2015-05-01 00:00:00, ZC.PA)
Data columns (total 2 columns):
y 100 non-null float64
x 100 non-null float64
dtypes: float64(2)
memory usage: 2.3+ KB
from pandas.stats.plm import PanelOLS  # same import as in the follow-up answer
regression = PanelOLS(y=df['y'], x=df[['x']])
regression
-------------------------Summary of Regression Analysis-------------------------
Formula: Y ~ <x> + <intercept>
Number of Observations: 100
Number of Degrees of Freedom: 2
R-squared: 0.0042
Adj R-squared: -0.0060
Rmse: 0.2259
F-stat (1, 98): 0.4086, p-value: 0.5242
Degrees of Freedom: model 1, resid 98
-----------------------Summary of Estimated Coefficients------------------------
Variable        Coef   Std Err    t-stat   p-value   CI 2.5%  CI 97.5%
--------------------------------------------------------------------------------
x            -0.0507    0.0794     -0.64    0.5242   -0.2063    0.1048
intercept     2.1952    0.0448     49.05    0.0000    2.1075    2.2829
---------------------------------End of Summary---------------------------------
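Note that the summary reports an intercept (the "Formula" line shows one was added even though none was requested) and 98 residual degrees of freedom, i.e. 100 observations minus 2 estimated coefficients. As a rough sketch of how the Coef, Std Err and t-stat columns of such a table are usually computed, the following uses NumPy on synthetic data. The data and variable names are assumptions, so it will not reproduce the numbers above:

```python
import numpy as np

rng = np.random.default_rng(1)

n = 100
x = rng.random(n)                        # synthetic regressor
y = 2.2 + rng.normal(scale=0.2, size=n)  # synthetic response, unrelated to x

# Design matrix with the regressor and a column of ones for the intercept.
X = np.column_stack([x, np.ones(n)])

# OLS coefficients from least squares.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual variance uses n - k degrees of freedom (here 100 - 2 = 98).
resid = y - X @ coef
df_resid = n - X.shape[1]
sigma2 = resid @ resid / df_resid

# Standard errors come from the diagonal of sigma^2 * (X'X)^-1.
std_err = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
t_stat = coef / std_err

for name, c, s, t in zip(["x", "intercept"], coef, std_err, t_stat):
    print(f"{name:<10} {c:8.4f} {s:8.4f} {t:8.2f}")
```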
Answered by jerreyz
As you mentioned above, I changed my code in the following way:
- I transformed the stacks into two DataFrames
- I concatenated them into a single MultiIndex DataFrame
- I ran the regression and added time effects
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 5096 entries, (2015-04-03 00:00:00, AB INBEV) to (25/03/16, ZC.PA)
Data columns (total 2 columns):
indvalues 5096 non-null float64
avgvalues 5096 non-null float64
dtypes: float64(2)
memory usage: 119.4+ KB
from pandas.stats.plm import PanelOLS
regression = PanelOLS(y=df["indvalues"], x=df[["avgvalues"]], time_effects=True)
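The stacking and concatenation steps listed above could look roughly like this. It is a sketch on small synthetic frames, not the code actually used. The frame names `ind_wide` and `avg_wide`, the stock label "XYZ", and the index level names are assumptions. Only the column names `indvalues` and `avgvalues` come from the output above:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Hypothetical wide frames: one row per date, one column per stock.
dates = pd.date_range("2015-04-03", periods=4, freq="7D")
stocks = ["AB INBEV", "ZC.PA", "XYZ"]
ind_wide = pd.DataFrame(rng.random((4, 3)), index=dates, columns=stocks)
avg_wide = pd.DataFrame(rng.random((4, 3)), index=dates, columns=stocks)

# Stack each frame into a Series indexed by (date, stock), then join them
# column-wise; concat aligns the two Series on the shared MultiIndex.
df = pd.concat(
    {"indvalues": ind_wide.stack(), "avgvalues": avg_wide.stack()},
    axis=1,
)
df.index.names = ["dates", "stock"]

print(df.head())
```

Passing a dict to `pd.concat` names the resulting columns after its keys, which avoids a separate renaming step.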
The regression now works very nicely! Thank you, Stefan Jansen.
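For readers on recent pandas versions: the `pandas.stats` module, including `pandas.stats.plm`, was removed in pandas 0.20, so the import above no longer works there. Time effects are commonly implemented by giving each date its own intercept, which for the slope is equivalent to subtracting the per-date mean from both variables before fitting. The following is a sketch of that idea with pandas and NumPy on synthetic data. It is not the `pandas.stats.plm` implementation, and the assumed true slope of 0.5 and the stock labels are made up for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

# Synthetic long-format panel indexed by (dates, stock).
dates = pd.date_range("2015-04-03", periods=52, freq="7D")
stocks = [f"STOCK{i:02d}" for i in range(99)]
index = pd.MultiIndex.from_product([dates, stocks], names=["dates", "stock"])

time_shock = np.repeat(rng.normal(size=52), 99)  # one common shock per date
df = pd.DataFrame({"avgvalues": rng.normal(size=len(index))}, index=index)
# Assumed true slope of 0.5, plus the date shock and idiosyncratic noise.
df["indvalues"] = (
    0.5 * df["avgvalues"] + time_shock + rng.normal(scale=0.1, size=len(index))
)

# Within transformation: remove each date's mean from both variables.
demeaned = df - df.groupby(level="dates").transform("mean")

# Slope of the demeaned regression (no intercept is needed after demeaning).
X = demeaned[["avgvalues"]].to_numpy()
beta, *_ = np.linalg.lstsq(X, demeaned["indvalues"].to_numpy(), rcond=None)
slope = float(beta[0])

print(round(slope, 2))
```

Demeaning by date absorbs anything that is common to all stocks on a given date, so the slope is identified only from cross-sectional variation within each date.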