Panel regression in Python with pandas

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/36682343/

Date: 2020-09-14 01:03:47  Source: igfitidea

A panel regression in Python

python pandas statsmodels

Asked by jerreyz

I'm trying to run a panel regression on pandas DataFrames:

Currently I have two DataFrames, each containing 52 rows (dates) × 99 columns (99 stocks): Markdown file with data representation

When running:

import statsmodels.api as sm

est = sm.OLS(Stockslist, averages).fit()
est.summary()

I get the ValueError: shapes (52,99) and (52,99) not aligned: 99 (dim 1) != 52 (dim 0)
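
The message has the form NumPy reports when two 52×99 arrays are passed to a matrix product, because the inner dimensions (99 and 52) do not match. The following is a minimal sketch of that mismatch, assuming zero-filled placeholder arrays; `y_wide` and `x_wide` are hypothetical names standing in for `Stockslist` and `averages`, and the sketch does not show where inside statsmodels the product occurs.

```python
import numpy as np

# Hypothetical stand-ins for the two wide frames: 52 dates x 99 stocks.
y_wide = np.zeros((52, 99))
x_wide = np.zeros((52, 99))

try:
    # A matrix product needs matching inner dimensions: here 99 != 52.
    np.dot(y_wide, x_wide)
    error = None
except ValueError as exc:
    error = exc

print(type(error).__name__, error)

# Flattening to one observation per (date, stock) pair gives 1-D vectors
# of length 52 * 99 = 5148, the layout a single-regressor OLS expects.
y_long = y_wide.reshape(-1)
x_long = x_wide.reshape(-1)
print(y_long.shape, x_long.shape)
```

The second half hints at the usual remedy: put the data in long format, with one row per date and stock, before fitting.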

Can somebody point out what I am doing wrong? The model is simply y(i,t) = x(i,t) + error term, so there is no intercept. However, I would like to add time effects in the future.
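
Written out, the model and the time-effects extension mentioned here could look as follows. The symbols $\beta$, $\lambda_t$ and $\varepsilon_{it}$ are notation assumed for this sketch; in particular the slope $\beta$ is an assumption, since the question writes the regressor without a coefficient.

```latex
% Pooled model without intercept (i = stock, t = date)
y_{it} = \beta x_{it} + \varepsilon_{it}

% The same model with a time effect \lambda_t for each date
y_{it} = \beta x_{it} + \lambda_t + \varepsilon_{it}
```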

Kind regards, Jeroen

Answered by Stefan

Try the below - I've copied the stock data from the above link and added random data for the x column. For a panel regression you need a MultiIndex, as mentioned in the comments.

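import numpy as np
import pandas as pd

# Stack the wide frame into one row per (date, stock) pair.
df = pd.DataFrame(df.set_index('dates').stack())
df.columns = ['y']
df['x'] = np.random.random(size=len(df.index))  # random regressor
df.info()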

MultiIndex: 100 entries, (2015-04-03 00:00:00, AB INBEV) to (2015-05-01 00:00:00, ZC.PA)
Data columns (total 2 columns):
y    100 non-null float64
x    100 non-null float64
dtypes: float64(2)
memory usage: 2.3+ KB
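
To make the reshaping step concrete, here is a small self-contained sketch of what `set_index('dates').stack()` does. The frame `wide`, the tickers `AAA` and `BBB`, and the index level name `stock` are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame: one row per date, one column per stock.
wide = pd.DataFrame(
    {
        "dates": pd.to_datetime(["2015-04-03", "2015-04-10", "2015-04-17"]),
        "AAA": [1.0, 1.1, 1.2],
        "BBB": [2.0, 2.1, 2.2],
    }
)

# stack() moves the column labels into a second index level, giving one
# row per (date, stock) pair.
long_df = wide.set_index("dates").stack().to_frame("y")
long_df.index.names = ["dates", "stock"]

# Random regressor, as in the answer; the seed is only for repeatability.
rng = np.random.default_rng(0)
long_df["x"] = rng.random(len(long_df))

print(long_df.index.nlevels, long_df.shape)
```

Three dates and two stocks yield six rows under a two-level index, which is the long layout a panel estimator works with.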

from pandas.stats.plm import PanelOLS  # only available in pandas < 0.20
regression = PanelOLS(y=df['y'], x=df[['x']])

regression

-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x> + <intercept>

Number of Observations:         100
Number of Degrees of Freedom:   2

R-squared:         0.0042
Adj R-squared:    -0.0060

Rmse:              0.2259

F-stat (1, 98):     0.4086, p-value:     0.5242

Degrees of Freedom: model 1, resid 98

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x    -0.0507     0.0794      -0.64     0.5242    -0.2063     0.1048
     intercept     2.1952     0.0448      49.05     0.0000     2.1075     2.2829
---------------------------------End of Summary---------------------------------

Answered by jerreyz

As you mentioned above, I changed my code in the following way:

  1. I transformed the stacks into two DataFrames
  2. I concatenated them into a single MultiIndex DataFrame
  3. I ran the regression and added time effects

    <class 'pandas.core.frame.DataFrame'>
    MultiIndex: 5096 entries, (2015-04-03 00:00:00, AB INBEV) to (25/03/16, ZC.PA)
    Data columns (total 2 columns):
    indvalues    5096 non-null float64
    avgvalues    5096 non-null float64
    dtypes: float64(2)
    memory usage: 119.4+ KB
    
    from pandas.stats.plm import PanelOLS
    regression = PanelOLS(y=df["indvalues"], x=df[["avgvalues"]], time_effects=True)
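
The three steps above can also be sketched with pandas and NumPy alone, which may help because the `pandas.stats` module was removed in pandas 0.20. This is a minimal sketch under stated assumptions, not the answer's actual code: the data are synthetic, the tickers and the `stock` level name are hypothetical, and time effects are handled by subtracting per-date means (the within transformation), which yields the same slope as including a dummy variable for each date.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 4 dates x 3 stocks, built with a known slope of 2
# plus a date-specific shift that the time effects should absorb.
rng = np.random.default_rng(0)
dates = pd.date_range("2015-04-03", periods=4, freq="7D")
stocks = ["AAA", "BBB", "CCC"]
avg_wide = pd.DataFrame(rng.random((4, 3)), index=dates, columns=stocks)
time_shift = pd.Series([0.5, -1.0, 2.0, 0.0], index=dates)
ind_wide = (2.0 * avg_wide).add(time_shift, axis=0)

# Steps 1 and 2: stack each wide frame, then combine them side by side
# into a single MultiIndex DataFrame.
df = pd.concat(
    {"indvalues": ind_wide.stack(), "avgvalues": avg_wide.stack()}, axis=1
)
df.index.names = ["dates", "stock"]

# Step 3: time effects via the within transformation, i.e. subtract the
# per-date mean from y and x, then run no-intercept OLS on the result.
demeaned = df - df.groupby(level="dates").transform("mean")
x = demeaned["avgvalues"].to_numpy()
y = demeaned["indvalues"].to_numpy()
slope = (x @ y) / (x @ x)

print(round(slope, 6))
```

The sketch recovers only the slope coefficient. Standard errors would also need a degrees-of-freedom adjustment for the absorbed date means, which a dedicated panel estimator handles.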
    
The regression now works very nicely! Thank you, Stefan Jansen.
