pandas: How to get the regression intercept using Statsmodels.api

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/38836465/

Date: 2020-09-14 01:45:33  Source: igfitidea

Tags: python, pandas, statsmodels

Asked by Shank

I am trying to calculate a regression output using a Python library, but I am unable to get the intercept value when I use the library:

import statsmodels.api as sm

It prints all the regression analysis except the intercept.

but when I use:

from pandas.stats.api import ols

My code for pandas:

Regression = ols(y= Sorted_Data3['net_realization_rate'],x = Sorted_Data3[['Cohort_2','Cohort_3']])
print Regression  

I get the intercept, along with a warning that this library will be deprecated in the future, so I am trying to use Statsmodels instead.

The warning that I get while using pandas.stats.api:

Warning (from warnings module):
  File "C:\Python27\lib\idlelib\run.py", line 325
    exec code in self.locals
FutureWarning: The pandas.stats.ols module is deprecated and will be removed in a future version. We refer to external packages like statsmodels, see some examples here: http://statsmodels.sourceforge.net/stable/regression.html

My code for Statsmodels:

import pandas as pd
import numpy as np
from pandas.stats.api import ols
import statsmodels.api as sm

Data1 = pd.read_csv('C:\Shank\Regression.csv')  #Importing CSV
print Data1

(Some data-cleaning code runs here.)

sm_model = sm.OLS(Sorted_Data3['net_realization_rate'],Sorted_Data3[['Cohort_2','Cohort_3']])
results = sm_model.fit()
print '\n'
print results.summary()

I even tried the statsmodels.formula.api style, as:

sm_model = sm.OLS(formula ="net_realization_rate ~ Cohort_2 + Cohort_3", data = Sorted_Data3)
results = sm_model.fit()
print '\n'
print result.params
print '\n'
print results.summary()

but I get the error:

TypeError: __init__() takes at least 2 arguments (1 given)

Final output: the first is from pandas, the second is from Statsmodels. I want the Statsmodels output to include the intercept value as well, like the one from pandas.

Answered by Kartik

So, statsmodels has an add_constant method that you need to use to explicitly add intercept values. IMHO, this is better than the R alternative, where the intercept is added by default.

In your case, you need to do this:

import statsmodels.api as sm
endog = Sorted_Data3['net_realization_rate']
exog = sm.add_constant(Sorted_Data3[['Cohort_2','Cohort_3']])

# Fit and summarize OLS model
mod = sm.OLS(endog, exog)
results = mod.fit()
print(results.summary())

Note that you can add the constant before your array or after it by passing True (the default) or False to the prepend kwarg in sm.add_constant.



Alternatively, though it is not recommended, you can use NumPy to explicitly add a constant column, like so:

exog = np.concatenate((np.repeat(1, len(Sorted_Data3))[:, None], 
                       Sorted_Data3[['Cohort_2','Cohort_3']].values),
                       axis = 1)

Answered by Cody Mitchell

You can also do something like this:

df['intercept'] = 1

Here you are explicitly creating a column for the intercept.

Then you can just use the sm.OLS method like so:

lm = sm.OLS(df['y_column'], df[['intercept', 'x_column']])
results = lm.fit()
print(results.summary())