pandas: How to get the regression intercept using Statsmodels.api

Question

Asked by Shank

I am trying to calculate a regression output using a Python library, but I am unable to get the intercept value when I use this library:

import statsmodels.api as sm

It prints the whole regression analysis except the intercept.

But when I use:

from pandas.stats.api import ols

My code for pandas:

Regression = ols(y=Sorted_Data3['net_realization_rate'], x=Sorted_Data3[['Cohort_2', 'Cohort_3']])
print(Regression)

I get the intercept, but also a warning that this module is deprecated and will be removed in a future version, so I am trying to use Statsmodels instead.

The warning I get while using pandas.stats.api:

Warning (from warnings module):
  File "C:\Python27\lib\idlelib\run.py", line 325
    exec code in self.locals
FutureWarning: The pandas.stats.ols module is deprecated and will be removed in a future version. We refer to external packages like statsmodels, see some examples here: http://statsmodels.sourceforge.net/stable/regression.html

My code for Statsmodels:

import pandas as pd
import numpy as np
from pandas.stats.api import ols
import statsmodels.api as sm

Data1 = pd.read_csv(r'C:\Shank\Regression.csv')  # Importing CSV (raw string keeps the backslashes literal)
print(Data1)

After running some cleaning code:

sm_model = sm.OLS(Sorted_Data3['net_realization_rate'], Sorted_Data3[['Cohort_2', 'Cohort_3']])
results = sm_model.fit()
print('\n')
print(results.summary())

I even tried statsmodels.formula.api, like this:

sm_model = sm.OLS(formula="net_realization_rate ~ Cohort_2 + Cohort_3", data=Sorted_Data3)
results = sm_model.fit()
print('\n')
print(results.params)
print('\n')
print(results.summary())

but I get the error:

TypeError: __init__() takes at least 2 arguments (1 given)

Comparing the final outputs (the first from pandas, the second from Statsmodels), I want Statsmodels to report the intercept value as well, just as pandas does.
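For reference, the TypeError is consistent with sm.OLS receiving only the formula= and data= keywords, because its constructor expects endog (and usually exog) as arrays. The formula interface is the lowercase ols function in statsmodels.formula.api. A formula includes an intercept, reported as "Intercept", unless you drop it with "- 1". Below is a minimal sketch on synthetic data (made-up values, true intercept 2.0):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for Sorted_Data3 (made-up values, true intercept = 2.0)
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({"Cohort_2": rng.normal(size=n), "Cohort_3": rng.normal(size=n)})
df["net_realization_rate"] = (
    2.0 + 3.0 * df["Cohort_2"] - 1.5 * df["Cohort_3"] + rng.normal(scale=0.01, size=n)
)

# smf.ols (lowercase) takes formula= and data=; the intercept is added automatically
results = smf.ols(formula="net_realization_rate ~ Cohort_2 + Cohort_3", data=df).fit()
print(results.params)  # index: Intercept, Cohort_2, Cohort_3
```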

Answer 1

Answer by Kartik

So, statsmodels has an add_constant function that you need to use to explicitly add an intercept (a column of ones) to your regressors. IMHO, this is better than the R alternative, where the intercept is added by default.

In your case, you need to do this:

import statsmodels.api as sm
endog = Sorted_Data3['net_realization_rate']
exog = sm.add_constant(Sorted_Data3[['Cohort_2','Cohort_3']])

# Fit and summarize OLS model
mod = sm.OLS(endog, exog)
results = mod.fit()
print(results.summary())

Note that you can put the constant column before your regressors or after them by passing True (the default) or False to the prepend keyword argument of sm.add_constant.

Alternatively, though this is not recommended, you can use NumPy to add a constant column explicitly, like so:

import numpy as np

# Stack a column of ones (the intercept) in front of the regressor values
exog = np.concatenate((np.repeat(1, len(Sorted_Data3))[:, None],
                       Sorted_Data3[['Cohort_2', 'Cohort_3']].values),
                      axis=1)

Answer 2

Answer by Cody Mitchell

You can also do something like this:

df['intercept'] = 1

Here you are explicitly creating a column for the intercept.

Then you can just use sm.OLS like so:

import statsmodels.api as sm

lm = sm.OLS(df['y_column'], df[['intercept', 'x_column']])
results = lm.fit()
print(results.summary())
