Multiple Linear Regression in Python with pandas (PatsyError: model is missing required outcome variables)

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/38573354/

Date: 2020-09-14 01:40:08  Source: igfitidea


Tags: python, pandas, regression

Asked by SZA

I am running the following code for multiple linear regression in Python and I get the error PatsyError: model is missing required outcome variables. How do I fix it? Thanks!


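import statsmodels.formula.api as smf

Y = spikers['grade']
X = spikers[['num_pageview', 'num_video_play_resume', 'eng_proficiency', 'english']]
model = smf.ols(Y, X).fit()  # raises PatsyError: model is missing required outcome variables
model.summary()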
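Why this fails, as a minimal sketch with synthetic data (the DataFrame and the column names y, x1, x2 are made up for illustration): the functions in statsmodels.formula.api, such as smf.ols, take a formula string as their first argument and a DataFrame as data. In the code above Y lands in the formula slot and X in the data slot, so patsy treats the non-string Y as a plain right-hand-side matrix with nothing on the left of the model, which is exactly what triggers the error. Assuming a statsmodels version that builds formulas with patsy, the call below should reproduce it:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from patsy import PatsyError

# Synthetic data with hypothetical column names.
rng = np.random.default_rng(1)
data = pd.DataFrame({'y': rng.normal(size=50),
                     'x1': rng.normal(size=50),
                     'x2': rng.normal(size=50)})

try:
    # Wrong usage: smf.ols(formula, data) receives a Series where it expects
    # a formula string, so patsy builds only a right-hand-side matrix.
    smf.ols(data['y'], data[['x1', 'x2']]).fit()
    error_message = None
except PatsyError as exc:
    error_message = str(exc)

print(error_message)
```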

Answer by hassan naderi

You should use the following commands:


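import pandas as pd
import statsmodels.formula.api as smf
df = pd.DataFrame({'x': X, 'y': Y})  # assumes X and Y are single 1-D columns (Series)
model = smf.ols('y ~ x', data=df).fit()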

Here df is your data as a pandas DataFrame.
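The answer above uses a single predictor x. For the four predictors in the question, the same idea extends by joining the column names with + on the right-hand side of ~. A sketch, assuming the asker's spikers DataFrame has those columns; here it is replaced with synthetic data, so the values (and the outcome's dependence on the predictors) are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in for the asker's `spikers` DataFrame (synthetic data).
rng = np.random.default_rng(0)
n = 100
spikers = pd.DataFrame({
    'num_pageview': rng.poisson(20, n),
    'num_video_play_resume': rng.poisson(5, n),
    'eng_proficiency': rng.normal(0, 1, n),
    'english': rng.integers(0, 2, n),
})
# Synthetic outcome so the fit has a relationship to recover.
spikers['grade'] = (0.5 * spikers['num_pageview']
                    + 1.0 * spikers['eng_proficiency']
                    + rng.normal(0, 1, n))

# Outcome on the left of '~', predictors joined with '+' on the right.
formula = 'grade ~ num_pageview + num_video_play_resume + eng_proficiency + english'
model = smf.ols(formula, data=spikers).fit()
print(model.summary())
```

The formula interface adds an Intercept term automatically, so the fitted parameters are the intercept plus one coefficient per predictor.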


Answer by Debo

I had a very similar problem trying to run sm.logit on a binary (0/1) outcome variable 'y'. Suppose all my data is in a pandas DataFrame called 'data':


import statsmodels.formula.api as sm

X = ['Age', 'Sex', 'x1', 'x2', 'x3', 'x4']
logit = sm.logit(data['y'], data[X])  # raises the PatsyError shown below
result = logit.fit()
print(result.summary())

Traceback (most recent call last):

  File "<ipython-input-XXX>", line 1, in <module>
    logit = sm.logit(data['y'],data[X])

  File "C:\...\statsmodels\base\model.py", line 147, in from_formula
    missing=missing)

  File "C:\...\statsmodels\formula\formulatools.py", line 68, in handle_formula_data
    NA_action=na_action)

  File "C:\...\patsy\highlevel.py", line 312, in dmatrices
    raise PatsyError("model is missing required outcome variables")

PatsyError: model is missing required outcome variables

I was getting the error shown above. I managed to fix it and get sensible results by using the formula notation instead:


import statsmodels.formula.api as sm

f1 = 'y ~ Age + Sex + x1 + x2 + x3 + x4'
logit = sm.logit(formula=f1, data=data)
result = logit.fit()
print(result.summary())
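With many predictors, the formula string can also be assembled from the column list instead of typed by hand. A sketch under the assumption that data has the columns listed in X; the data below is synthetic and only for illustration. fit(disp=0) just suppresses the optimizer's convergence printout:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as sm

# Synthetic stand-in for `data` (hypothetical values, same column names).
rng = np.random.default_rng(2)
n = 500
data = pd.DataFrame({
    'Age': rng.integers(18, 80, n),
    'Sex': rng.integers(0, 2, n),
    'x1': rng.normal(size=n),
    'x2': rng.normal(size=n),
    'x3': rng.normal(size=n),
    'x4': rng.normal(size=n),
})
# Binary 0/1 outcome drawn from a logistic model so the fit is well defined.
linpred = -0.5 + 1.0 * data['x1'] - 0.8 * data['x2']
data['y'] = (rng.random(n) < 1 / (1 + np.exp(-linpred))).astype(int)

X = ['Age', 'Sex', 'x1', 'x2', 'x3', 'x4']
# Build 'y ~ Age + Sex + x1 + x2 + x3 + x4' from the column list.
f1 = 'y ~ ' + ' + '.join(X)
result = sm.logit(formula=f1, data=data).fit(disp=0)
print(f1)
print(result.summary())
```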

As far as I can tell, this formula notation is the usual way to use statsmodels.formula.api.
