Notice: this page is taken from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/34012100/

Time: 2020-09-14 00:17:52  Source: igfitidea

Scikit-learn - Bad input shape error on multinomial logistic regression

python pandas machine-learning scikit-learn logistic-regression

Asked by ExtremistEnigma

I'm implementing a multinomial logistic regression model in Python using Scikit-learn. Here's my code:

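import pandas as pd
from sklearn.linear_model import LogisticRegression

# feature_cols and train are defined elsewhere (not shown in the question)
X = pd.concat([each for each in feature_cols], axis=1)
y = train[["<5", "5-6", "6-7", "7-8", "8-9", "9-10"]]
lm = LogisticRegression(multi_class='multinomial', solver='lbfgs')
lm.fit(X, y)  # raises the ValueError: y is a 2-D DataFrame here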

However, I'm getting ValueError: bad input shape (50184, 6) when it tries to execute the last line of code.

X is a DataFrame with 50184 rows and 7 columns. y also has 50184 rows, but 6 columns.

I ultimately want to predict which bin (<5, 5-6, etc.) the outcome falls into. All the independent and dependent variables used in this case are dummy columns with a binary value of either 0 or 1. What am I missing?

Accepted answer by Stefan

The Logistic Regression 3-class Classifier example illustrates that LogisticRegression is fit on a vector of class labels rather than a matrix, in this case the target variable of the iris dataset, coded as values [0, 1, 2].
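As a rough illustration of that point, the sketch below loads the iris data and fits LogisticRegression on its 1-D target. The max_iter value is an arbitrary choice to give the solver room to converge, and the multi_class argument from the question is left out on the assumption that a recent scikit-learn release handles the multiclass case by default.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Feature matrix and integer-coded target of the iris dataset.
X, y = load_iris(return_X_y=True)
print(X.shape)  # (150, 4): 2-D feature matrix
print(y.shape)  # (150,): 1-D vector of class labels

# max_iter=1000 is an arbitrary value chosen for this sketch.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

# The classes are inferred from the distinct values in y.
print(clf.classes_)  # [0 1 2]
```

The key contrast with the question is the shape of y: one label per row here, versus six dummy columns per row there.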

To convert the dummy matrix to a series, you could multiply each column by a different integer and then, assuming it's a pandas.DataFrame, just call .sum(axis=1) on the result. Something like:

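y = y.copy()  # work on a copy rather than a slice of train
for i, col in enumerate(y.columns.tolist(), 1):
    y.loc[:, col] *= i  # the k-th column now holds k wherever the dummy was 1
y = y.sum(axis=1)  # one dummy is 1 per row, so the row sum is the class code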
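The same conversion can also be sketched without a loop. In the example below, y_dummies is a small hypothetical stand-in for train[[...]] with exactly one 1 per row (an assumption carried over from the question's description of the dummy columns). The first variant reproduces the multiply-and-sum idea; the second uses idxmax as an alternative that keeps the bin names as labels.

```python
import numpy as np
import pandas as pd

bins = ["<5", "5-6", "6-7", "7-8", "8-9", "9-10"]

# Hypothetical dummy matrix: rows fall in bins "<5", "6-7", "6-7", "9-10".
y_dummies = pd.DataFrame(np.eye(6, dtype=int)[[0, 2, 2, 5]], columns=bins)

# Variant 1: weight column k by k (1-based), then sum across each row.
weights = np.arange(1, len(bins) + 1)
y_codes = y_dummies.mul(weights, axis=1).sum(axis=1)
print(y_codes.tolist())  # [1, 3, 3, 6]

# Variant 2: take the name of the column holding the 1 in each row.
y_labels = y_dummies.idxmax(axis=1)
print(y_labels.tolist())  # ['<5', '6-7', '6-7', '9-10']
```

Either result is a 1-D Series with one entry per row, which is the shape that fit expects for y. Both variants rely on each row having exactly one nonzero dummy; rows with none or several would need separate handling.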