Scikit-learn - Bad input shape error on multinomial logistic regression
Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/34012100/
Asked by ExtremistEnigma
I'm implementing a multinomial logistic regression model in Python using Scikit-learn. Here's my code:
X = pd.concat([each for each in feature_cols], axis=1)
y = train[["<5", "5-6", "6-7", "7-8", "8-9", "9-10"]]
lm = LogisticRegression(multi_class='multinomial', solver='lbfgs')
lm.fit(X, y)
However, I'm getting ValueError: bad input shape (50184, 6) when it tries to execute the last line of code.
X is a DataFrame with 50184 rows and 7 columns. y also has 50184 rows, but 6 columns.
I ultimately want to predict in what bin (<5, 5-6, etc.) the outcome falls. All the independent and dependent variables used in this case are dummy columns which have a binary value of either 0 or 1. What am I missing?
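The situation described above can be reproduced on a small scale. The following is a minimal sketch, not the asker's actual data: the 60-row toy frames, the random seed, and the feature names x0 to x6 are assumptions made for illustration. It omits the multi_class argument, since recent scikit-learn releases fit a multinomial model by default with the lbfgs solver, and it captures the error text rather than hard-coding it, because the wording differs between scikit-learn versions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data standing in for the 50184-row frames in the question
rng = np.random.default_rng(0)
bins = ["<5", "5-6", "6-7", "7-8", "8-9", "9-10"]
labels = rng.integers(0, len(bins), size=60)

# Dummy (one-hot) target with 6 columns, shaped like y in the question
y_dummy = pd.DataFrame(np.eye(len(bins), dtype=int)[labels], columns=bins)

# Seven binary feature columns, shaped like X in the question
X = pd.DataFrame(
    rng.integers(0, 2, size=(60, 7)),
    columns=[f"x{i}" for i in range(7)],
)

try:
    LogisticRegression().fit(X, y_dummy)
    error_message = None
except ValueError as exc:
    # The exact message depends on the installed scikit-learn version
    error_message = str(exc)

print(y_dummy.shape)
print(error_message)
```

The fit fails because the target has two dimensions (one column per bin) where the estimator expects a single column of class labels.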
Accepted answer by Stefan
The Logistic Regression 3-class Classifier example illustrates that LogisticRegression is fit with a vector rather than a matrix as its target, in this case the target variable of the iris dataset, coded as values [0, 1, 2].
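To make the point about the target shape concrete, here is a small sketch using the iris dataset that ships with scikit-learn. It is not the code from the linked example: using all four features and setting max_iter=1000 are choices made here only to keep the snippet short and free of convergence warnings.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)  # 2-D feature matrix: one row per sample, one column per feature
print(y.shape)  # 1-D target vector: one class label per sample

# max_iter is raised only to avoid a convergence warning on this data
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.classes_)  # the distinct labels found in y
```

The features form a matrix, but the target is a single vector of integer labels, which is the shape the dummy-coded y in the question needs to be converted to.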
To convert the dummy matrix to a series, you could multiply each column by a different integer and then, assuming it's a pandas.DataFrame, just call .sum(axis=1) on the result. Something like:
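y = y.copy()  # work on a copy rather than a slice of train
for i, col in enumerate(y.columns.tolist(), 1):
    y.loc[:, col] *= i  # a 1 in the i-th dummy column becomes the label i
y = y.sum(axis=1)  # one nonzero entry per row, so the row sum is the class label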
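As an alternative to the multiply-and-sum approach above (a different technique, not the one the answer proposes), the dummy matrix can be collapsed with DataFrame.idxmax, which returns the name of the column holding the row maximum. The sketch below assumes every row contains exactly one 1; the four-row y_dummy and the two feature columns f1 and f2 are hypothetical toy data.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

bins = ["<5", "5-6", "6-7", "7-8", "8-9", "9-10"]

# Hypothetical dummy target: exactly one 1 per row
y_dummy = pd.DataFrame(
    [
        [1, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 1],
        [0, 1, 0, 0, 0, 0],
    ],
    columns=bins,
)

# Name of the column that holds the 1 in each row, as a 1-D Series
y_labels = y_dummy.idxmax(axis=1)

# Integer codes 1..6, matching what the multiply-and-sum loop produces
y_codes = y_dummy.to_numpy().argmax(axis=1) + 1

# Hypothetical binary features, only to show that a 1-D target is accepted
X = pd.DataFrame({"f1": [0, 1, 0, 1], "f2": [0, 0, 1, 1]})
clf = LogisticRegression(max_iter=1000).fit(X, y_labels)

print(y_labels.tolist())
print(y_codes.tolist())
print(clf.classes_)
```

Keeping the bin names as labels means predict returns strings such as "6-7" directly, so no lookup from integer code back to bin name is needed.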