LogisticRegression: Unknown label type: 'continuous' using sklearn in python

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/41925157/

python, numpy, scikit-learn

Asked by harrison4

I have the following code to test some of the most popular ML algorithms of the Python sklearn library:

import numpy as np
from sklearn                        import metrics, svm
from sklearn.linear_model           import LinearRegression
from sklearn.linear_model           import LogisticRegression
from sklearn.tree                   import DecisionTreeClassifier
from sklearn.neighbors              import KNeighborsClassifier
from sklearn.discriminant_analysis  import LinearDiscriminantAnalysis
from sklearn.naive_bayes            import GaussianNB
from sklearn.svm                    import SVC

trainingData    = np.array([ [2.3, 4.3, 2.5],  [1.3, 5.2, 5.2],  [3.3, 2.9, 0.8],  [3.1, 4.3, 4.0]  ])
trainingScores  = np.array( [3.4, 7.5, 4.5, 1.6] )
predictionData  = np.array([ [2.5, 2.4, 2.7],  [2.7, 3.2, 1.2] ])

clf = LinearRegression()
clf.fit(trainingData, trainingScores)
print("LinearRegression")
print(clf.predict(predictionData))

clf = svm.SVR()
clf.fit(trainingData, trainingScores)
print("SVR")
print(clf.predict(predictionData))

clf = LogisticRegression()
clf.fit(trainingData, trainingScores)
print("LogisticRegression")
print(clf.predict(predictionData))

clf = DecisionTreeClassifier()
clf.fit(trainingData, trainingScores)
print("DecisionTreeClassifier")
print(clf.predict(predictionData))

clf = KNeighborsClassifier()
clf.fit(trainingData, trainingScores)
print("KNeighborsClassifier")
print(clf.predict(predictionData))

clf = LinearDiscriminantAnalysis()
clf.fit(trainingData, trainingScores)
print("LinearDiscriminantAnalysis")
print(clf.predict(predictionData))

clf = GaussianNB()
clf.fit(trainingData, trainingScores)
print("GaussianNB")
print(clf.predict(predictionData))

clf = SVC()
clf.fit(trainingData, trainingScores)
print("SVC")
print(clf.predict(predictionData))

The first two work OK, but I get the following error in the LogisticRegression call:

root@ubupc1:/home/ouhma# python stack.py 
LinearRegression
[ 15.72023529   6.46666667]
SVR
[ 3.95570063  4.23426243]
Traceback (most recent call last):
  File "stack.py", line 28, in <module>
    clf.fit(trainingData, trainingScores)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/logistic.py", line 1174, in fit
    check_classification_targets(y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/multiclass.py", line 172, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous'

The input data is the same as in the previous calls, so what is going on here?

And by the way, why is there such a huge difference between the first predictions of the LinearRegression() and SVR() algorithms (15.72 vs 3.95)?

Answered by Maximilian Peters

You are passing floats to a classifier, which expects categorical values as the target vector. If you convert the targets to int they will be accepted as input (although it is questionable whether that is the right way to do it).

It would be better to convert your training scores using scikit-learn's LabelEncoder.

The same is true for your DecisionTree and KNeighbors classifiers.

from sklearn import preprocessing
from sklearn import utils

# trainingScores is the float target array from the question
lab_enc = preprocessing.LabelEncoder()
encoded = lab_enc.fit_transform(trainingScores)
# encoded -> array([1, 3, 2, 0], dtype=int64)

print(utils.multiclass.type_of_target(trainingScores))
# continuous

print(utils.multiclass.type_of_target(trainingScores.astype('int')))
# multiclass

print(utils.multiclass.type_of_target(encoded))
# multiclass
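
For completeness, here is a minimal sketch of how the encoded labels could then be passed to LogisticRegression, reusing the data from the question. Whether classes derived from arbitrary float scores are meaningful for your problem is a separate matter, and the inverse_transform step at the end is just one way to map predictions back to the original score values:

import numpy as np
from sklearn import preprocessing
from sklearn.linear_model import LogisticRegression

# Same data as in the question
trainingData   = np.array([[2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0]])
trainingScores = np.array([3.4, 7.5, 4.5, 1.6])
predictionData = np.array([[2.5, 2.4, 2.7], [2.7, 3.2, 1.2]])

# Encode the float scores into integer class labels
lab_enc = preprocessing.LabelEncoder()
encoded = lab_enc.fit_transform(trainingScores)

# Fitting on the encoded labels no longer raises "Unknown label type: 'continuous'"
clf = LogisticRegression()
clf.fit(trainingData, encoded)

# Map the predicted class indices back to the original score values
print(lab_enc.inverse_transform(clf.predict(predictionData)))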

Answered by Sam Perry

I struggled with the same issue when trying to feed floats to the classifiers. I wanted to keep the floats, not integers, for accuracy. Try using regressor algorithms instead. For example:

import numpy as np
from sklearn import linear_model
from sklearn import svm

classifiers = [
    svm.SVR(),
    linear_model.SGDRegressor(),
    linear_model.BayesianRidge(),
    linear_model.LassoLars(),
    linear_model.ARDRegression(),
    linear_model.PassiveAggressiveRegressor(),
    linear_model.TheilSenRegressor(),
    linear_model.LinearRegression()]

trainingData    = np.array([ [2.3, 4.3, 2.5],  [1.3, 5.2, 5.2],  [3.3, 2.9, 0.8],  [3.1, 4.3, 4.0]  ])
trainingScores  = np.array( [3.4, 7.5, 4.5, 1.6] )
predictionData  = np.array([ [2.5, 2.4, 2.7],  [2.7, 3.2, 1.2] ])

for item in classifiers:
    print(item)
    clf = item
    clf.fit(trainingData, trainingScores)
    print(clf.predict(predictionData),'\n')

Answered by Thomas G.

LogisticRegression is not for regression but for classification!

The Y variable must be a classification class (for example 0 or 1), not a continuous variable; a continuous target is a regression problem.
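
As a minimal illustration of that distinction (the 0/1 labels below are invented for the example and are not from the question):

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0]])

y_classes = np.array([0, 1, 0, 1])           # class labels: a valid target for LogisticRegression
y_scores  = np.array([3.4, 7.5, 4.5, 1.6])   # continuous values: a regression target

clf = LogisticRegression()
clf.fit(X, y_classes)                         # works
# clf.fit(X, y_scores)                        # would raise ValueError: Unknown label type: 'continuous'
print(clf.predict(np.array([[2.5, 2.4, 2.7]])))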