Original URL: http://stackoverflow.com/questions/41925157/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
LogisticRegression: Unknown label type: 'continuous' using sklearn in python
Asked by harrison4
I have the following code to test some of the most popular ML algorithms in the sklearn python library:
import numpy as np
from sklearn import metrics, svm
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
trainingData = np.array([ [2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0] ])
trainingScores = np.array( [3.4, 7.5, 4.5, 1.6] )
predictionData = np.array([ [2.5, 2.4, 2.7], [2.7, 3.2, 1.2] ])
clf = LinearRegression()
clf.fit(trainingData, trainingScores)
print("LinearRegression")
print(clf.predict(predictionData))
clf = svm.SVR()
clf.fit(trainingData, trainingScores)
print("SVR")
print(clf.predict(predictionData))
clf = LogisticRegression()
clf.fit(trainingData, trainingScores)
print("LogisticRegression")
print(clf.predict(predictionData))
clf = DecisionTreeClassifier()
clf.fit(trainingData, trainingScores)
print("DecisionTreeClassifier")
print(clf.predict(predictionData))
clf = KNeighborsClassifier()
clf.fit(trainingData, trainingScores)
print("KNeighborsClassifier")
print(clf.predict(predictionData))
clf = LinearDiscriminantAnalysis()
clf.fit(trainingData, trainingScores)
print("LinearDiscriminantAnalysis")
print(clf.predict(predictionData))
clf = GaussianNB()
clf.fit(trainingData, trainingScores)
print("GaussianNB")
print(clf.predict(predictionData))
clf = SVC()
clf.fit(trainingData, trainingScores)
print("SVC")
print(clf.predict(predictionData))
The first two work OK, but I get the following error in the LogisticRegression call:
root@ubupc1:/home/ouhma# python stack.py
LinearRegression
[ 15.72023529 6.46666667]
SVR
[ 3.95570063 4.23426243]
Traceback (most recent call last):
File "stack.py", line 28, in <module>
clf.fit(trainingData, trainingScores)
File "/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/logistic.py", line 1174, in fit
check_classification_targets(y)
File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/multiclass.py", line 172, in check_classification_targets
raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous'
The input data is the same as in the previous calls, so what is going on here?
And by the way, why is there a huge difference between the first predictions of the LinearRegression() and SVR() algorithms (15.72 vs 3.95)?
Answered by Maximilian Peters
You are passing floats to a classifier, which expects categorical values as the target vector. If you convert the targets to int they will be accepted as input (although it is questionable whether that is the right way to do it).
It would be better to convert your training scores using scikit-learn's LabelEncoder.
The same is true for your DecisionTree and KNeighbors classifiers.
from sklearn import preprocessing
from sklearn import utils

lab_enc = preprocessing.LabelEncoder()
encoded = lab_enc.fit_transform(trainingScores)
# encoded -> array([1, 3, 2, 0])

print(utils.multiclass.type_of_target(trainingScores))
# -> continuous
print(utils.multiclass.type_of_target(trainingScores.astype('int')))
# -> multiclass
print(utils.multiclass.type_of_target(encoded))
# -> multiclass
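A minimal end-to-end sketch of the fix above: fit LogisticRegression on the encoded labels instead of the raw floats. The data reuses the question's arrays; mapping predictions back to the original scores with inverse_transform is an addition for illustration, not part of the original answer.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.linear_model import LogisticRegression

trainingData = np.array([[2.3, 4.3, 2.5], [1.3, 5.2, 5.2],
                         [3.3, 2.9, 0.8], [3.1, 4.3, 4.0]])
trainingScores = np.array([3.4, 7.5, 4.5, 1.6])
predictionData = np.array([[2.5, 2.4, 2.7], [2.7, 3.2, 1.2]])

# Encode each distinct score as its own integer class label.
lab_enc = preprocessing.LabelEncoder()
encoded = lab_enc.fit_transform(trainingScores)  # [1, 3, 2, 0]

# The target is now 'multiclass', so the classifier accepts it.
clf = LogisticRegression()
clf.fit(trainingData, encoded)

# Map the predicted class indices back to the original scores.
print(lab_enc.inverse_transform(clf.predict(predictionData)))
```

Whether treating every distinct score as its own class makes sense for your problem is a separate question; this only shows why the error goes away.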
Answered by Sam Perry
I struggled with the same issue when trying to feed floats to the classifiers. I wanted to keep floats, not integers, for accuracy. Try using regressor algorithms instead. For example:
import numpy as np
from sklearn import linear_model
from sklearn import svm

classifiers = [
    svm.SVR(),
    linear_model.SGDRegressor(),
    linear_model.BayesianRidge(),
    linear_model.LassoLars(),
    linear_model.ARDRegression(),
    linear_model.PassiveAggressiveRegressor(),
    linear_model.TheilSenRegressor(),
    linear_model.LinearRegression()]

trainingData = np.array([ [2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0] ])
trainingScores = np.array( [3.4, 7.5, 4.5, 1.6] )
predictionData = np.array([ [2.5, 2.4, 2.7], [2.7, 3.2, 1.2] ])

for item in classifiers:
    print(item)
    clf = item
    clf.fit(trainingData, trainingScores)
    print(clf.predict(predictionData), '\n')
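As a quick sanity check (an addition, not part of the original answer): scikit-learn's sklearn.base helpers can tell regressors and classifiers apart. Despite the list above being named classifiers, every estimator in it is a regressor, which is why continuous targets are accepted.

```python
from sklearn.base import is_classifier, is_regressor
from sklearn import linear_model, svm
from sklearn.linear_model import LogisticRegression

# Estimators from the list above are regressors:
print(is_regressor(svm.SVR()))                    # True
print(is_regressor(linear_model.SGDRegressor()))  # True

# LogisticRegression, despite its name, is a classifier:
print(is_classifier(LogisticRegression()))        # True
print(is_regressor(LogisticRegression()))         # False
```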
Answered by Thomas G.

LogisticRegression is not for regression but classification!

The Y variable must be the classification class (for example 0 or 1), not a continuous variable; that would be a regression problem.
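If the task really is classification, a minimal sketch is to threshold the question's scores into two classes before fitting. The 4.0 cutoff here is an arbitrary illustrative assumption, not something from the original answers.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

trainingData = np.array([[2.3, 4.3, 2.5], [1.3, 5.2, 5.2],
                         [3.3, 2.9, 0.8], [3.1, 4.3, 4.0]])
trainingScores = np.array([3.4, 7.5, 4.5, 1.6])

# Turn the continuous scores into two classes: 1 if above the
# (arbitrary) 4.0 cutoff, else 0.
labels = (trainingScores > 4.0).astype(int)  # [0, 1, 1, 0]

clf = LogisticRegression()
clf.fit(trainingData, labels)  # no 'continuous' error: target is binary
print(clf.predict(np.array([[2.5, 2.4, 2.7]])))
```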