Python Scikit-learn GridSearch gives "ValueError: multiclass format is not supported" error

Question

Asked by theharshest

I'm trying to use GridSearch for parameter estimation of LinearSVC() as follows -

clf_SVM = LinearSVC()
params = {
          'C': [0.5, 1.0, 1.5],
          'tol': [1e-3, 1e-4, 1e-5],
          'multi_class': ['ovr', 'crammer_singer'],
          }
gs = GridSearchCV(clf_SVM, params, cv=5, scoring='roc_auc')
gs.fit(corpus1, y)

corpus1 has shape (1726, 7001) and y has shape (1726,)

This is a multiclass classification, and y has values from 0 to 3, both inclusive, i.e. there are four classes.

But this is giving me the following error -

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-220-0c627bda0543> in <module>()
      5           }
      6 gs = GridSearchCV(clf_SVM, params, cv=5, scoring='roc_auc')
----> 7 gs.fit(corpus1, y)

/usr/local/lib/python2.7/dist-packages/sklearn/grid_search.pyc in fit(self, X, y)
    594 
    595         """
--> 596         return self._fit(X, y, ParameterGrid(self.param_grid))
    597 
    598 

/usr/local/lib/python2.7/dist-packages/sklearn/grid_search.pyc in _fit(self, X, y, parameter_iterable)
    376                                     train, test, self.verbose, parameters,
    377                                     self.fit_params, return_parameters=True)
--> 378             for parameters in parameter_iterable
    379             for train, test in cv)
    380 

/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable)
    651             self._iterating = True
    652             for function, args, kwargs in iterable:
--> 653                 self.dispatch(function, args, kwargs)
    654 
    655             if pre_dispatch == "all" or n_jobs == 1:

/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.pyc in dispatch(self, func, args, kwargs)
    398         """
    399         if self._pool is None:
--> 400             job = ImmediateApply(func, args, kwargs)
    401             index = len(self._jobs)
    402             if not _verbosity_filter(index, self.verbose):

/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.pyc in __init__(self, func, args, kwargs)
    136         # Don't delay the application, to avoid keeping the input
    137         # arguments in memory
--> 138         self.results = func(*args, **kwargs)
    139 
    140     def get(self):

/usr/local/lib/python2.7/dist-packages/sklearn/cross_validation.pyc in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters)
   1238     else:
   1239         estimator.fit(X_train, y_train, **fit_params)
-> 1240     test_score = _score(estimator, X_test, y_test, scorer)
   1241     if return_train_score:
   1242         train_score = _score(estimator, X_train, y_train, scorer)

/usr/local/lib/python2.7/dist-packages/sklearn/cross_validation.pyc in _score(estimator, X_test, y_test, scorer)
   1294         score = scorer(estimator, X_test)
   1295     else:
-> 1296         score = scorer(estimator, X_test, y_test)
   1297     if not isinstance(score, numbers.Number):
   1298         raise ValueError("scoring must return a number, got %s (%s) instead."

/usr/local/lib/python2.7/dist-packages/sklearn/metrics/scorer.pyc in __call__(self, clf, X, y)
    136         y_type = type_of_target(y)
    137         if y_type not in ("binary", "multilabel-indicator"):
--> 138             raise ValueError("{0} format is not supported".format(y_type))
    139 
    140         try:

ValueError: multiclass format is not supported

Answer 1

Answered by user1269942

From the roc_auc_score documentation:

http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html#sklearn.metrics.roc_auc_score

"Note: this implementation is restricted to the binary classification task or multilabel classification task in label indicator format."

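The check that raises the error in the traceback can be reproduced on its own. The following is a small sketch, assuming a scikit-learn install that exposes type_of_target in sklearn.utils.multiclass; the sample arrays are made up for illustration and are not the asker's data.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Made-up labels with four classes, shaped like the y in the question.
y_multiclass = np.array([0, 1, 2, 3, 1, 0])
# Made-up two-class labels.
y_binary = np.array([0, 1, 1, 0])
# Made-up label-indicator matrix: one 0/1 column per label.
y_indicator = np.array([[1, 0], [0, 1], [1, 1]])

kinds = [type_of_target(t) for t in (y_multiclass, y_binary, y_indicator)]
print(kinds)  # ['multiclass', 'binary', 'multilabel-indicator']
```

Only the last two kinds pass the check shown in the traceback, which is why a 1-D y with values 0 to 3 is rejected.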

Try:

from sklearn import preprocessing
y = preprocessing.label_binarize(y, classes=[0, 1, 2, 3])

Do this before you train. It performs a "one-hot" encoding of your y.
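
To make the effect of the encoding concrete, here is a minimal sketch on a made-up label vector (the values are an illustration, not the asker's y). Each row of the result has a single 1 in the column of its class.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Made-up labels drawn from the four classes 0..3.
y = np.array([0, 1, 2, 3, 1, 0])

# One column per class; row i has a 1 in column y[i] and 0 elsewhere.
Y = label_binarize(y, classes=[0, 1, 2, 3])
print(Y.shape)  # (6, 4)
print(Y[0])     # [1 0 0 0]
```

Note that the result is a 2-D matrix, while LinearSVC on its own expects a 1-D y, so the binarized target is meant to be paired with a wrapper such as OneVsRestClassifier.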

Answer 2

Answered by Jordi Colomer

As has been pointed out, you must first binarize y:

from sklearn.preprocessing import label_binarize
y = label_binarize(y, classes=[0, 1, 2, 3])

and then use a multiclass learning algorithm like OneVsRestClassifier or OneVsOneClassifier. For example:

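from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC
# GridSearchCV: sklearn.model_selection in current releases,
# sklearn.grid_search in the old release shown in the traceback.

# The estimator__ prefix routes each parameter to the wrapped LinearSVC.
clf_SVM = OneVsRestClassifier(LinearSVC())
params = {
    'estimator__C': [0.5, 1.0, 1.5],
    'estimator__tol': [1e-3, 1e-4, 1e-5],
}
gs = GridSearchCV(clf_SVM, params, cv=5, scoring='roc_auc')
gs.fit(corpus1, y)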
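
For readers who want to run this end to end, the following is a self-contained sketch of the same approach. It assumes a recent scikit-learn where GridSearchCV is imported from sklearn.model_selection, and it uses synthetic data from make_classification as a stand-in for corpus1 and y; the sample sizes and max_iter value are arbitrary choices for the illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize
from sklearn.svm import LinearSVC

# Synthetic stand-in for corpus1 and y: four classes labelled 0..3.
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           n_classes=4, random_state=0)

# Turn the 1-D multiclass target into a label-indicator matrix.
Y = label_binarize(y, classes=[0, 1, 2, 3])

# One LinearSVC per column of Y; max_iter is raised to reduce
# convergence warnings on this toy data.
clf = OneVsRestClassifier(LinearSVC(max_iter=5000))
params = {
    'estimator__C': [0.5, 1.0, 1.5],
    'estimator__tol': [1e-3, 1e-4, 1e-5],
}
gs = GridSearchCV(clf, params, cv=5, scoring='roc_auc')
gs.fit(X, Y)

print(gs.best_params_)
print(gs.best_score_)
```

Because the target is now a label-indicator matrix, the score reported here is the ROC AUC averaged over the four one-vs-rest columns rather than a single binary AUC.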

Answer 3

Answered by Nipun kumar goel

Remove scoring='roc_auc' and it will work, because the roc_auc scorer does not support multiclass targets.
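
A minimal sketch of this option, assuming a recent scikit-learn and synthetic data in place of corpus1 and y (the grid is shortened and max_iter is an arbitrary choice for the illustration). With no scoring argument, GridSearchCV falls back to the estimator's own score method, which is mean accuracy for classifiers; a multiclass-capable scorer name such as 'f1_macro' can be passed instead.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Synthetic stand-in for corpus1 and y: four classes labelled 0..3.
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           n_classes=4, random_state=0)

params = {'C': [0.5, 1.0, 1.5], 'tol': [1e-3, 1e-4, 1e-5]}

# No scoring argument: the search uses LinearSVC.score (mean accuracy).
gs_acc = GridSearchCV(LinearSVC(max_iter=5000), params, cv=5)
gs_acc.fit(X, y)

# Alternative: an explicit metric that accepts multiclass targets.
gs_f1 = GridSearchCV(LinearSVC(max_iter=5000), params, cv=5,
                     scoring='f1_macro')
gs_f1.fit(X, y)

print(gs_acc.best_params_, gs_acc.best_score_)
print(gs_f1.best_params_, gs_f1.best_score_)
```

Both searches keep y as the original 1-D label vector, so no binarization or wrapper classifier is needed.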
