Warning: these are provided under the CC BY-SA 4.0 license. You are free to use or share them, but you must attribute them to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/34346140/

Date: 2020-09-14 00:23:51  Source: igfitidea

Applying sklearn function to pandas dataframe gives ValueError("Unknown label type: %r" % y)

Tags: python, pandas, scikit-learn

Asked by Alex

The following code gives an error message:

    >>> import pandas as pd
    >>> from sklearn import preprocessing, svm
    >>> df = pd.DataFrame({"a": [0,1,2], "b":[0,1,2], "c": [0,1,2]})
    >>> clf = svm.SVC()
    >>> df = df.apply(lambda x: preprocessing.scale(x))
    >>> clf.fit(df[["a", "b"]], df["c"])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Users\Alexander\Anaconda\lib\site-packages\sklearn\svm\base.py", line 151, in fit
        y = self._validate_targets(y)
      File "C:\Users\Alexander\Anaconda\lib\site-packages\sklearn\svm\base.py", line 515, in _validate_targets
        check_classification_targets(y)
      File "C:\Users\Alexander\Anaconda\lib\site-packages\sklearn\utils\multiclass.py", line 173, in check_classification_targets
        raise ValueError("Unknown label type: %r" % y)
    ValueError: Unknown label type: 0   -1.224745
    1    0.000000
    2    1.224745
    Name: c, dtype: float64

The dtypes of the pandas DataFrame are not object, so passing it to the sklearn SVM should be fine, but for some reason it does not recognize the classification labels. What is causing this issue?

Answered by maxymoo

The issue is that after your scaling step the labels are continuous float values (-1.224745, 0.0, 1.224745), which scikit-learn interprets as a regression target rather than as class labels. If you convert them to int or str, it should work:

    In [32]: clf.fit(df[["a", "b"]], df["c"].astype(int))
    Out[32]:
    SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
      decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
      max_iter=-1, probability=False, random_state=None, shrinking=True,
      tol=0.001, verbose=False)
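A minimal sketch, assuming a reasonably recent scikit-learn, that uses type_of_target from sklearn.utils.multiclass (the helper check_classification_targets relies on) to show why the scaled labels are rejected and why the cast helps. It also illustrates one caveat of astype(int): it truncates toward zero, so scaled labels that were distinct can collapse into the same class. The five-label series below is hypothetical, chosen only for that illustration. If c really holds class labels, it is probably simpler to scale only the feature columns, as in the last part of the sketch:

```python
# Minimal sketch (assumes a reasonably recent scikit-learn): type_of_target is
# the helper check_classification_targets uses to decide whether y holds labels.
import pandas as pd
from sklearn import preprocessing, svm
from sklearn.utils.multiclass import type_of_target

labels = pd.Series([0, 1, 2], name="c")
scaled = pd.Series(preprocessing.scale(labels), name="c")

print(type_of_target(labels))                # multiclass
print(type_of_target(labels.astype(float)))  # multiclass: integral floats are fine
print(type_of_target(scaled))                # continuous: SVC.fit rejects this
print(type_of_target(scaled.astype(int)))    # multiclass again

# Caveat: astype(int) truncates toward zero, so distinct scaled values can merge.
# The five labels here are hypothetical, chosen only to illustrate this.
five = pd.Series(preprocessing.scale(pd.Series([0, 1, 2, 3, 4])))
print(five.astype(int).tolist())             # [-1, 0, 0, 0, 1]

# Alternative: scale only the feature columns and keep the original labels.
df = pd.DataFrame({"a": [0, 1, 2], "b": [0, 1, 2], "c": [0, 1, 2]})
X = preprocessing.scale(df[["a", "b"]])
clf = svm.SVC()
clf.fit(X, df["c"])
print(clf.classes_)                          # [0 1 2]
```

The dtype on its own is not what matters here: float labels with integral values such as 0.0, 1.0, 2.0 are still accepted as classes. It is the non-integer values produced by scaling that make the target look continuous.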