Applying sklearn function to pandas dataframe gives ValueError("Unknown label type: %r" % y)
Original URL: http://stackoverflow.com/questions/34346140/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): StackOverflow
Question by Alex
The following code gives an error message:
>>> import pandas as pd
>>> from sklearn import preprocessing, svm
>>> df = pd.DataFrame({"a": [0,1,2], "b":[0,1,2], "c": [0,1,2]})
>>> clf = svm.SVC()
>>> df = df.apply(lambda x: preprocessing.scale(x))
>>> clf.fit(df[["a", "b"]], df["c"])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Alexander\Anaconda\lib\site-packages\sklearn\svm\base.py", line 151, in fit
    y = self._validate_targets(y)
  File "C:\Users\Alexander\Anaconda\lib\site-packages\sklearn\svm\base.py", line 515, in _validate_targets
    check_classification_targets(y)
  File "C:\Users\Alexander\Anaconda\lib\site-packages\sklearn\utils\multiclass.py", line 173, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y)
ValueError: Unknown label type: 0   -1.224745
1    0.000000
2    1.224745
Name: c, dtype: float64
The dtype of the pandas DataFrame is not object, so fitting the sklearn SVM should be fine, but for some reason it does not recognize the classification labels. What is causing this issue?
Answer by maxymoo
The issue is that after your scaling step, the labels are continuous float values (-1.224745, 0.0, 1.224745), which sklearn does not accept as classification labels. If you convert them to int or str, it should work:
In [32]: clf.fit(df[["a", "b"]], df["c"].astype(int))
Out[32]:
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
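You can check the explanation above with `sklearn.utils.multiclass.type_of_target`, the helper that `check_classification_targets` uses to decide whether labels are acceptable. The sketch below assumes a reasonably recent scikit-learn, and the variable names are only illustrative. It shows that the scaled label column is reported as `continuous`, while the integer-cast and unscaled columns are reported as `multiclass`. It also shows an alternative that does not come from the original answer: scale only the feature columns and leave the labels alone.

```python
import pandas as pd
from sklearn import preprocessing, svm
from sklearn.utils.multiclass import type_of_target

df = pd.DataFrame({"a": [0, 1, 2], "b": [0, 1, 2], "c": [0, 1, 2]})

# Scaling the label column turns it into non-integer floats, which
# type_of_target classifies as "continuous" (i.e. a regression target).
scaled_labels = pd.Series(preprocessing.scale(df["c"].astype(float)), name="c")
print(type_of_target(scaled_labels))              # continuous

# Casting to int truncates -1.22 -> -1 and 1.22 -> 1, giving discrete classes.
print(type_of_target(scaled_labels.astype(int)))  # multiclass

# The unscaled labels were already valid classes.
print(type_of_target(df["c"]))                    # multiclass

# Alternative (not from the original answer): scale only the feature
# columns and pass the untouched label column to the classifier.
X = preprocessing.scale(df[["a", "b"]].astype(float))
y = df["c"]
clf = svm.SVC()
clf.fit(X, y)
print(clf.classes_)                               # [0 1 2]
```

`astype(int)` truncates toward zero, so the scaled values -1.22, 0.0 and 1.22 become the classes -1, 0 and 1. Here that happens to keep three distinct classes. With other data, truncation could merge different labels into one class, which is a reason to avoid scaling the label column in the first place.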