Python Sklearn StratifiedKFold: ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead.

Question

Asked by jKraut

I am working with Sklearn's stratified k-fold split, and when I attempt to split using multi-class targets, I receive an error (see below). When I split using binary targets, it works with no problem.

num_classes = len(np.unique(y_train))
y_train_categorical = keras.utils.to_categorical(y_train, num_classes)
kf=StratifiedKFold(n_splits=5, shuffle=True, random_state=999)

# splitting data into different folds
for i, (train_index, val_index) in enumerate(kf.split(x_train, y_train_categorical)):
    x_train_kf, x_val_kf = x_train[train_index], x_train[val_index]
    y_train_kf, y_val_kf = y_train[train_index], y_train[val_index]

ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead.

Answer 1

Accepted answer by desertnaut

keras.utils.to_categorical produces a one-hot encoded class matrix, i.e. the multilabel-indicator mentioned in the error message. StratifiedKFold is not designed to work with such input; from the split method docs:

split(X, y, groups=None)
[...]
y: array-like, shape (n_samples,)
The target variable for supervised learning problems. Stratification is done based on the y labels.

i.e. your y must be a 1-D array of your class labels.

Essentially, all you have to do is invert the order of the operations: split first (using your initial y_train), and convert with to_categorical afterwards.
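
A minimal sketch of that reordering, under some assumptions: the toy data is made up, and the one_hot helper is a NumPy stand-in for keras.utils.to_categorical (assumed to behave the same for integer labels 0..n-1) so the example runs without Keras installed.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy data: 20 samples, 3 features, 4 integer classes (5 samples each).
rng = np.random.default_rng(0)
x_train = rng.normal(size=(20, 3))
y_train = np.repeat(np.arange(4), 5)
num_classes = len(np.unique(y_train))


def one_hot(labels, n):
    """Stand-in for keras.utils.to_categorical (assumption: integer labels 0..n-1)."""
    return np.eye(n, dtype="float32")[labels]


kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=999)
folds = []
# Stratify on the 1-D integer labels, not on the one-hot matrix.
for train_index, val_index in kf.split(x_train, y_train):
    x_train_kf, x_val_kf = x_train[train_index], x_train[val_index]
    # One-hot encode only after the indices have been computed.
    y_train_kf = one_hot(y_train[train_index], num_classes)
    y_val_kf = one_hot(y_train[val_index], num_classes)
    folds.append((x_train_kf, y_train_kf, x_val_kf, y_val_kf))
```

Each entry in folds then holds one-hot targets suitable for a Keras model, while the stratification itself was computed from the original 1-D labels.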

Answer 2

Answer by wilmeragsgh

Call split() like this:

for i, (train_index, val_index) in enumerate(kf.split(x_train, y_train_categorical.argmax(1))):
    x_train_kf, x_val_kf = x_train[train_index], x_train[val_index]
    y_train_kf, y_val_kf = y_train[train_index], y_train[val_index]
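
Why this works can be checked in isolation: argmax(1) takes the column index of the single 1 in each one-hot row, which gives back a 1-D label array. The sketch below uses made-up labels and np.eye as a stand-in for keras.utils.to_categorical; it assumes the original labels were the integers 0..n-1.

```python
import numpy as np

# Hypothetical integer labels and their one-hot encoding.
y_train = np.array([2, 0, 1, 1, 0, 2])
y_train_categorical = np.eye(3)[y_train]

# argmax along axis 1 recovers the class index of each row as a 1-D array.
recovered = y_train_categorical.argmax(1)
```

If the labels were not 0..n-1 integers to begin with, argmax returns column positions rather than the original label values, which is still sufficient for stratification.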

Answer 3

Answer by nocibambi

I bumped into the same problem and found out that you can check the type of the target with this util function:

from sklearn.utils.multiclass import type_of_target
type_of_target(y)

'multilabel-indicator'

From its docstring:

'binary': y contains <= 2 discrete values and is 1d or a column vector.
'multiclass': y contains more than two discrete values, is not a sequence of sequences, and is 1d or a column vector.
'multiclass-multioutput': y is a 2d array that contains more than two discrete values, is not a sequence of sequences, and both dimensions are of size > 1.
'multilabel-indicator': y is a label indicator matrix, an array of two dimensions with at least two columns, and at most 2 unique values.
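
To make the four categories concrete, here is a small sketch that calls type_of_target on one made-up array per category. The arrays are illustrative only.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# 1-D array with two distinct values.
binary = type_of_target(np.array([0, 1, 1, 0]))

# 1-D array with more than two distinct values.
multiclass = type_of_target(np.array([0, 1, 2, 1]))

# 2-D array with more than two distinct values and both dimensions > 1.
multioutput = type_of_target(np.array([[1, 2], [3, 1], [2, 3]]))

# 2-D 0/1 matrix, e.g. what one-hot encoding produces.
indicator = type_of_target(np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))

print(binary, multiclass, multioutput, indicator)
```

The last case is the one that triggers the error in the question, since a one-hot matrix is a 0/1 array with several columns.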

With LabelEncoder you can transform your classes into a 1d array of numbers (given your target labels are in a 1d array of categoricals/objects):

from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
y = label_encoder.fit_transform(target_labels)
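
A possible end-to-end sketch, assuming a hypothetical target_labels array of strings and a placeholder feature matrix X: encode the labels, stratify on the encoded 1-D array, and map back to the original names when needed.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import LabelEncoder

# Hypothetical string labels (4 samples per class) and placeholder features.
target_labels = np.array(["cat", "dog", "bird"] * 4)
X = np.arange(24).reshape(12, 2)

label_encoder = LabelEncoder()
y = label_encoder.fit_transform(target_labels)  # 1-D integer array

kf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
# Count how many samples of each class land in every validation fold.
val_class_counts = [
    np.bincount(y[val_index], minlength=3) for _, val_index in kf.split(X, y)
]

# inverse_transform maps the integers back to the original label names.
original = label_encoder.inverse_transform(y)
```

LabelEncoder assigns integers in sorted order of the class names, which is available afterwards as label_encoder.classes_.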

Answer 4

Answer by shadi

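In my case, x was a 2D matrix, and y was also a 2D matrix, i.e. indeed a multi-class multi-output case. I just passed a dummy np.zeros(shape=(n,1)) for the y and the x as usual. Note that with a constant dummy target the folds are not stratified by the real y; the splitter only produces shuffled folds. Full code example: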

import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [3, 7], [9, 4]])
# y = np.array([0, 0, 1, 1, 0, 1]) # <<< works
y = X # does not work if passed into `.split`
rskf = RepeatedStratifiedKFold(n_splits=3, n_repeats=3, random_state=36851234)
for train_index, test_index in rskf.split(X, np.zeros(shape=(X.shape[0], 1))):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
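
If stratification on a 2D target is actually wanted, one possible alternative (not part of the answer above, and only a sketch) is to collapse each distinct row of y into a single integer id and stratify on that 1-D key. The data below is made up, and the approach assumes every distinct row occurs at least n_splits times.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical features and a 2-D target with two distinct rows (3 samples each).
X = np.arange(12).reshape(6, 2)
y = np.array([[0, 1], [0, 1], [1, 0], [1, 0], [0, 1], [1, 0]])

# Map each distinct row of y to one integer id; ravel() keeps the key 1-D
# regardless of the NumPy version's return shape.
_, strat_key = np.unique(y, axis=0, return_inverse=True)
strat_key = strat_key.ravel()

skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
test_keys = []
for train_index, test_index in skf.split(X, strat_key):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]  # still the full 2-D rows
    test_keys.append(sorted(strat_key[test_index].tolist()))
```

Here every test fold receives one sample of each distinct row, whereas the dummy-zeros approach gives no such guarantee.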
