Python: How does Keras handle multilabel classification?

Question

Asked by user798719

I am unsure how to interpret Keras's default behavior in the following situation:

My Y (ground truth) was set up using scikit-learn's MultiLabelBinarizer().

Therefore, to give a random example, one row of my y column is one-hot encoded as such: [0,0,0,1,0,1,0,0,0,0,1].

So I have 11 classes that could be predicted, and more than one can be true; hence the multilabel nature of the problem. There are three labels for this particular sample.

I train the model as I would for a non-multilabel problem (business as usual) and I get no errors.

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD

model = Sequential()
model.add(Dense(5000, activation='relu', input_dim=X_train.shape[1]))
model.add(Dropout(0.1))
model.add(Dense(600, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(y_train.shape[1], activation='softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy',])

model.fit(X_train, y_train,epochs=5,batch_size=2000)

score = model.evaluate(X_test, y_test, batch_size=2000)
score

What does Keras do when it encounters my y_train and sees that it is "multi" one-hot encoded, meaning there is more than one 1 present in each row of y_train? Basically, does Keras automatically perform multilabel classification? Are there any differences in the interpretation of the scoring metrics?

Answer 1

Answered by frankyjuang

In short

Don't use softmax.

Use sigmoid for the activation of your output layer.

Use binary_crossentropy as the loss function.

Use predict for evaluation.
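To make the binary_crossentropy recommendation concrete, here is a minimal pure-Python sketch of the per-sample quantity that loss is based on: each label is scored as its own yes/no decision and the results are averaged. The helper name, the clipping constant `eps`, and the example probabilities are assumptions for illustration, not Keras internals.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over the labels of a single sample."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

# The multi-hot row from the question: three of the 11 labels are active.
y_true = [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1]

# Hypothetical sigmoid outputs: one confident and right, one confident and wrong.
good = [0.9 if t else 0.1 for t in y_true]
bad = [0.1 if t else 0.9 for t in y_true]

loss_good = binary_crossentropy(y_true, good)
loss_bad = binary_crossentropy(y_true, bad)
print(loss_good, loss_bad)
```

Because every label contributes its own term, a sample with three active labels is penalized for each of them independently, which is what a multilabel target needs.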

Why

With softmax, increasing the score for one label lowers the scores of all the others (the outputs form a probability distribution that sums to 1). You don't want that when several labels can be true at once.
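The softmax point can be checked with a small pure-Python sketch. The logits are made-up values for a hypothetical sample where three of four labels are strongly present; the helper functions are written out by hand here rather than taken from Keras.

```python
import math

def softmax(logits):
    """Normalize logits into a distribution that sums to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sigmoid(z):
    """Squash a single logit into (0, 1), independently of the others."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical logits: the first three labels are strongly "on".
logits = [4.0, 4.0, 4.0, -4.0]

soft = softmax(logits)
sig = [sigmoid(z) for z in logits]

print([round(p, 3) for p in soft])
print([round(p, 3) for p in sig])
```

With softmax the three strong labels have to share the probability mass, so each ends up near one third and none clears a 0.5 threshold. With sigmoid each of the three is scored on its own and all of them can be close to 1.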

Complete Code

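from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import SGD

# X_train, y_train, X_test and y_test are assumed to be prepared as in the question
model = Sequential()
model.add(Dense(5000, activation='relu', input_dim=X_train.shape[1]))
model.add(Dropout(0.1))
model.add(Dense(600, activation='relu'))
model.add(Dropout(0.1))
# One sigmoid unit per label, so each label gets its own independent probability
model.add(Dense(y_train.shape[1], activation='sigmoid'))

# Note: newer Keras releases spell the first argument learning_rate instead of lr
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
# binary_crossentropy treats every label as a separate yes/no decision
model.compile(loss='binary_crossentropy',
              optimizer=sgd)

model.fit(X_train, y_train, epochs=5, batch_size=2000)

# Threshold the per-label probabilities at 0.5 to get 0/1 predictions
preds = model.predict(X_test)
preds[preds>=0.5] = 1
preds[preds<0.5] = 0
# score = compare preds and y_test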
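The final comment in the code leaves the scoring step open. One possible way to fill it in is sketched below in pure Python, computing three common multilabel scores on nested 0/1 lists. The function names and the tiny demo data are assumptions for illustration; with NumPy arrays you could pass `preds.tolist()` and `y_test.tolist()`, or use scikit-learn's `accuracy_score`, `hamming_loss` and `f1_score(average='micro')`, which compute comparable quantities.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose entire label row is predicted exactly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)

def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted wrongly."""
    wrong = sum(ti != pi
                for t, p in zip(y_true, y_pred)
                for ti, pi in zip(t, p))
    total = sum(len(t) for t in y_true)
    return wrong / total

def micro_f1(y_true, y_pred):
    """F1 score with true/false positives pooled over all labels."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        for ti, pi in zip(t, p):
            if ti == 1 and pi == 1:
                tp += 1
            elif ti == 0 and pi == 1:
                fp += 1
            elif ti == 1 and pi == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Made-up ground truth and thresholded predictions (3 samples, 3 labels).
y_test_demo = [[0, 1, 1], [1, 0, 0], [1, 1, 0]]
preds_demo = [[0, 1, 1], [1, 0, 1], [0, 1, 0]]

print(subset_accuracy(y_test_demo, preds_demo))
print(hamming_loss(y_test_demo, preds_demo))
print(micro_f1(y_test_demo, preds_demo))
```

Subset accuracy is strict, since a row counts only if every label matches, while Hamming loss and micro-averaged F1 give partial credit per label. Which one to report depends on how costly a single wrong label is in your application.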
