pandas ValueError: multiclass-multioutput format is not supported using sklearn roc_auc_score function

Notice: this page reproduces a popular StackOverflow question and its answer under the CC BY-SA 4.0 license. If you reuse it, you must also follow the CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/50567008/

Time: 2020-09-14 05:37:03  Source: igfitidea


Tags: python, pandas, scikit-learn, logistic-regression

Asked by stone rock

I am using logistic regression for prediction. My predictions are 0's and 1's. After training my model on the given data, and also when training on the important features, i.e. X_important_train (see screenshot), I get a score of around 70%. But when I use roc_auc_score(X, y) or roc_auc_score(X_important_train, y_train), I get a value error: ValueError: multiclass-multioutput format is not supported


Code:


# Load libraries
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score

# X, y, X_important_train and y_train come from the asker's own data (not shown)
model = LogisticRegression()

# Standardize features
scaler = StandardScaler()
X_std = scaler.fit_transform(X)

# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)

model.fit(X_important_train, y_train)
model.score(X_important_train, y_train)

# This call raises: ValueError: multiclass-multioutput format is not supported
roc_auc_score(X_important_train, y_train)
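You can reproduce the error without the asker's data. The sketch below is illustrative only. X_important_train and y_train are hypothetical stand-ins generated with NumPy: an integer-valued feature matrix and a 0/1 label vector. scikit-learn's type_of_target helper shows how roc_auc_score classifies its first argument.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.utils.multiclass import type_of_target

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the asker's data (not shown in the question):
# an integer-valued feature matrix with 3 columns and a 0/1 label vector.
X_important_train = rng.integers(0, 5, size=(100, 3))
y_train = rng.integers(0, 2, size=100)

# roc_auc_score treats its FIRST argument as y_true, so passing the feature
# matrix there makes it inspect the features as if they were labels.
print(type_of_target(X_important_train))  # multiclass-multioutput
print(type_of_target(y_train))            # binary

error_message = ""
try:
    roc_auc_score(X_important_train, y_train)  # features first: wrong order/content
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

The first argument is a 2-D matrix with more than two distinct integer values, so it is classified as multiclass-multioutput, which is exactly the format named in the error. A float-valued feature matrix would typically be reported as continuous-multioutput instead.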

Screenshot: (image not included)

Accepted answer by seralouk

First of all, the roc_auc_score function expects its two input arguments, y_true and y_score, to have the same shape.


sklearn.metrics.roc_auc_score(y_true, y_score, average='macro', sample_weight=None)

Note: this implementation is restricted to the binary classification task or multilabel classification task in label indicator format.

y_true : array, shape = [n_samples] or [n_samples, n_classes]
True binary labels in binary label indicators.

y_score : array, shape = [n_samples] or [n_samples, n_classes]
Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).
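Here is a minimal sketch of the two shapes the quoted documentation describes. The values are made up for illustration. In the binary case, both arguments are 1-D arrays of length n_samples. In the multilabel case, both are 2-D arrays of shape (n_samples, n_classes).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Binary case: y_true and y_score are both shape (n_samples,)
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])  # e.g. probability of class 1
binary_auc = roc_auc_score(y_true, y_score)
print(binary_auc)  # 0.75

# Multilabel case: a 0/1 label indicator matrix and a score matrix,
# both shape (n_samples, n_classes); the default average='macro'
# averages the per-column AUCs.
Y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
Y_score = np.array([[0.9, 0.2], [0.3, 0.7], [0.6, 0.8], [0.2, 0.1]])
multilabel_auc = roc_auc_score(Y_true, Y_score)
print(multilabel_auc)  # 1.0 (every column ranks its positives above its negatives)
```

In the binary example, 3 of the 4 (positive, negative) pairs are ranked correctly: 0.35 < 0.4 is the one miss. That gives an AUC of 3/4 = 0.75.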

So the inputs are the true labels and the predicted scores, NOT the training features and labels that you pass in the example you posted. In more detail:


model.fit(X_important_train, y_train)
model.score(X_important_train, y_train)
# this is wrong: the first argument must be the true labels, not the feature matrix
roc_auc_score(X_important_train, y_train)

You should do something like this:


# X_test_data: held-out features; y_true: the true 0/1 labels for those same rows
y_pred = model.predict(X_test_data)
roc_auc_score(y_true, y_pred)
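Here is a minimal end-to-end sketch of the answer's advice. It rests on some assumptions. Synthetic data from make_classification stands in for the asker's data, and the split size and random seeds are arbitrary. The sketch scores the model two ways:

- with predict_proba(...)[:, 1], the probability of the positive class, which the quoted docs list as a valid y_score;
- with hard predict() labels, as in the answer.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the asker's data (assumption: a binary 0/1 target)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# First argument: true labels; second argument: scores for the same samples
y_proba = model.predict_proba(X_test)[:, 1]  # column 1 = probability of class 1
auc_proba = roc_auc_score(y_test, y_proba)

# Hard 0/1 predictions (as in the answer) are accepted too
y_pred = model.predict(X_test)
auc_labels = roc_auc_score(y_test, y_pred)

print(f"AUC from predict_proba: {auc_proba:.3f}")
print(f"AUC from predict:       {auc_labels:.3f}")
```

Hard 0/1 labels give the ROC curve only a single threshold, so that AUC reflects one operating point. The probability-based score uses the model's full ranking of the test samples.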