How to plot a ROC curve in Python

Question

Asked by user3847447

I am trying to plot a ROC curve to evaluate the accuracy of a prediction model I developed in Python using logistic regression packages. I have computed the true positive rate as well as the false positive rate; however, I am unable to figure out how to plot these correctly using matplotlib and how to calculate the AUC value. How could I do that?

Answer 1

Answered by ebarr

It is not at all clear what the problem is here, but if you have an array true_positive_rate and an array false_positive_rate, then plotting the ROC curve and getting the AUC is as simple as:

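import matplotlib.pyplot as plt
import numpy as np

# Placeholders: substitute the arrays you have already computed
x = false_positive_rate
y = true_positive_rate

# This is the ROC curve
plt.plot(x, y)
plt.show()

# This is the AUC (trapezoidal rule; x must be sorted in increasing order).
# On NumPy 2.0 and later the same function is available as np.trapezoid.
auc = np.trapz(y, x)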
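
To make the AUC step less of a black box, here is a minimal sketch of the trapezoidal rule in plain Python. The function name trapezoid_auc and the sample points are made up for illustration, and the sketch assumes the points describe an ordinary monotone ROC curve.

```python
def trapezoid_auc(fpr, tpr):
    """Area under the curve through the points (fpr[i], tpr[i]), trapezoidal rule."""
    # Sort by false positive rate so the curve is traversed left to right
    points = sorted(zip(fpr, tpr))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Each segment contributes width * average height
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical sample points, for illustration only
perfect = trapezoid_auc([0.0, 0.0, 1.0], [0.0, 1.0, 1.0])
chance = trapezoid_auc([0.0, 1.0], [0.0, 1.0])
print(perfect, chance)
```

A perfect classifier's curve goes straight up and then across, while the chance diagonal covers half of the unit square.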

Answer 2

Answered by Mona

Here is Python code for computing the ROC curve (as a scatter plot):

import matplotlib.pyplot as plt
import numpy as np

score = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.54, 0.53, 0.52, 0.51, 0.505, 0.4, 0.39, 0.38, 0.37, 0.36, 0.35, 0.34, 0.33, 0.30, 0.1])
y = np.array([1,1,0, 1, 1, 1, 0, 0, 1, 0, 1,0, 1, 0, 0, 0, 1 , 0, 1, 0])

# false positive rate
fpr = []
# true positive rate
tpr = []
# Iterate thresholds from 0.0, 0.01, ... 1.0
thresholds = np.arange(0.0, 1.01, .01)

# get number of positive and negative examples in the dataset
P = sum(y)
N = len(y) - P

# iterate through all thresholds and determine fraction of true positives
# and false positives found at this threshold
for thresh in thresholds:
    FP=0
    TP=0
    for i in range(len(score)):
        if (score[i] > thresh):
            if y[i] == 1:
                TP = TP + 1
            if y[i] == 0:
                FP = FP + 1
    fpr.append(FP/float(N))
    tpr.append(TP/float(P))

plt.scatter(fpr, tpr)
plt.show()
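
The nested loops above can also be written with NumPy broadcasting. The following is only a sketch of that idea: the helper name roc_points is hypothetical, and it keeps the same strict `score > thresh` comparison as the loop version.

```python
import numpy as np


def roc_points(score, y, thresholds):
    """Return (fpr, tpr) arrays with one entry per threshold."""
    score = np.asarray(score, dtype=float)
    y = np.asarray(y).astype(bool)
    thresholds = np.asarray(thresholds, dtype=float)
    # predicted[i, j] is True when sample j scores above threshold i
    predicted = score[None, :] > thresholds[:, None]
    tp = (predicted & y).sum(axis=1)   # true positives per threshold
    fp = (predicted & ~y).sum(axis=1)  # false positives per threshold
    return fp / (~y).sum(), tp / y.sum()


# Small made-up example: two positives, two negatives
fpr, tpr = roc_points([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0], [0.0, 0.5, 1.0])
print(fpr, tpr)
```

The resulting arrays can be passed to plt.scatter(fpr, tpr) exactly as in the loop version.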

Answer 3

Answered by Max

The previous answers assume that you calculated the true positive rate (sensitivity) yourself. Doing this manually is a bad idea because it is easy to make mistakes in the calculations. Use a library function for all of this instead.

The plot_roc example in the scikit-learn documentation does exactly what you need: http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html

The essential part of the code is:

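# fpr, tpr and roc_auc are dicts keyed by class index;
# y_test holds one binary indicator column per class
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])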
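
For context, here is a self-contained sketch of how that loop might be set up end to end. The iris dataset, the LogisticRegression classifier, and the split parameters are assumptions chosen for illustration; they are not taken from the linked example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Assumed setup: a small three-class dataset and a simple classifier
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test_labels = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_score = clf.predict_proba(X_test)  # shape (n_samples, n_classes)

# One binary indicator column per class (one-vs-rest)
y_test = label_binarize(y_test_labels, classes=[0, 1, 2])
n_classes = y_test.shape[1]

fpr, tpr, roc_auc = {}, {}, {}
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

print(roc_auc)
```

Each fpr[i], tpr[i] pair can then be drawn with plt.plot(fpr[i], tpr[i]) to get one curve per class.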

Answer 4

Answered by uniquegino

Here are two ways you may try, assuming your model is an sklearn predictor:

import sklearn.metrics as metrics
# calculate the fpr and tpr for all thresholds of the classification
probs = model.predict_proba(X_test)
preds = probs[:,1]
fpr, tpr, threshold = metrics.roc_curve(y_test, preds)
roc_auc = metrics.auc(fpr, tpr)

# method I: plt
import matplotlib.pyplot as plt
plt.title('Receiver Operating Characteristic')
plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % roc_auc)
plt.legend(loc = 'lower right')
plt.plot([0, 1], [0, 1],'r--')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()

# method II: ggplot
import pandas as pd
from ggplot import *
df = pd.DataFrame(dict(fpr = fpr, tpr = tpr))
ggplot(df, aes(x = 'fpr', y = 'tpr')) + geom_line() + geom_abline(linetype = 'dashed')

Or try:

ggplot(df, aes(x = 'fpr', ymin = 0, ymax = 'tpr')) + geom_line(aes(y = 'tpr')) + geom_area(alpha = 0.2) + ggtitle("ROC Curve w/ AUC = %s" % str(roc_auc))

Answer 5

Answered by Reii Nakano

This is the simplest way to plot a ROC curve, given a set of ground truth labels and predicted probabilities. The best part is that it plots the ROC curve for all classes, so you get multiple neat-looking curves as well.

import scikitplot as skplt
import matplotlib.pyplot as plt

# y_true: ground truth labels
# y_probas: predicted probabilities generated by an sklearn classifier
skplt.metrics.plot_roc_curve(y_true, y_probas)
plt.show()

Here's a sample curve generated by plot_roc_curve. I used the sample digits dataset from scikit-learn so there are 10 classes. Notice that one ROC curve is plotted for each class.

Disclaimer: Note that this uses the scikit-plot library, which I built.

Answer 6

Answered by Brian Chan

I have written a simple function, included in a package, for plotting the ROC curve. I just started practicing machine learning, so please let me know if this code has any problems!

Have a look at the GitHub readme file for more details! :)

https://github.com/bc123456/ROC

from sklearn.metrics import confusion_matrix, accuracy_score, roc_auc_score, roc_curve
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

def plot_ROC(y_train_true, y_train_prob, y_test_true, y_test_prob):
    '''
    A function to plot the ROC curve for train labels and test labels.
    Uses the best threshold found on the train set to classify items in the test set.
    '''
    fpr_train, tpr_train, thresholds_train = roc_curve(y_train_true, y_train_prob, pos_label =True)
    sum_sensitivity_specificity_train = tpr_train + (1-fpr_train)
    best_threshold_id_train = np.argmax(sum_sensitivity_specificity_train)
    best_threshold = thresholds_train[best_threshold_id_train]
    best_fpr_train = fpr_train[best_threshold_id_train]
    best_tpr_train = tpr_train[best_threshold_id_train]
    # roc_curve treats scores >= threshold as positive, so use >= here as well
    y_train = y_train_prob >= best_threshold

    cm_train = confusion_matrix(y_train_true, y_train)
    acc_train = accuracy_score(y_train_true, y_train)
    # AUC is computed from the scores, not from the thresholded labels
    auc_train = roc_auc_score(y_train_true, y_train_prob)

    print('Train Accuracy: %s ' % acc_train)
    print('Train AUC: %s ' % auc_train)
    print('Train Confusion Matrix:')
    print(cm_train)

    fig = plt.figure(figsize=(10,5))
    ax = fig.add_subplot(121)
    curve1 = ax.plot(fpr_train, tpr_train)
    curve2 = ax.plot([0, 1], [0, 1], color='navy', linestyle='--')
    dot = ax.plot(best_fpr_train, best_tpr_train, marker='o', color='black')
    ax.text(best_fpr_train, best_tpr_train, s = '(%.3f,%.3f)' %(best_fpr_train, best_tpr_train))
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.0])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC curve (Train), AUC = %.4f'%auc_train)

    fpr_test, tpr_test, thresholds_test = roc_curve(y_test_true, y_test_prob, pos_label =True)

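    # roc_curve treats scores >= threshold as positive, so use >= here as well
    y_test = y_test_prob >= best_threshold

    cm_test = confusion_matrix(y_test_true, y_test)
    acc_test = accuracy_score(y_test_true, y_test)
    # AUC is computed from the scores, not from the thresholded labels
    auc_test = roc_auc_score(y_test_true, y_test_prob)

    print('Test Accuracy: %s ' % acc_test)
    print('Test AUC: %s ' % auc_test)
    print('Test Confusion Matrix:')
    print(cm_test)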

    tpr_score = float(cm_test[1][1])/(cm_test[1][1] + cm_test[1][0])
    fpr_score = float(cm_test[0][1])/(cm_test[0][0]+ cm_test[0][1])

    ax2 = fig.add_subplot(122)
    curve1 = ax2.plot(fpr_test, tpr_test)
    curve2 = ax2.plot([0, 1], [0, 1], color='navy', linestyle='--')
    dot = ax2.plot(fpr_score, tpr_score, marker='o', color='black')
    ax2.text(fpr_score, tpr_score, s = '(%.3f,%.3f)' %(fpr_score, tpr_score))
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.0])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC curve (Test), AUC = %.4f'%auc_test)
    plt.savefig('ROC', dpi = 500)
    plt.show()

    return best_threshold

A sample ROC graph produced by this code
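
The threshold choice inside plot_ROC maximises sensitivity plus specificity, which is the same as maximising tpr - fpr (often called Youden's J statistic). The snippet below isolates that one step as a sketch; the helper name best_threshold_youden and the sample arrays are hypothetical.

```python
import numpy as np


def best_threshold_youden(fpr, tpr, thresholds):
    """Return (threshold, fpr, tpr) at the point that maximises tpr - fpr."""
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    j = tpr - fpr                 # Youden's J for every candidate threshold
    best = int(np.argmax(j))      # first index with the largest J
    return thresholds[best], fpr[best], tpr[best]


# Made-up ROC points in the layout returned by sklearn.metrics.roc_curve
thr, f, t = best_threshold_youden(
    [0.0, 0.0, 0.25, 1.0], [0.0, 0.5, 1.0, 1.0], [1.9, 0.9, 0.6, 0.1])
print(thr, f, t)
```

Picking the threshold on the train set and reusing it on the test set, as plot_ROC does, avoids tuning the cut-off on the data used for evaluation.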


Answer 7

Answered by Cherry Wu

from sklearn import metrics
import numpy as np
import matplotlib.pyplot as plt

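# y_true: true labels
# y_probas: predicted scores or probabilities for the class named by pos_label
# pos_label=0 treats label 0 as the positive class; change it to match your data
fpr, tpr, thresholds = metrics.roc_curve(y_true, y_probas, pos_label=0)

# Plot ROC curve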
plt.plot(fpr,tpr)
plt.show() 

# Print AUC
auc = np.trapz(tpr,fpr)
print('AUC:', auc)

Answer 8

Answered by ajayramesh

ROC curve and AUC for binary classification using matplotlib

from sklearn import svm, datasets
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
import matplotlib.pyplot as plt

Load the breast cancer dataset

breast_cancer = load_breast_cancer()

X = breast_cancer.data
y = breast_cancer.target

Split the dataset

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.33, random_state=44)

Model

clf = LogisticRegression(penalty='l2', C=0.1)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

Accuracy

print("Accuracy", metrics.accuracy_score(y_test, y_pred))

ROC curve and AUC

y_pred_proba = clf.predict_proba(X_test)[::,1]
fpr, tpr, _ = metrics.roc_curve(y_test,  y_pred_proba)
auc = metrics.roc_auc_score(y_test, y_pred_proba)
plt.plot(fpr,tpr,label="data 1, auc="+str(auc))
plt.legend(loc=4)
plt.show()
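
Several answers on this page compute the AUC with auc(fpr, tpr) while this one uses roc_auc_score. As a quick sanity check that the two routes agree for a binary problem, here is a small sketch; the synthetic dataset from make_classification and the max_iter setting are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Assumed synthetic data, so the example has no external dependencies
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=44)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

fpr, tpr, _ = roc_curve(y_test, scores)
auc_from_curve = auc(fpr, tpr)              # area under the plotted curve
auc_direct = roc_auc_score(y_test, scores)  # computed directly from scores

print(auc_from_curve, auc_direct)
```

Note that both calls take the predicted probabilities, not the hard labels returned by clf.predict.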

Answer 9

Answered by Yohann L.

Based on multiple comments from Stack Overflow, the scikit-learn documentation, and some other sources, I made a Python package to plot ROC curves (and other metrics) in a really simple way.

To install the package: pip install plot-metric (more info at the end of the post)

To plot a ROC curve (the example comes from the documentation):

Binary classification

Let's load a simple dataset and make a train and test set:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=1000, n_classes=2, weights=[1,1], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=2)

Train a classifier and predict on the test set:

from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=50, random_state=23)
model = clf.fit(X_train, y_train)

# Use predict_proba to predict probability of the class
y_pred = clf.predict_proba(X_test)[:,1]

You can now use plot_metric to plot the ROC curve:

import matplotlib.pyplot as plt
from plot_metric.functions import BinaryClassification
# Visualisation with plot_metric
bc = BinaryClassification(y_test, y_pred, labels=["Class 1", "Class 2"])

# Figures
plt.figure(figsize=(5,5))
bc.plot_roc_curve()
plt.show()

You can find more examples on GitHub and in the package documentation:

GitHub: https://github.com/yohann84L/plot_metric
Documentation: https://plot-metric.readthedocs.io/en/latest/

Answer 10

Answered by PV8

You can also follow the official scikit-learn documentation:

https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#sphx-glr-auto-examples-model-selection-plot-roc-py
