Original URL: http://stackoverflow.com/questions/31324218/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Scikit-learn: How to obtain True Positive, True Negative, False Positive and False Negative
Asked by Euskalduna
My problem:
I have a dataset which is a large JSON file. I read it and store it in the trainList variable.
Next, I pre-process it in order to be able to work with it.
Once I have done that, I start the classification:
- I use the kfold cross validation method in order to obtain the mean accuracy and train a classifier.
- I make the predictions and obtain the accuracy and confusion matrix of that fold.
- After this, I would like to obtain the True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) values. I'll use these parameters to obtain the Sensitivity and Specificity.
Finally, I would put this into HTML in order to show a chart with the TPs of each label.
Code:
The variables I have for the moment:
trainList #It is a list with all the data of my dataset in JSON form
labelList #It is a list with all the labels of my data
The main part of the method:
# I transform the data from JSON form to a numerical one
X = vec.fit_transform(trainList)

# I scale the matrix (don't know why, but without it there is an error)
X = preprocessing.scale(X.toarray())

# I generate a KFold in order to do cross validation
kf = KFold(len(X), n_folds=10, indices=True, shuffle=True, random_state=1)

# I start the cross validation
for train_indices, test_indices in kf:
    X_train = [X[ii] for ii in train_indices]
    X_test = [X[ii] for ii in test_indices]
    y_train = [labelList[ii] for ii in train_indices]
    y_test = [labelList[ii] for ii in test_indices]

    # I train the classifier
    trained = qda.fit(X_train, y_train)

    # I make the predictions
    predicted = qda.predict(X_test)

    # I obtain the accuracy of this fold
    ac = accuracy_score(predicted, y_test)

    # I obtain the confusion matrix
    cm = confusion_matrix(y_test, predicted)

    # I should calculate the TP, TN, FP and FN
    # I don't know how to continue
Accepted answer by invoketheshell
If you have two lists with the predicted and actual values, as it appears you do, you can pass them to a function that will calculate TP, FP, TN and FN with something like this:
def perf_measure(y_actual, y_hat):
    TP = 0
    FP = 0
    TN = 0
    FN = 0

    for i in range(len(y_hat)):
        if y_actual[i] == y_hat[i] == 1:
            TP += 1
        if y_hat[i] == 1 and y_actual[i] != y_hat[i]:
            FP += 1
        if y_actual[i] == y_hat[i] == 0:
            TN += 1
        if y_hat[i] == 0 and y_actual[i] != y_hat[i]:
            FN += 1

    return (TP, FP, TN, FN)
From here I think you will be able to calculate the rates of interest to you, and other performance measures like specificity and sensitivity.
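For example, a minimal sketch of how those counts translate into sensitivity and specificity, assuming y_test and predicted are the per-fold lists from the question's loop:

TP, FP, TN, FN = perf_measure(y_test, predicted)

# Standard definitions; float() guards against integer division on Python 2
sensitivity = float(TP) / (TP + FN)  # true positive rate / recall
specificity = float(TN) / (TN + FP)  # true negative rate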
Answered by Akshat Harit
You can obtain all of the parameters from the confusion matrix. The structure of the confusion matrix (which is a 2x2 matrix) is as follows, assuming the first index relates to the positive label and the rows relate to the true labels:
TP | FN
FP | TN
So
TP = cm[0][0]
FN = cm[0][1]
FP = cm[1][0]
TN = cm[1][1]
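Note that this TP|FN / FP|TN layout is not scikit-learn's default ordering (see the answers below). As a sketch, you can get exactly this layout from scikit-learn's confusion_matrix by passing the labels argument, assuming binary labels 0/1 and the y_test/predicted lists from the question:

from sklearn.metrics import confusion_matrix

# Putting label 1 first orders rows/columns as [1, 0], so that
# cm[0][0]=TP, cm[0][1]=FN, cm[1][0]=FP, cm[1][1]=TN
cm = confusion_matrix(y_test, predicted, labels=[1, 0])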
More details at https://en.wikipedia.org/wiki/Confusion_matrix
Answered by ykorkmaz
I think both of the answers are not fully correct. For example, suppose that we have the following arrays:

y_actual = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
y_predic = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
If we compute the FP, FN, TP and TN values manually, they should be as follows:
FP: 3 FN: 1 TP: 3 TN: 4
However, if we use the first answer, results are given as follows:
FP: 1 FN: 3 TP: 3 TN: 4
They are not correct, because in the first answer False Positive should be where the actual value is 0 but the predicted value is 1, not the opposite. The same applies to False Negative.
And, if we use the second answer, the results are computed as follows:
FP: 3 FN: 1 TP: 4 TN: 3
The True Positive and True Negative numbers are not correct; they should be swapped.
Am I correct with my computations? Please let me know if I am missing something.
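One way to verify these manual counts is scikit-learn's confusion_matrix itself, whose rows are the true labels and columns the predicted labels:

from sklearn.metrics import confusion_matrix

print(confusion_matrix(y_actual, y_predic))
# [[4 3]
#  [1 3]]  i.e. TN=4, FP=3, FN=1, TP=3, matching the manual counts above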
Answered by gruangly
According to the scikit-learn documentation,
By definition a confusion matrix C is such that C[i, j] is equal to the number of observations known to be in group i but predicted to be in group j.
Thus in binary classification, the count of true negatives is C[0,0], false negatives is C[1,0], true positives is C[1,1] and false positives is C[0,1].
CM = confusion_matrix(y_true, y_pred)
TN = CM[0][0]
FN = CM[1][0]
TP = CM[1][1]
FP = CM[0][1]
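As a compact equivalent for the binary case (a sketch, assuming the labels are 0 and 1): ravel() flattens the 2x2 matrix in row-major order, so the four counts can be unpacked in one line.

from sklearn.metrics import confusion_matrix

TN, FP, FN, TP = confusion_matrix(y_true, y_pred).ravel()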
Answered by enterbutton
If you have more than one class in your classifier, you might want to use pandas-ml for that part. The confusion matrix of pandas-ml gives more detailed information; see the sketch below.
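A minimal sketch, assuming pandas-ml is installed and that its ConfusionMatrix API is as documented at the time, reusing the y_actual / y_predic arrays from the previous answer:

from pandas_ml import ConfusionMatrix

cm = ConfusionMatrix(y_actual, y_predic)
cm.print_stats()  # prints per-class TP/TN/FP/FN counts plus derived statistics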
Answered by lucidv01d
For the multi-class case, everything you need (per-class TP, TN, FP, FN and the derived rates) can be found from the confusion matrix. Using pandas/numpy, you can compute it for all classes at once, like so:
# Assuming cm is the multi-class confusion matrix as a NumPy array,
# e.g. cm = confusion_matrix(y_true, y_pred)
FP = cm.sum(axis=0) - np.diag(cm)  # column totals minus the diagonal
FN = cm.sum(axis=1) - np.diag(cm)  # row totals minus the diagonal
TP = np.diag(cm)
TN = cm.sum() - (FP + FN + TP)

# Sensitivity, hit rate, recall, or true positive rate
TPR = TP / (TP + FN)
# Specificity or true negative rate
TNR = TN / (TN + FP)
# Precision or positive predictive value
PPV = TP / (TP + FP)
# Negative predictive value
NPV = TN / (TN + FN)
# Fall-out or false positive rate
FPR = FP / (FP + TN)
# False negative rate
FNR = FN / (TP + FN)
# False discovery rate
FDR = FP / (TP + FP)
# Overall accuracy
ACC = (TP + TN) / (TP + FP + FN + TN)
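For instance, with a hypothetical 3-class matrix, each expression above yields one value per class:

import numpy as np

cm = np.array([[13,  0,  0],
               [ 0, 10,  6],
               [ 0,  0,  9]])  # hypothetical 3-class confusion matrix

print(cm.sum(axis=0) - np.diag(cm))  # FP per class: [0 0 6]
print(cm.sum(axis=1) - np.diag(cm))  # FN per class: [0 6 0]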
Answered by Joseloman
In the scikit-learn metrics module there is a confusion_matrix function which gives you the desired output.
You can use any classifier that you want. Here I used KNeighborsClassifier as an example.
from sklearn import metrics, neighbors

clf = neighbors.KNeighborsClassifier()

X_test = ...
y_test = ...

# Note: the classifier must be fitted first, e.g. clf.fit(X_train, y_train)
expected = y_test
predicted = clf.predict(X_test)

conf_matrix = metrics.confusion_matrix(expected, predicted)
print(conf_matrix)
# [[1403   87]
#  [  56 3159]]

Documentation: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html#sklearn.metrics.confusion_matrix
Answered by andandandand
Here's a fix to invoketheshell's buggy code (which currently appears as the accepted answer):
def performance_measure(y_actual, y_hat):
    TP = 0
    FP = 0
    TN = 0
    FN = 0

    for i in range(len(y_hat)):
        if y_actual[i] == y_hat[i] == 1:
            TP += 1
        if y_hat[i] == 1 and y_actual[i] == 0:
            FP += 1
        if y_hat[i] == y_actual[i] == 0:
            TN += 1
        if y_hat[i] == 0 and y_actual[i] == 1:
            FN += 1

    return (TP, FP, TN, FN)
Answered by Julio Cárdenas-Rodríguez
I wrote a version that works using only numpy. I hope it helps you.
import numpy as np

def perf_metrics_2X2(yobs, yhat):
    """
    Returns the specificity, sensitivity, positive predictive value, and
    negative predictive value of a 2X2 table,
    where:
        0 = negative case
        1 = positive case

    Parameters
    ----------
    yobs : array of positive and negative ``observed`` cases
    yhat : array of positive and negative ``predicted`` cases

    Returns
    -------
    sensitivity  = TP / (TP+FN)
    specificity  = TN / (TN+FP)
    pos_pred_val = TP / (TP+FP)
    neg_pred_val = TN / (TN+FN)

    Author: Julio Cardenas-Rodriguez
    """
    # Count each outcome by masking on the observed class
    TP = np.sum(yhat[yobs == 1] == 1)  # positives predicted as positive
    TN = np.sum(yhat[yobs == 0] == 0)  # negatives predicted as negative
    FP = np.sum(yhat[yobs == 0] == 1)  # negatives predicted as positive
    FN = np.sum(yhat[yobs == 1] == 0)  # positives predicted as negative

    sensitivity  = TP / float(TP + FN)
    specificity  = TN / float(TN + FP)
    pos_pred_val = TP / float(TP + FP)
    neg_pred_val = TN / float(TN + FN)

    return sensitivity, specificity, pos_pred_val, neg_pred_val
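A quick usage sketch, reusing the example arrays from ykorkmaz's answer; note that yobs and yhat must be NumPy arrays for the boolean masking to work:

import numpy as np

yobs = np.array([1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0])
yhat = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0])

# TP=3, TN=4, FP=3, FN=1 for these arrays, so:
print(perf_metrics_2X2(yobs, yhat))  # (0.75, 0.5714..., 0.5, 0.8)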
Answered by daniel.kaifeng
You can try sklearn.metrics.classification_report, as below:
from sklearn.metrics import classification_report

y_true = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
print(classification_report(y_true, y_pred))
output:
             precision    recall  f1-score   support

          0       0.80      0.57      0.67         7
          1       0.50      0.75      0.60         4

avg / total       0.69      0.64      0.64        11
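Note that the per-class recall here is exactly what the question asks for: the recall of class 1 is the sensitivity (0.75) and the recall of class 0 is the specificity (0.57). In newer scikit-learn versions the report can also be returned as a dict; a sketch, assuming scikit-learn >= 0.20 where output_dict was added:

from sklearn.metrics import classification_report

report = classification_report(y_true, y_pred, output_dict=True)
sensitivity = report['1']['recall']  # recall of the positive class
specificity = report['0']['recall']  # recall of the negative class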