Original URL: http://stackoverflow.com/questions/31324218/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Scikit-learn: How to obtain True Positive, True Negative, False Positive and False Negative
Asked by Euskalduna
My problem:
I have a dataset which is a large JSON file. I read it and store it in the trainList variable.
Next, I pre-process it in order to be able to work with it.
Once I have done that, I start the classification:
- I use the kfold cross validation method in order to obtain the mean accuracy and train a classifier.
- I make the predictions and obtain the accuracy and confusion matrix of that fold.
- After this, I would like to obtain the True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) values. I'll use these parameters to obtain the Sensitivity and Specificity.
Finally, I would put this into HTML in order to show a chart with the TPs of each label.
Code:
The variables I have for the moment:
trainList #It is a list with all the data of my dataset in JSON form
labelList #It is a list with all the labels of my data
The main part of the method:
# I transform the data from JSON form to a numerical one
X = vec.fit_transform(trainList)

# I scale the matrix (don't know why, but without it there is an error)
X = preprocessing.scale(X.toarray())

# I generate a KFold in order to do cross validation
kf = KFold(len(X), n_folds=10, indices=True, shuffle=True, random_state=1)

# I start the cross validation
for train_indices, test_indices in kf:
    X_train = [X[ii] for ii in train_indices]
    X_test = [X[ii] for ii in test_indices]
    y_train = [labelList[ii] for ii in train_indices]
    y_test = [labelList[ii] for ii in test_indices]

    # I train the classifier
    trained = qda.fit(X_train, y_train)

    # I make the predictions
    predicted = qda.predict(X_test)

    # I obtain the accuracy of this fold
    ac = accuracy_score(predicted, y_test)

    # I obtain the confusion matrix
    cm = confusion_matrix(y_test, predicted)

    # I should calculate the TP, TN, FP and FN
    # I don't know how to continue
Accepted answer by invoketheshell
If you have two lists with the predicted and actual values, as it appears you do, you can pass them to a function that will calculate TP, FP, TN and FN with something like this:
def perf_measure(y_actual, y_hat):
    TP = 0
    FP = 0
    TN = 0
    FN = 0

    for i in range(len(y_hat)):
        if y_actual[i] == y_hat[i] == 1:
            TP += 1
        if y_hat[i] == 1 and y_actual[i] != y_hat[i]:
            FP += 1
        if y_actual[i] == y_hat[i] == 0:
            TN += 1
        if y_hat[i] == 0 and y_actual[i] != y_hat[i]:
            FN += 1

    return (TP, FP, TN, FN)
From here I think you will be able to calculate the rates of interest to you, and other performance measures like specificity and sensitivity.
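For example, a minimal sketch of how those counts translate into sensitivity and specificity, assuming y_test and predicted are the per-fold lists from the question's loop:

TP, FP, TN, FN = perf_measure(y_test, predicted)

# Standard definitions; float() guards against integer division on Python 2
sensitivity = float(TP) / (TP + FN)  # true positive rate / recall
specificity = float(TN) / (TN + FP)  # true negative rate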
Answered by Akshat Harit
You can obtain all of the parameters from the confusion matrix. The structure of the confusion matrix (which is a 2x2 matrix) is as follows, assuming the first index relates to the positive label and the rows relate to the true labels:
TP | FN
FP | TN
So
TP = cm[0][0]
FN = cm[0][1]
FP = cm[1][0]
TN = cm[1][1]
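Note that this TP|FN / FP|TN layout is not scikit-learn's default ordering (see the answers below). As a sketch, you can get exactly this layout from scikit-learn's confusion_matrix by passing the labels argument, assuming binary labels 0/1 and the y_test/predicted lists from the question:

from sklearn.metrics import confusion_matrix

# Putting label 1 first orders rows/columns as [1, 0], so that
# cm[0][0]=TP, cm[0][1]=FN, cm[1][0]=FP, cm[1][1]=TN
cm = confusion_matrix(y_test, predicted, labels=[1, 0])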
More details at https://en.wikipedia.org/wiki/Confusion_matrix
Answered by ykorkmaz
I think both of the answers are not fully correct. For example, suppose that we have the following arrays:

y_actual = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
y_predic = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
If we compute the FP, FN, TP and TN values manually, they should be as follows:
FP: 3 FN: 1 TP: 3 TN: 4
However, if we use the first answer, results are given as follows:
FP: 1 FN: 3 TP: 3 TN: 4
They are not correct, because in the first answer False Positive should be where the actual value is 0 but the predicted value is 1, not the opposite. The same applies to False Negative.
And, if we use the second answer, the results are computed as follows:
FP: 3 FN: 1 TP: 4 TN: 3
The True Positive and True Negative numbers are not correct; they should be swapped.
Am I correct with my computations? Please let me know if I am missing something.
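One way to verify these manual counts is scikit-learn's confusion_matrix itself, whose rows are the true labels and columns the predicted labels:

from sklearn.metrics import confusion_matrix

print(confusion_matrix(y_actual, y_predic))
# [[4 3]
#  [1 3]]  i.e. TN=4, FP=3, FN=1, TP=3, matching the manual counts above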
Answered by gruangly
According to the scikit-learn documentation,
By definition a confusion matrix C is such that C[i, j] is equal to the number of observations known to be in group i but predicted to be in group j.
Thus in binary classification, the count of true negatives is C[0,0], false negatives is C[1,0], true positives is C[1,1] and false positives is C[0,1].
CM = confusion_matrix(y_true, y_pred)
TN = CM[0][0]
FN = CM[1][0]
TP = CM[1][1]
FP = CM[0][1]
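As a compact equivalent for the binary case (a sketch, assuming the labels are 0 and 1): ravel() flattens the 2x2 matrix in row-major order, so the four counts can be unpacked in one line.

from sklearn.metrics import confusion_matrix

TN, FP, FN, TP = confusion_matrix(y_true, y_pred).ravel()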
Answered by enterbutton
If you have more than one class in your classifier, you might want to use pandas-ml for that part. The confusion matrix of pandas-ml gives more detailed information; see the sketch below.
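A minimal sketch, assuming pandas-ml is installed and that its ConfusionMatrix API is as documented at the time, reusing the y_actual / y_predic arrays from the previous answer:

from pandas_ml import ConfusionMatrix

cm = ConfusionMatrix(y_actual, y_predic)
cm.print_stats()  # prints per-class TP/TN/FP/FN counts plus derived statistics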
Answered by lucidv01d
For the multi-class case, everything you need (per-class TP, TN, FP, FN and the derived rates) can be found from the confusion matrix. Using pandas/numpy, you can compute it for all classes at once, like so:
# Assuming cm is the multi-class confusion matrix as a NumPy array,
# e.g. cm = confusion_matrix(y_true, y_pred)
FP = cm.sum(axis=0) - np.diag(cm)  # column totals minus the diagonal
FN = cm.sum(axis=1) - np.diag(cm)  # row totals minus the diagonal
TP = np.diag(cm)
TN = cm.sum() - (FP + FN + TP)

# Sensitivity, hit rate, recall, or true positive rate
TPR = TP / (TP + FN)
# Specificity or true negative rate
TNR = TN / (TN + FP)
# Precision or positive predictive value
PPV = TP / (TP + FP)
# Negative predictive value
NPV = TN / (TN + FN)
# Fall-out or false positive rate
FPR = FP / (FP + TN)
# False negative rate
FNR = FN / (TP + FN)
# False discovery rate
FDR = FP / (TP + FP)
# Overall accuracy
ACC = (TP + TN) / (TP + FP + FN + TN)
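For instance, with a hypothetical 3-class matrix, each expression above yields one value per class:

import numpy as np

cm = np.array([[13,  0,  0],
               [ 0, 10,  6],
               [ 0,  0,  9]])  # hypothetical 3-class confusion matrix

print(cm.sum(axis=0) - np.diag(cm))  # FP per class: [0 0 6]
print(cm.sum(axis=1) - np.diag(cm))  # FN per class: [0 6 0]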
Answered by Joseloman
In the scikit-learn metrics module there is a confusion_matrix function which gives you the desired output.
You can use any classifier that you want. Here I used KNeighborsClassifier as an example.
from sklearn import metrics, neighbors

clf = neighbors.KNeighborsClassifier()

X_test = ...
y_test = ...

# Note: the classifier must be fitted first, e.g. clf.fit(X_train, y_train)
expected = y_test
predicted = clf.predict(X_test)

conf_matrix = metrics.confusion_matrix(expected, predicted)
print(conf_matrix)
# [[1403   87]
#  [  56 3159]]

Documentation: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html#sklearn.metrics.confusion_matrix
Answered by andandandand
Here's a fix to invoketheshell's buggy code (which currently appears as the accepted answer):
def performance_measure(y_actual, y_hat):
    TP = 0
    FP = 0
    TN = 0
    FN = 0

    for i in range(len(y_hat)):
        if y_actual[i] == y_hat[i] == 1:
            TP += 1
        if y_hat[i] == 1 and y_actual[i] == 0:
            FP += 1
        if y_hat[i] == y_actual[i] == 0:
            TN += 1
        if y_hat[i] == 0 and y_actual[i] == 1:
            FN += 1

    return (TP, FP, TN, FN)
Answered by Julio Cárdenas-Rodríguez
I wrote a version that works using only numpy. I hope it helps you.
import numpy as np

def perf_metrics_2X2(yobs, yhat):
    """
    Returns the specificity, sensitivity, positive predictive value, and
    negative predictive value of a 2X2 table,
    where:
        0 = negative case
        1 = positive case

    Parameters
    ----------
    yobs : array of positive and negative ``observed`` cases
    yhat : array of positive and negative ``predicted`` cases

    Returns
    -------
    sensitivity  = TP / (TP+FN)
    specificity  = TN / (TN+FP)
    pos_pred_val = TP / (TP+FP)
    neg_pred_val = TN / (TN+FN)

    Author: Julio Cardenas-Rodriguez
    """
    # Count each outcome by masking on the observed class
    TP = np.sum(yhat[yobs == 1] == 1)  # positives predicted as positive
    TN = np.sum(yhat[yobs == 0] == 0)  # negatives predicted as negative
    FP = np.sum(yhat[yobs == 0] == 1)  # negatives predicted as positive
    FN = np.sum(yhat[yobs == 1] == 0)  # positives predicted as negative

    sensitivity  = TP / float(TP + FN)
    specificity  = TN / float(TN + FP)
    pos_pred_val = TP / float(TP + FP)
    neg_pred_val = TN / float(TN + FN)

    return sensitivity, specificity, pos_pred_val, neg_pred_val
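A quick usage sketch, reusing the example arrays from ykorkmaz's answer; note that yobs and yhat must be NumPy arrays for the boolean masking to work:

import numpy as np

yobs = np.array([1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0])
yhat = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0])

# TP=3, TN=4, FP=3, FN=1 for these arrays, so:
print(perf_metrics_2X2(yobs, yhat))  # (0.75, 0.5714..., 0.5, 0.8)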
Answered by daniel.kaifeng
You can try sklearn.metrics.classification_report, as below:
from sklearn.metrics import classification_report

y_true = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
print(classification_report(y_true, y_pred))
output:
             precision    recall  f1-score   support

          0       0.80      0.57      0.67         7
          1       0.50      0.75      0.60         4

avg / total       0.69      0.64      0.64        11
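Note that the per-class recall here is exactly what the question asks for: the recall of class 1 is the sensitivity (0.75) and the recall of class 0 is the specificity (0.57). In newer scikit-learn versions the report can also be returned as a dict; a sketch, assuming scikit-learn >= 0.20 where output_dict was added:

from sklearn.metrics import classification_report

report = classification_report(y_true, y_pred, output_dict=True)
sensitivity = report['1']['recall']  # recall of the positive class
specificity = report['0']['recall']  # recall of the negative class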