How does Python sklearn.svm.svc's predict_proba() work internally?

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/15111408/


How does sklearn.svm.svc's function predict_proba() work internally?

Tags: python, svm, scikit-learn

Asked by user2115183

I am using sklearn.svm.svc from scikit-learn to do binary classification. I am using its predict_proba() function to get probability estimates. Can anyone tell me how predict_proba() internally calculates the probability?

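For context, here is a minimal sketch of the setup the question describes; the dataset and parameter values are made up for illustration. Note that probability=True must be set when the estimator is constructed, otherwise predict_proba() is not available.

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy binary classification problem (made up for illustration).
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# probability=True enables the probability model behind predict_proba().
clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)

print(clf.predict(X[:1]))         # hard class prediction
print(clf.predict_proba(X[:1]))   # probability estimates, one column per class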

Accepted answer by Fred Foo

Scikit-learn uses LibSVM internally, and this in turn uses Platt scaling, as detailed in this note by the LibSVM authors, to calibrate the SVM to produce probabilities in addition to class predictions.


Platt scaling requires first training the SVM as usual, then optimizing parameter vectors A and B such that


P(y|X) = 1 / (1 + exp(A * f(X) + B))

where f(X) is the signed distance of a sample from the hyperplane (scikit-learn's decision_function method). You may recognize the logistic sigmoid in this definition, the same function that logistic regression and neural nets use for turning decision functions into probability estimates.

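As a rough illustration of this mapping (not scikit-learn's actual code path), applying the sigmoid to decision values looks like the following; the A and B values are hypothetical stand-ins for the fitted Platt parameters.

import numpy as np

def platt_probability(f, A, B):
    # P(y | X) = 1 / (1 + exp(A * f + B)), with f = decision_function(X)
    return 1.0 / (1.0 + np.exp(A * f + B))

# Hypothetical parameter values, just to show the shape of the computation.
print(platt_probability(f=2.5, A=-1.0, B=0.0))   # ~0.92
print(platt_probability(f=-2.5, A=-1.0, B=0.0))  # ~0.08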

Mind you: the B parameter, the "intercept" or "bias" or whatever you like to call it, can cause predictions based on probability estimates from this model to be inconsistent with the ones you get from the SVM decision function f. E.g. suppose that f(X) = 10, then the prediction for X is positive; but if B = -9.9 and A = 1, then P(y|X) = .475. I'm pulling these numbers out of thin air, but you've noticed that this can occur in practice.

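Plugging the numbers from that example into the formula above confirms the mismatch; this is just a quick arithmetic check.

import math

f_X, A, B = 10.0, 1.0, -9.9
p = 1.0 / (1.0 + math.exp(A * f_X + B))
print(p)  # ~0.475: below 0.5, even though f(X) = 10 predicts the positive class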

Effectively, Platt scaling trains a probability model on top of the SVM's outputs under a cross-entropy loss function. To prevent this model from overfitting, it uses an internal five-fold cross-validation, meaning that training SVMs with probability=True can be quite a lot more expensive than a vanilla, non-probabilistic SVM.

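A quick way to see that extra cost is to time the two variants on the same data; the exact numbers depend on the dataset and machine, so this is only a sketch.

from time import perf_counter

from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

t0 = perf_counter()
SVC().fit(X, y)                   # plain SVM, no probability model
t1 = perf_counter()
SVC(probability=True).fit(X, y)   # adds internal 5-fold CV plus Platt scaling
t2 = perf_counter()

print(f"without probability: {t1 - t0:.2f}s")
print(f"with probability:    {t2 - t1:.2f}s")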

Answered by user1165814

Actually, I found a slightly different answer: they use this code to convert the decision value into a probability:


double fApB = decision_value * A + B;
// Numerically stable evaluation of 1 / (1 + exp(fApB))
if (fApB >= 0)
    return Math.exp(-fApB) / (1.0 + Math.exp(-fApB));
else
    return 1.0 / (1.0 + Math.exp(fApB));

The A and B values here can be found in the model file (probA and probB). Inverting this formula also offers a way to convert a probability back into a decision value, and thus into a hinge loss.

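In scikit-learn, the fitted Platt parameters are exposed on a trained SVC as the probA_ and probB_ attributes. For a binary problem, something close to the snippet above can be written in Python as below; scikit-learn's internal sign and class-ordering conventions may make the result differ from predict_proba(), so treat this as an illustrative sketch rather than the exact implementation.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = SVC(probability=True, random_state=0).fit(X, y)

A, B = clf.probA_[0], clf.probB_[0]      # fitted Platt parameters
f = clf.decision_function(X[:1])[0]      # signed distance from the hyperplane

fApB = f * A + B
# Same numerically stable sigmoid as in the LibSVM-style code above.
if fApB >= 0:
    p = np.exp(-fApB) / (1.0 + np.exp(-fApB))
else:
    p = 1.0 / (1.0 + np.exp(fApB))

print(p, clf.predict_proba(X[:1]))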

When doing so, use the convention that ln(0) = -200 to avoid infinities.
