Original URL: http://stackoverflow.com/questions/26478000/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me):
StackOverflow
Converting LinearSVC's decision function to probabilities (scikit-learn, Python)
Asked by chet
I use the linear SVM from scikit-learn (LinearSVC) for a binary classification problem. I understand that LinearSVC can give me the predicted labels and the decision scores, but I wanted probability estimates (confidence in the label). I want to continue using LinearSVC because of speed (compared to sklearn.svm.SVC with a linear kernel). Is it reasonable to use a logistic function to convert the decision scores to probabilities?
import sklearn.svm as suppmach

# Fit model (note: penalty='l1' requires dual=False in LinearSVC):
svmmodel = suppmach.LinearSVC(penalty='l1', dual=False, C=1)
svmmodel.fit(x_train, y_train)

predicted_test = svmmodel.predict(x_test)
predicted_test_scores = svmmodel.decision_function(x_test)
I want to check if it makes sense to obtain probability estimates simply as 1 / (1 + exp(-x)), where x is the decision score.
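In code, the naive mapping being asked about would look like this (a sketch, reusing svmmodel and x_test from the snippet above):

import numpy as np

# Naive logistic squashing of the decision scores into (0, 1).
# The numbers look like probabilities but are not calibrated (see answers below).
scores = svmmodel.decision_function(x_test)
naive_proba = 1.0 / (1.0 + np.exp(-scores))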
Alternatively, are there other options with respect to classifiers that I can use to do this efficiently?
Thanks.
Accepted answer by greeness
I took a look at the APIs in the sklearn.svm.* family. All of the models below, e.g.,
- sklearn.svm.SVC
- sklearn.svm.NuSVC
- sklearn.svm.SVR
- sklearn.svm.NuSVR
have a common interface that supplies a
probability: boolean, optional (default=False)
parameter to the model. If this parameter is set to True, libsvm will train a probability transformation model on top of the SVM's outputs based on the idea of Platt scaling. The form of the transformation is similar to a logistic function, as you pointed out; however, two specific constants A and B are learned in a post-processing step. Also see this stackoverflow post for more details.
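As an illustration (not libsvm's exact procedure, which uses regularized targets and its own optimizer), here is a minimal sketch of the idea, assuming X_train, y_train and X_test arrays: a one-dimensional logistic regression fit on held-out decision scores learns exactly the two constants of Platt's sigmoid P(y=1 | f) = 1 / (1 + exp(A*f + B)), up to sign conventions.

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Hold out part of the training data so the sigmoid is fit on scores
# the SVM did not train on (otherwise the calibration is biased).
X_fit, X_cal, y_fit, y_cal = train_test_split(X_train, y_train, test_size=0.25)

svm = LinearSVC().fit(X_fit, y_fit)

# A 1-D logistic regression on the held-out scores learns Platt's A and B.
platt = LogisticRegression()
platt.fit(svm.decision_function(X_cal).reshape(-1, 1), y_cal)

# Calibrated probability estimates for new data:
proba = platt.predict_proba(svm.decision_function(X_test).reshape(-1, 1))[:, 1]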


I actually don't know why this post-processing is not available for LinearSVC. Otherwise, you would just call predict_proba(X) to get the probability estimates.
Of course, if you just apply a naive logistic transform, it will not perform as well as a calibrated approach like Platt scaling. If you can understand the underlying algorithm of Platt scaling, you can probably write your own or contribute to the scikit-learn svm family. :) Also feel free to use the above four SVM variations that support predict_proba.
Answered by Fred Foo
If you want speed, then just replace the SVM with sklearn.linear_model.LogisticRegression. That uses the exact same training algorithm as LinearSVC, but with log-loss instead of hinge loss.
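A minimal sketch of that swap, reusing the question's x_train/x_test names and l1 penalty (the solver='liblinear' setting is an assumption needed for l1 in recent scikit-learn versions):

from sklearn.linear_model import LogisticRegression

# Same linear model family as LinearSVC, but log-loss gives predict_proba natively.
clf = LogisticRegression(penalty='l1', C=1, solver='liblinear')
clf.fit(x_train, y_train)
probas = clf.predict_proba(x_test)  # shape (n_samples, 2): P(class 0), P(class 1)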
Using 1 / (1 + exp(-x)) will produce probabilities in a formal sense (numbers between zero and one), but they won't adhere to any justifiable probability model.
Answered by Mikhail Korobov
scikit-learn provides CalibratedClassifierCV, which can be used to solve this problem: it allows adding probability output to LinearSVC or any other classifier which implements the decision_function method:
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

svm = LinearSVC()
clf = CalibratedClassifierCV(svm)
clf.fit(X_train, y_train)
y_proba = clf.predict_proba(X_test)
The user guide has a nice section on that. By default, CalibratedClassifierCV + LinearSVC will get you Platt scaling, but it also provides other options (an isotonic regression method), and it is not limited to SVM classifiers.
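For example, a sketch of the isotonic variant (assuming X_train, y_train, X_test as above):

from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

# method='isotonic' swaps Platt's sigmoid for isotonic regression; it is
# non-parametric and generally needs more calibration data than the
# default method='sigmoid'.
clf_iso = CalibratedClassifierCV(LinearSVC(), method='isotonic', cv=5)
clf_iso.fit(X_train, y_train)
y_proba = clf_iso.predict_proba(X_test)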
Answered by Syncrossus
If what you really want is a measure of confidence rather than actual probabilities, you can use the method LinearSVC.decision_function(). See the documentation.
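For example (a minimal sketch, reusing the fitted svmmodel and x_test from the question):

# Signed distance of each sample from the separating hyperplane: the sign
# gives the predicted class, the magnitude is an uncalibrated confidence.
confidence_scores = svmmodel.decision_function(x_test)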