Python: scikit-learn return value of LogisticRegression.predict_proba

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/36681449/

Time: 2020-08-19 18:13:24  Source: igfitidea

python, machine-learning, scikit-learn, probability, logistic-regression

Asked by Zelphir Kaltstahl

What exactly does the LogisticRegression.predict_proba function return?

In my example I get a result like this:

[[  4.65761066e-03   9.95342389e-01]
 [  9.75851270e-01   2.41487300e-02]
 [  9.99983374e-01   1.66258341e-05]]

From other calculations using the sigmoid function, I know that the second column contains probabilities. The documentation says that the first column is n_samples, but that can't be, because my samples are reviews, which are texts, not numbers. The documentation also says that the second column is n_classes. That certainly can't be either, since I only have two classes (namely +1 and -1), and the function is supposed to calculate the probabilities of samples really belonging to a class, not the classes themselves.

What is the first column really, and why is it there?

Answer by iulian

4.65761066e-03 + 9.95342389e-01 = 1
9.75851270e-01 + 2.41487300e-02 = 1
9.99983374e-01 + 1.66258341e-05 = 1
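The row sums above can be checked directly with NumPy. This is a minimal sketch that only assumes NumPy is installed; the values are copied from the output in the question.

```python
import numpy as np

# The three rows printed in the question, copied verbatim.
proba = np.array([
    [4.65761066e-03, 9.95342389e-01],
    [9.75851270e-01, 2.41487300e-02],
    [9.99983374e-01, 1.66258341e-05],
])

# Sum across the class columns: one total per sample.
row_sums = proba.sum(axis=1)
print(row_sums)

# Each row sums to 1 up to the printed precision.
assert np.allclose(row_sums, 1.0)
```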

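The first column is the probability that the entry has the -1 label, and the second column is the probability that the entry has the +1 label. The (n_samples, n_classes) in the documentation describes the shape of the returned array: one row per sample and one column per class, with the columns ordered like the model's classes_ attribute, which holds the labels in sorted order (here -1, then +1).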
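The following is a minimal sketch, assuming scikit-learn and NumPy are installed, that fits a LogisticRegression with default settings on a tiny made-up dataset (X and y are hypothetical) using the labels -1 and +1. It illustrates that classes_ fixes the column order, that the result has shape (n_samples, n_classes), and that for a binary model the second column matches the sigmoid of decision_function, which is the kind of sigmoid calculation the question mentions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature, labels -1 and +1 as in the question.
X = np.array([[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# classes_ is sorted, so column 0 belongs to -1 and column 1 to +1.
print(model.classes_)

proba = model.predict_proba(X)
print(proba.shape)  # (n_samples, n_classes) -> (6, 2)

# For a binary model, column 1 is the sigmoid of the decision function
# and column 0 is its complement.
scores = model.decision_function(X)
sigmoid = 1.0 / (1.0 + np.exp(-scores))
assert np.allclose(proba[:, 1], sigmoid)
assert np.allclose(proba[:, 0], 1.0 - sigmoid)
```

Because the column order comes from classes_, inspecting that attribute is the reliable way to tell which column belongs to which label.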

If you would like to get the predicted probabilities for the positive label only, you can use logistic_model.predict_proba(data)[:,1]. For the output above, this yields [9.95342389e-01, 2.41487300e-02, 1.66258341e-05].
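Slicing with [:,1] relies on +1 being the second entry of classes_. The sketch below, assuming scikit-learn and NumPy are installed and using hypothetical toy data named data and y, compares that hard-coded slice with a variant that looks up the column of label +1 by value, which keeps working if the label set ever changes order.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data with labels -1 / +1, as in the question.
data = np.array([[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, -1, 1, 1, 1])
logistic_model = LogisticRegression().fit(data, y)

proba = logistic_model.predict_proba(data)

# Hard-coded slice from the answer: correct because classes_ is [-1, 1].
p_pos = proba[:, 1]

# Defensive variant: find the column whose label is +1 in classes_
# instead of assuming it is the last column.
pos_col = np.flatnonzero(logistic_model.classes_ == 1)[0]
p_pos_lookup = proba[:, pos_col]

assert np.array_equal(p_pos, p_pos_lookup)
print(p_pos.shape)  # one probability per sample -> (6,)
```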
