Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must comply with the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
原文地址: http://stackoverflow.com/questions/22851316/
What is the inverse of regularization strength in Logistic Regression? How should it affect my code?
Asked by user3427495
I am using sklearn.linear_model.LogisticRegression in scikit-learn to run a logistic regression.
C : float, optional (default=1.0) — Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
What does C mean here, in simple terms? What is regularization strength?
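A minimal runnable sketch of what I'm doing (using the built-in iris data as a stand-in for my own dataset):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# C is the parameter I'm asking about (default 1.0).
clf = LogisticRegression(C=1.0, max_iter=200)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy
```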
Accepted answer by TooTone
Regularization is applying a penalty to increasing the magnitude of parameter values in order to reduce overfitting. When you train a model such as a logistic regression model, you are choosing the parameters that give you the best fit to the data. This means minimizing the error between what the model predicts for your dependent variable, given your data, and what your dependent variable actually is.
The problem comes when you have a lot of parameters (a lot of independent variables) but not much data. In this case, the model will often tailor the parameter values to idiosyncrasies in your data -- which means it fits your data almost perfectly. However, because those idiosyncrasies don't appear in the future data you see, your model predicts poorly.
To solve this, as well as minimizing the error as already discussed, you add a second term to what is minimized: a function that penalizes large parameter values. Most often that function is λ∑_j θ_j², some constant λ times the sum of the squared parameter values. The larger λ is, the less likely it is that the parameters will grow in magnitude simply to adjust for small perturbations in the data. In your case, however, rather than specifying λ, you specify C = 1/λ.
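Spelled out for logistic regression with the usual cross-entropy loss (a standard formulation, not quoted from the original answer), the regularized objective being minimized is:

```latex
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\, y_i \log h_\theta(x_i) + (1 - y_i)\log\big(1 - h_\theta(x_i)\big) \Big] \;+\; \lambda \sum_j \theta_j^2
```

The first term measures fit to the data; the second is the penalty on large parameter values, and scikit-learn's C plays the role of 1/λ.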
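You can see this in code (a sketch on synthetic data; the dataset and C values are made up for illustration): shrinking C strengthens the penalty, which shrinks the fitted coefficients.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data with many features relative to samples,
# the situation where overfitting is likely.
X, y = make_classification(n_samples=50, n_features=20,
                           n_informative=5, random_state=0)

for C in (100.0, 1.0, 0.01):
    model = LogisticRegression(C=C, max_iter=1000).fit(X, y)
    # Smaller C = larger λ = stronger regularization
    # = smaller coefficient magnitudes.
    print(f"C={C:>6}: sum of squared coefficients = "
          f"{np.sum(model.coef_ ** 2):.3f}")
```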