Python ConvergenceWarning: Liblinear failed to converge, increase the number of iterations

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/52670012/


ConvergenceWarning: Liblinear failed to converge, increase the number of iterations

Tags: python, opencv, lbph-algorithm

Asked by Fahad Mairaj

I am running Adrian's local binary patterns code. The program runs but gives the following warning:


C:\Python27\lib\site-packages\sklearn\svm\base.py:922: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
 "the number of iterations.", ConvergenceWarning

I am running Python 2.7 with OpenCV 3.7. What should I do?


Answered by lightalchemist

When an optimization algorithm does not converge, it is usually because the problem is not well-conditioned, perhaps due to poor scaling of the decision variables. There are a few things you can try.


  1. Normalize your training data so that the problem hopefully becomes better conditioned, which in turn can speed up convergence. One possibility is to scale your data to zero mean and unit standard deviation using Scikit-Learn's StandardScaler, for example (see the sketch after this list). Note that you have to apply the StandardScaler fitted on the training data to the test data.
  2. Related to 1), make sure other arguments, such as the regularization weight C, are set appropriately.
  3. Set max_iter to a larger value. The default is 1000.
  4. Set dual=True if the number of features is greater than the number of examples, and vice versa. This solves the SVM optimization problem using the dual formulation. Thanks to @Nino van Hooff for pointing this out.
  5. Use a different solver, e.g., the L-BFGS solver if you are using Logistic Regression. See @5ervant's answer.
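
A minimal sketch of points 1-4 above, assuming scikit-learn with a LinearSVC; X and y here are synthetic stand-ins, not the asker's LBP features:

from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic data: 500 examples, 20 features (replace with your own X, y)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

clf = make_pipeline(
    StandardScaler(),          # 1) scale features to zero mean, unit variance
    LinearSVC(C=1.0,           # 2) tune the regularization weight
              max_iter=10000,  # 3) raise the iteration cap if needed
              dual=False),     # 4) more examples than features here, so dual=False
)
clf.fit(X, y)  # the scaler fitted on the training data is reused automatically at predict time
print(clf.score(X, y))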

Note: One should not ignore this warning.


This warning came about because


  1. Solving the linear SVM is just solving a quadratic optimization problem. The solver is typically an iterative algorithm that keeps a running estimate of the solution (i.e., the weight and bias for the SVM). It stops running when the solution corresponds to an objective value that is optimal for this convex optimization problem, or when it hits the maximum number of iterations set.

  2. If the algorithm does not converge, then the current estimate of the SVM's parameters is not guaranteed to be any good, hence the predictions can also be complete garbage.


Edit


In addition, consider the comments by @Nino van Hooff and @5ervant to use the dual formulation of the SVM. This is especially important if the number of features you have, D, is more than the number of training examples, N. This is what the dual formulation of the SVM is particularly designed for, and it helps with the conditioning of the optimization problem. Credit to @5ervant for noticing and pointing this out.


Furthermore, @5ervant also pointed out the possibility of changing the solver, in particular the use of the L-BFGS solver. Credit to him (i.e., upvote his answer, not mine).


I would like to provide a quick, rough explanation for those who are interested (I am :)) of why this matters in this case. Second-order methods, and in particular approximate second-order methods like the L-BFGS solver, help with ill-conditioned problems because they approximate the Hessian at each iteration and use it to scale the gradient direction. This allows them to achieve a better convergence rate, but possibly at a higher compute cost per iteration. That is, it takes fewer iterations to finish, but each iteration will be slower than in a typical first-order method like gradient descent or its variants.


For example, a typical first-order method might update the solution at each iteration like


x(k + 1) = x(k) - alpha(k) * gradient(f(x(k)))


where alpha(k), the step size at iteration k, depends on the particular choice of algorithm or learning rate schedule.


A second-order method, e.g., Newton's method, will have an update equation like


x(k + 1) = x(k) - alpha(k) * Hessian(x(k))^(-1) * gradient(f(x(k)))


That is, it uses the information of the local curvature encoded in the Hessian to scale the gradient accordingly. If the problem is ill-conditioned, the gradient will be pointing in less than ideal directions and the inverse Hessian scaling will help correct this.

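A rough numerical illustration of this point (my own, not from the answer): on a badly conditioned quadratic, plain gradient descent is forced to take tiny steps by the stiff direction, while a single Hessian-scaled (Newton) step jumps straight to the optimum.

import numpy as np

A = np.diag([1.0, 1000.0])        # condition number 1000: ill-conditioned
grad = lambda x: A @ x            # gradient of f(x) = 0.5 * x^T A x
hess_inv = np.linalg.inv(A)       # the Hessian of f is simply A

x_gd = np.array([1.0, 1.0])
for _ in range(100):
    x_gd = x_gd - 1e-3 * grad(x_gd)   # step size must stay below 2/1000 to avoid divergence

x_newton = np.array([1.0, 1.0])
x_newton = x_newton - hess_inv @ grad(x_newton)   # one Newton step lands at the optimum (0, 0)

print("gradient descent after 100 iterations:", x_gd)   # still far from 0 in the flat direction
print("Newton after 1 iteration:", x_newton)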

In particular, L-BFGS, mentioned in @5ervant's answer, is a way to approximate the inverse of the Hessian, since computing it exactly can be an expensive operation.


However, second-order methods might converge much faster (i.e., requires fewer iterations) than first-order methods like the usual gradient-descent based solvers, which as you guys know by now sometimes fail to even converge. This can compensate for the time spent at each iteration.


In summary, if you have a well-conditioned problem, or if you can make it well-conditioned through other means such as using regularization and/or feature scaling and/or making sure you have more examples than features, you probably don't have to use a second-order method. But these days, with many models optimizing non-convex problems (e.g., those in DL models), second-order methods such as L-BFGS play a different role there, and there is evidence to suggest they can sometimes find better solutions compared to first-order methods. But that is another story.


Answered by 5ervant

I reached the point of setting max_iter=1200000 on my LinearSVC classifier, but the "ConvergenceWarning" was still present. I fixed the issue by just setting dual=False and leaving max_iter at its default.


With a LogisticRegression(solver='lbfgs') classifier, you should increase max_iter. Mine reached max_iter=7600 before the "ConvergenceWarning" disappeared when training with a large dataset's features.

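
A minimal sketch of that LogisticRegression setup; the max_iter value is just the one reported above, and your dataset may need a different one:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=2000, n_features=100, random_state=0)  # stand-in data
clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver='lbfgs', max_iter=7600),  # raise max_iter until the warning goes away
)
clf.fit(X, y)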