pandas: How to run non-linear regression in Python

Question

Asked by Mukul

I have the following information (a DataFrame) in Python:

product baskets scaling_factor
12345   475     95.5
12345   108     57.7
12345   2       1.4
12345   38      21.9
12345   320     88.8

I want to run the following non-linear regression and estimate the parameters a, b and c.

The equation that I want to fit:

scaling_factor = a - (b*np.exp(c*baskets))

In SAS we usually run the following model (which uses the Gauss-Newton method):

proc nlin data=scaling_factors;
 parms a=100 b=100 c=-0.09;
 model scaling_factor = a - (b * (exp(c*baskets)));
 output out=scaling_equation_parms 
parms=a b c;

Is there a similar way to estimate the parameters in Python using non-linear regression, and how can I see the plot in Python?

Answer 1

Accepted answer by mikuszefski

Agreeing with Chris Mueller, I'd also use scipy, but with scipy.optimize.curve_fit. The code looks like:

###selecting a backend was required on my linux machine; the call is commented out here
###because recent Matplotlib releases no longer support Qt4 (pick a backend your system has)
import matplotlib
#matplotlib.use('Qt4Agg')
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit #we could import more, but this is what we need
###defining your fitfunction
def func(x, a, b, c):
    return a - b* np.exp(c * x) 
###OP's data
baskets = np.array([475, 108, 2, 38, 320])
scaling_factor = np.array([95.5, 57.7, 1.4, 21.9, 88.8])
###let us guess some start values
initialGuess=[100, 100,-.01]
guessedFactors=[func(x,*initialGuess ) for x in baskets]
###making the actual fit
popt,pcov = curve_fit(func, baskets, scaling_factor,initialGuess)
#one may want to inspect the fitted parameters and their covariance matrix
print(popt)
print(pcov)
###preparing data for showing the fit
basketCont=np.linspace(min(baskets),max(baskets),50)
fittedData=[func(x, *popt) for x in basketCont]
###preparing the figure
fig1 = plt.figure(1)
ax=fig1.add_subplot(1,1,1)
###the three sets of data to plot
ax.plot(baskets,scaling_factor,linestyle='',marker='o', color='r',label="data")
ax.plot(baskets,guessedFactors,linestyle='',marker='^', color='b',label="initial guess")
ax.plot(basketCont,fittedData,linestyle='-', color='#900000',label="fit with ({0:0.2g},{1:0.2g},{2:0.2g})".format(*popt))
###beautification
ax.legend(loc=0, title="graphs", fontsize=12)
ax.set_ylabel("factor")
ax.set_xlabel("baskets")
ax.grid()
ax.set_title(r"$\mathrm{curve}_\mathrm{fit}$")
###putting the covariance matrix nicely
tab= [['{:.2g}'.format(j) for j in i] for i in pcov]
the_table = plt.table(cellText=tab,
                  colWidths = [0.2]*3,
                  loc='upper right', bbox=[0.483, 0.35, 0.5, 0.25] )
plt.text(250,65,'covariance:',size=12)
###putting the plot
plt.show()
###done

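Eventually, this gives you a figure showing the data, the initial guess, and the fitted curve, with the covariance matrix as an inset table.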
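
Since the question starts from a DataFrame, here is a minimal sketch of the same fit taking its input from pandas columns, and turning the covariance matrix into one-standard-deviation errors. The DataFrame construction and the names df and perr are assumptions for illustration; the column names, the fit function, and the start values come from the posts above.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

# The question's data as a DataFrame (column names taken from the question)
df = pd.DataFrame({
    "product": [12345] * 5,
    "baskets": [475, 108, 2, 38, 320],
    "scaling_factor": [95.5, 57.7, 1.4, 21.9, 88.8],
})

def func(x, a, b, c):
    # same model as in the question: a - b * exp(c * x)
    return a - b * np.exp(c * x)

# curve_fit works on plain arrays, so convert the columns explicitly
x = df["baskets"].to_numpy(dtype=float)
y = df["scaling_factor"].to_numpy(dtype=float)

popt, pcov = curve_fit(func, x, y, p0=[100, 100, -0.01])

# the square roots of the diagonal of pcov estimate the parameter uncertainties
perr = np.sqrt(np.diag(pcov))
for name, value, err in zip("abc", popt, perr):
    print(f"{name} = {value:.4g} +/- {err:.2g}")
```

With only five data points and three parameters, these error estimates should be read as rough indications rather than tight bounds.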

Answer 2

Answer by Chris Mueller

For problems like these I always use scipy.optimize.minimize with my own least squares function. The optimization algorithms don't handle large differences in magnitude between the inputs well, so it is a good idea to scale the parameters inside your function so that the parameters exposed to scipy are all on the order of 1, as I've done below.

import numpy as np
import scipy.optimize

baskets = np.array([475, 108, 2, 38, 320])
scaling_factor = np.array([95.5, 57.7, 1.4, 21.9, 88.8])

def lsq(arg):
    a = arg[0]*100
    b = arg[1]*100
    c = arg[2]*0.1
    now = a - (b*np.exp(c * baskets)) - scaling_factor
    return np.sum(now**2)

guesses = [1, 1, -0.9]
res = scipy.optimize.minimize(lsq, guesses)

print(res.message)
# 'Optimization terminated successfully.'

print(res.x)
# [ 0.97336709  0.98685365 -0.07998282]

print([lsq(guesses), lsq(res.x)])
# [7761.0093358076601, 13.055053196410928]
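
Note that res.x holds the scaled parameters, not a, b and c themselves. A small sketch of undoing the scaling, using the values printed above (the names scale, x_scaled and sse are made up for this illustration):

```python
import numpy as np

baskets = np.array([475, 108, 2, 38, 320])
scaling_factor = np.array([95.5, 57.7, 1.4, 21.9, 88.8])

# the factors applied inside lsq(): a and b are multiplied by 100, c by 0.1
scale = np.array([100.0, 100.0, 0.1])

# res.x as printed in the answer above
x_scaled = np.array([0.97336709, 0.98685365, -0.07998282])

# parameters in the units of the original equation
a, b, c = x_scaled * scale
print(a, b, c)

# sum of squared residuals of the original equation with these parameters
sse = np.sum((a - b * np.exp(c * baskets) - scaling_factor) ** 2)
print(sse)
```

So the fitted curve is roughly scaling_factor = 97.3 - 98.7 * exp(-0.008 * baskets).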

Of course, as with all minimization problems, it is important to use good initial guesses, since all of the algorithms can get trapped in a local minimum. The optimization method can be changed by using the method keyword; some of the possibilities are

'Nelder-Mead'
'Powell'
'CG'
'BFGS'
'Newton-CG'

According to the documentation, the default is BFGS when no bounds or constraints are given.
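
As a rough sketch of switching methods, the loop below reruns the same scaled least squares function with two of the methods listed above. The results dictionary is only there for illustration, and the numbers you get may differ slightly between methods and scipy versions.

```python
import numpy as np
from scipy.optimize import minimize

baskets = np.array([475, 108, 2, 38, 320])
scaling_factor = np.array([95.5, 57.7, 1.4, 21.9, 88.8])

def lsq(arg):
    # same scaled least squares function as above
    a = arg[0] * 100
    b = arg[1] * 100
    c = arg[2] * 0.1
    now = a - (b * np.exp(c * baskets)) - scaling_factor
    return np.sum(now ** 2)

guesses = [1, 1, -0.9]

# run the same problem with a derivative-free and a gradient-based method
results = {}
for method in ("Nelder-Mead", "BFGS"):
    results[method] = minimize(lsq, guesses, method=method)
    res = results[method]
    print(method, res.success, res.x, res.fun)
```

Comparing res.fun across methods is a cheap sanity check: if the values disagree noticeably, at least one run probably stopped early or ended in a different minimum.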
