sigmoidal regression with scipy, numpy, python, etc
Note: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/4308168/
Asked by MedicalMath
I have two variables (x and y) that have a somewhat sigmoidal relationship with each other, and I need to find some sort of prediction equation that will enable me to predict the value of y, given any value of x. My prediction equation needs to show the somewhat sigmoidal relationship between the two variables. Therefore, I cannot settle for a linear regression equation that produces a line. I need to see the gradual, curvilinear change in slope that occurs at both the right and left of the graph of the two variables.
I started using numpy.polyfit after googling curvilinear regression and python, but that gave me the awful results you can see if you run the code below. Can anyone show me how to re-write the code below to get the type of sigmoidal regression equation that I want?
If you run the code below, you can see that it gives a downward facing parabola, which is not what the relationship between my variables should look like. Instead, there should be more of a sigmoidal relationship between my two variables, but with a tight fit with the data that I am using in the code below. The data in the code below are means from a large-sample research study, so they pack more statistical power than their five data points might suggest. I do not have the actual data from the large-sample research study, but I do have the means below and their standard deviations(which I am not showing). I would prefer to just plot a simple function with the mean data listed below, but the code could get more complex if complexity would offer substantial improvements.
How can I change my code to show a best fit of a sigmoidal function, preferably using scipy, numpy, and python? Here is the current version of my code, which needs to be fixed:
import numpy as np
import matplotlib.pyplot as plt
# Create numpy data arrays
x = np.array([821,576,473,377,326])
y = np.array([255,235,208,166,157])
# Use polyfit and poly1d to create the regression equation
z = np.polyfit(x, y, 3)
p = np.poly1d(z)
xp = np.linspace(100, 1600, 1500)
pxp=p(xp)
# Plot the results
plt.plot(x, y, '.', xp, pxp, '-')
plt.ylim(140,310)
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)
plt.show()
EDIT BELOW: (Re-framed the question)
Your response, and its speed, are very impressive. Thank you, unutbu. But, in order to produce more valid results, I need to re-frame my data values. This means re-casting x values as a percentage of the max x value, while re-casting y values as a percentage of the x-values in the original data. I tried to do this with your code, and came up with the following:
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize
# Create numpy data arrays
'''
# Comment out original data
#x = np.array([821,576,473,377,326])
#y = np.array([255,235,208,166,157])
'''
# Re-calculate x values as a percentage of the first (maximum)
# original x value above
x = np.array([1.000,0.702,0.576,0.459,0.397])
# Recalculate y values as a percentage of their respective x values
# from original data above
y = np.array([0.311,0.408,0.440,0.440,0.482])
def sigmoid(p,x):
    x0,y0,c,k=p
    y = c / (1 + np.exp(-k*(x-x0))) + y0
    return y

def residuals(p,x,y):
    return y - sigmoid(p,x)

p_guess=(600,200,100,0.01)
p, cov, infodict, mesg, ier = scipy.optimize.leastsq(
    residuals, p_guess, args=(x,y), full_output=1, warning=True)
'''
# comment out original xp to allow for better scaling of
# new values
#xp = np.linspace(100, 1600, 1500)
'''
xp = np.linspace(0, 1.1, 1100)
pxp=sigmoid(p,xp)
x0,y0,c,k=p
print('''\
x0 = {x0}
y0 = {y0}
c = {c}
k = {k}
'''.format(x0=x0,y0=y0,c=c,k=k))
# Plot the results
plt.plot(x, y, '.', xp, pxp, '-')
plt.ylim(0,1)
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)
plt.show()
Can you show me how to fix this revised code?
NOTE: By re-casting the data, I have essentially rotated the 2d (x,y) sigmoid about the z-axis by 180 degrees. Also, the 1.000 is not really a maximum of the x values. Instead, 1.000 is a mean of the range of values from different test participants in a maximum test condition.
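For reference, the re-casting arithmetic amounts to the following (a minimal sketch derived from the original data arrays; the variable names are illustrative):

import numpy as np
x_raw = np.array([821., 576., 473., 377., 326.])
y_raw = np.array([255., 235., 208., 166., 157.])
x_pct = x_raw / x_raw[0]   # each x as a fraction of the first (maximum) x
y_pct = y_raw / x_raw      # each y as a fraction of its own x
print(x_pct)   # [1.    0.702 0.576 0.459 0.397] (rounded)
print(y_pct)   # [0.311 0.408 0.44  0.44  0.482] (rounded)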
SECOND EDIT BELOW:
Thank you, unutbu. I carefully read through your code and looked up aspects of it in the scipy documentation. Since your name seems to pop up as a writer of the scipy documentation, I am hoping you can answer the following questions:
1.) Does leastsq() call residuals(), which then returns the difference between the input y-vector and the y-vector returned by the sigmoid() function? If so, how does it account for the difference in the lengths of the input y-vector and the y-vector returned by the sigmoid() function?
2.) It looks like I can call leastsq() for any math equation, as long as I access that math equation through a residuals function, which in turn calls the math function. Is this true?
3.) Also, I notice that p_guess has the same number of elements as p. Does this mean that the four elements of p_guess correspond in order, respectively, with the values returned by x0,y0,c, and k?
4.) Is the p that is sent as an argument to the residuals() and sigmoid() functions the same p that will be output by leastsq(), and the leastsq() function is using that p internally before returning it?
5.) Can p and p_guess have any number of elements, depending on the complexity of the equation being used as a model, as long as the number of elements in p is equal to the number of elements in p_guess?
Answered by Jim Lewis
I don't think you're going to get good results with a polynomial fit of any degree -- since all polynomials go to infinity for sufficiently large or small X, but a sigmoid curve will asymptotically approach some finite value in each direction.
I'm not a Python programmer, so I don't know if numpy has a more general curve fitting routine. If you have to roll your own, perhaps this article on Logistic regression will give you some ideas.
Answered by unutbu
Using scipy.optimize.leastsq:
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize
def sigmoid(p,x):
    x0,y0,c,k=p
    y = c / (1 + np.exp(-k*(x-x0))) + y0
    return y

def residuals(p,x,y):
    return y - sigmoid(p,x)

def resize(arr,lower=0.0,upper=1.0):
    arr=arr.copy()
    if lower>upper: lower,upper=upper,lower
    arr -= arr.min()
    arr *= (upper-lower)/arr.max()
    arr += lower
    return arr
# raw data
x = np.array([821,576,473,377,326],dtype='float')
y = np.array([255,235,208,166,157],dtype='float')
x=resize(-x,lower=0.3)
y=resize(y,lower=0.3)
print(x)
print(y)
p_guess=(np.median(x),np.median(y),1.0,1.0)
p, cov, infodict, mesg, ier = scipy.optimize.leastsq(
    residuals, p_guess, args=(x,y), full_output=1, warning=True)
x0,y0,c,k=p
print('''\
x0 = {x0}
y0 = {y0}
c = {c}
k = {k}
'''.format(x0=x0,y0=y0,c=c,k=k))
xp = np.linspace(0, 1.1, 1500)
pxp=sigmoid(p,xp)
# Plot the results
plt.plot(x, y, '.', xp, pxp, '-')
plt.xlabel('x')
plt.ylabel('y',rotation='horizontal')
plt.grid(True)
plt.show()
yields
[figure: rescaled data points with the fitted sigmoid curve]
with sigmoid parameters
x0 = 0.826964424481
y0 = 0.151506745435
c = 0.848564826467
k = -9.54442292022
Note that for newer versions of scipy (e.g. 0.9) there is also the scipy.optimize.curve_fit function, which is easier to use than leastsq. A relevant discussion of fitting sigmoids using curve_fit can be found here.
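For instance, a sketch of the same four-parameter fit via curve_fit (assuming scipy >= 0.9; curve_fit takes the independent variable first and the parameters as separate arguments, and the p0 guess here mirrors the one above):

import numpy as np
from scipy.optimize import curve_fit

def sigmoid_cf(x, x0, y0, c, k):
    return c / (1 + np.exp(-k * (x - x0))) + y0

# with x, y being the rescaled arrays from the code above:
# popt, pcov = curve_fit(sigmoid_cf, x, y, p0=(np.median(x), np.median(y), 1.0, 1.0))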
Edit: A resize function was added so that the raw data could be rescaled and shifted to fit any desired bounding box.
"your name seems to pop up as a writer of the scipy documentation"
DISCLAIMER: I am not a writer of scipy documentation. I am just a user, and a novice at that. Much of what I know about leastsq comes from reading this tutorial, written by Travis Oliphant.
1.) Does leastsq() call residuals(), which then returns the difference between the input y-vector and the y-vector returned by the sigmoid() function?
Yes! Exactly.
If so, how does it account for the difference in the lengths of the input y-vector and the y-vector returned by the sigmoid() function?
The lengths are the same:
In [138]: x
Out[138]: array([821, 576, 473, 377, 326])
In [139]: y
Out[139]: array([255, 235, 208, 166, 157])
In [140]: p=(600,200,100,0.01)
In [141]: sigmoid(p,x)
Out[141]:
array([ 290.11439268, 244.02863507, 221.92572521, 209.7088641 ,
206.06539033])
One of the wonderful things about Numpy is that it allows you to write "vector" equations that operate on entire arrays.
y = c / (1 + np.exp(-k*(x-x0))) + y0
might look like it works on floats (indeed it would), but if you make x a numpy array and c, k, x0, y0 floats, then the equation defines y to be a numpy array of the same shape as x. So sigmoid(p,x) returns a numpy array. There is a more complete explanation of how this works in the numpy book (required reading for serious users of numpy).
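For instance, a tiny illustration of that broadcasting behavior (made-up values):

import numpy as np
x = np.array([1.0, 2.0, 3.0])
c, k, x0, y0 = 1.0, 2.0, 2.0, 0.0            # plain floats
print(c / (1 + np.exp(-k * (x - x0))) + y0)  # elementwise; same shape as x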
2.) It looks like I can call leastsq() for any math equation, as long as I access that math equation through a residuals function, which in turn calls the math function. Is this true?
True. leastsq attempts to minimize the sum of the squares of the residuals (differences). It searches the parameter-space (all possible values of p) looking for the p which minimizes that sum of squares. The x and y sent to residuals are your raw data values. They are fixed. They don't change. It's the p's (the parameters in the sigmoid function) that leastsq varies as it tries to minimize that sum.
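For example, a sketch of the same pattern with a hypothetical exponential model (not from the original answer):

import numpy as np
import scipy.optimize

def exponential(p, x):
    a, b, c = p
    return a * np.exp(b * x) + c

def residuals_exp(p, x, y):
    return y - exponential(p, x)

# p_fit, ier = scipy.optimize.leastsq(residuals_exp, (1.0, -1.0, 0.0), args=(x, y))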
3.) Also, I notice that p_guess has the same number of elements as p. Does this mean that the four elements of p_guess correspond in order, respectively, with the values returned by x0,y0,c, and k?
Exactly so! Like Newton's method, leastsq needs an initial guess for p. You supply it as p_guess. When you see
scipy.optimize.leastsq(residuals,p_guess,args=(x,y))
you can think of it as the first pass of the leastsq algorithm (really the Levenberg-Marquardt algorithm): leastsq calls residuals(p_guess,x,y).
Notice the visual similarity between
(residuals,p_guess,args=(x,y))
and
residuals(p_guess,x,y)
It may help you remember the order and meaning of the arguments to leastsq.
residuals, like sigmoid, returns a numpy array. The values in the array are squared and then summed. This is the number to beat. p_guess is then varied as leastsq looks for the set of values which minimizes the sum of squares of residuals(p_guess,x,y).
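In other words (a sketch reusing sigmoid, residuals, x, y, and p_guess from the code above):

import scipy.optimize
sum_sq_guess = (residuals(p_guess, x, y) ** 2).sum()   # the "number to beat"
p_fit, ier = scipy.optimize.leastsq(residuals, p_guess, args=(x, y))
sum_sq_fit = (residuals(p_fit, x, y) ** 2).sum()       # smaller after fitting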
4.) Is the p that is sent as an argument to the residuals() and sigmoid() functions the same p that will be output by leastsq(), and the leastsq() function is using that p internally before returning it?
Well, not exactly. As you know by now, p_guess is varied as leastsq searches for the p that minimizes the sum of squares of residuals(p,x,y). The p (er, p_guess) that is sent to leastsq has the same shape as the p that is returned by leastsq. Obviously the values should be different unless you are a hell of a guesser :)
5.) Can p and p_guess have any number of elements, depending on the complexity of the equation being used as a model, as long as the number of elements in p is equal to the number of elements in p_guess?
Yes. I haven't stress-tested leastsq for very large numbers of parameters, but it is a thrillingly powerful tool.
Answered by Gael Varoquaux
For logistic regression in Python, scikits-learn exposes high-performance fitting code:
http://scikit-learn.sourceforge.net/modules/linear_model.html#logistic-regression
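A minimal sketch of what this looks like with a modern scikit-learn install (the API has moved since the scikits-learn era, and note that this fits class probabilities for labeled data, i.e. classification, rather than a curve through continuous y values):

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [0.5], [1.0], [1.5], [2.0], [2.5]])  # one feature per row
y = np.array([0, 0, 0, 1, 1, 1])                          # binary class labels
clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[1.2]]))  # class probabilities vary sigmoidally in X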
Answered by Ramon Martinez
As pointed out by @unutbu above, scipy now provides scipy.optimize.curve_fit, which has a less complicated call. If someone wants a quick version of how the same process would look in those terms, I present a minimal example below:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def sigmoid(x, k, x0):
    return 1.0 / (1 + np.exp(-k * (x - x0)))
# Parameters of the true function
n_samples = 1000
true_x0 = 15
true_k = 1.5
sigma = 0.2
# Build the true function and add some noise
x = np.linspace(0, 30, num=n_samples)
y = sigmoid(x, k=true_k, x0=true_x0)
y_with_noise = y + sigma * np.random.randn(n_samples)
# Sample the data from the real function (this will be your data)
some_points = np.random.choice(1000, size=30) # take 30 data points
xdata = x[some_points]
ydata = y_with_noise[some_points]
# Fit the curve
popt, pcov = curve_fit(sigmoid, xdata, ydata)
estimated_k, estimated_x0 = popt
# Plot the fitted curve
y_fitted = sigmoid(x, k=estimated_k, x0=estimated_x0)
# Plot everything for illustration
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y_fitted, '--', label='fitted')
ax.plot(x, y, '-', label='true')
ax.plot(xdata, ydata, 'o', label='samples')
ax.legend()
The result of this is shown in the next figure:
[figure: true sigmoid, noisy samples, and fitted curve]