Nonlinear regression with Python - what's a simple method to fit this data better?

Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/51972637/

Date: 2020-08-19 20:00:06  Source: igfitidea

Nonlinear regression with python - what's a simple method to fit this data better?

Tags: python, regression, curve-fitting

Asked by Jinx

I have some data that I want to fit so I can make some estimations for the value of a physical parameter given a certain temperature.

I used numpy.polyfit for a quadratic model, but the fit isn't quite as nice as I'd like it to be and I don't have much experience with regression.
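
For reference, a quadratic fit with numpy.polyfit can be sketched as below. This is only an illustration, not the asker's actual code: the seven points are a subset of the data listed in James Phillips' answer further down the page, and the variable names are made up for this example.

```python
import numpy as np

# A handful of (x, S) pairs taken from the data set posted further down the page;
# a subset chosen for illustration only, so the numbers will differ from a full fit.
x = np.array([19.1647, 14.7044, 10.2987, 6.05569, 3.16206, 0.0226971, -1.61717])
s = np.array([0.644557, 0.634135, 0.622516, 0.588207, 0.529336, 0.464514, 0.378698])

# Least-squares quadratic; coefficients are returned highest power first
coeffs = np.polyfit(x, s, deg=2)
quadratic = np.poly1d(coeffs)

# Root mean squared error of the quadratic on the points it was fitted to
residuals = s - quadratic(x)
rmse = np.sqrt(np.mean(residuals ** 2))

print("coefficients:", coeffs)
print("RMSE:", rmse)
```

Comparing this RMSE with that of another model on the same points is one simple way to judge whether the alternative really fits better.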

I have included the scatter plot and the model provided by numpy:

[Figure: S vs Temperature; blue dots are experimental data, black line is the model]

The x axis is temperature (in C) and the y axis is the parameter, which we'll call S. This is experimental data, but in theory S should tend towards 0 as temperature increases and reach 1 as temperature decreases.

My question is: How can I fit this data better? What libraries should I use, what kind of function might approximate this data better than a polynomial, etc?

I can provide code, coefficients of the polynomial, etc, if it's helpful.

Here is a Dropbox link to my data. (A somewhat important note to avoid confusion, although it won't change the actual regression: the temperature column in this data set is Tc - T, where Tc is the transition temperature (40 C). I converted this into T using pandas by calculating 40 - x.)

Answered by James Phillips

This example code uses an equation that has two shape parameters, a and b, and an offset term (that does not affect curvature). The equation is "y = 1.0 / (1.0 + exp(-a(x-b))) + Offset" with parameter values a = 2.1540318329369712E-01, b = -6.6744890642157646E+00, and Offset = -3.5241299859669645E-01 which gives an R-squared of 0.988 and an RMSE of 0.0085.

The example contains your posted data with Python code for fitting and graphing, with automatic initial parameter estimation using the scipy.optimize.differential_evolution genetic algorithm. The scipy implementation of Differential Evolution uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, and this requires bounds within which to search - in this example code, these bounds are based on the maximum and minimum data values.

[Figure: sigmoidal]

import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings

xData = numpy.array([19.1647, 18.0189, 16.9550, 15.7683, 14.7044, 13.6269, 12.6040, 11.4309, 10.2987, 9.23465, 8.18440, 7.89789, 7.62498, 7.36571, 7.01106, 6.71094, 6.46548, 6.27436, 6.16543, 6.05569, 5.91904, 5.78247, 5.53661, 4.85425, 4.29468, 3.74888, 3.16206, 2.58882, 1.93371, 1.52426, 1.14211, 0.719035, 0.377708, 0.0226971, -0.223181, -0.537231, -0.878491, -1.27484, -1.45266, -1.57583, -1.61717])
yData = numpy.array([0.644557, 0.641059, 0.637555, 0.634059, 0.634135, 0.631825, 0.631899, 0.627209, 0.622516, 0.617818, 0.616103, 0.613736, 0.610175, 0.606613, 0.605445, 0.603676, 0.604887, 0.600127, 0.604909, 0.588207, 0.581056, 0.576292, 0.566761, 0.555472, 0.545367, 0.538842, 0.529336, 0.518635, 0.506747, 0.499018, 0.491885, 0.484754, 0.475230, 0.464514, 0.454387, 0.444861, 0.437128, 0.415076, 0.401363, 0.390034, 0.378698])


def func(x, a, b, Offset): # Sigmoid A With Offset from zunzun.com
    return  1.0 / (1.0 + numpy.exp(-a * (x-b))) + Offset


# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
    val = func(xData, *parameterTuple)
    return numpy.sum((yData - val) ** 2.0)


def generate_Initial_Parameters():
    # min and max used for bounds
    maxX = max(xData)
    minX = min(xData)
    maxY = max(yData)
    minY = min(yData)

    parameterBounds = []
    parameterBounds.append([minX, maxX]) # search bounds for a
    parameterBounds.append([minX, maxX]) # search bounds for b
    parameterBounds.append([0.0, maxY]) # search bounds for Offset

    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x

# generate initial parameter values
geneticParameters = generate_Initial_Parameters()

# curve fit the test data
fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)

print('Parameters', fittedParameters)

modelPredictions = func(xData, *fittedParameters) 

absError = modelPredictions - yData

SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print('RMSE:', RMSE)
print('R-squared:', Rsquared)



##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData,  'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot 
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
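
Once the parameters are known, estimating S at new points only requires evaluating the fitted function. The sketch below reuses the equation and the parameter values reported in this answer; the query points are hypothetical, and x is assumed to be in the same units as xData above.

```python
import numpy as np

def func(x, a, b, offset):
    # Sigmoid with offset, same form as in the fitting code above
    return 1.0 / (1.0 + np.exp(-a * (x - b))) + offset

# Parameter values as reported in the answer text
a = 2.1540318329369712e-01
b = -6.6744890642157646e+00
offset = -3.5241299859669645e-01

# Hypothetical query points, chosen inside the range covered by xData
x_query = np.array([0.0, 5.0, 10.0])
s_estimate = func(x_query, a, b, offset)

for xq, sq in zip(x_query, s_estimate):
    print(f"x = {xq:5.1f}  ->  estimated S = {sq:.4f}")
```

Estimates far outside the fitted x range should be treated with caution, since the data constrain the curve only where measurements exist.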

Answered by Sunny Liu

For a non-linear regression problem, you could try SVR(), KNeighborsRegressor() or DecisionTreeRegressor() from sklearn, and compare the model performance on the test set.
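
A minimal sketch of such a comparison follows. The data here are synthetic (a noisy sigmoid invented for this example, not the asker's measurements), and the hyperparameters are arbitrary starting points rather than tuned values.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption): a noisy sigmoid in one feature
rng = np.random.default_rng(0)
X = np.linspace(-2.0, 20.0, 80).reshape(-1, 1)  # sklearn expects 2-D features
y = 1.0 / (1.0 + np.exp(-0.2 * (X.ravel() + 6.0))) - 0.35
y = y + rng.normal(0.0, 0.005, size=y.size)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameters below are illustrative guesses, not recommendations
models = {
    "SVR": SVR(kernel="rbf", C=10.0, epsilon=0.005),
    "KNeighborsRegressor": KNeighborsRegressor(n_neighbors=3),
    "DecisionTreeRegressor": DecisionTreeRegressor(max_depth=4, random_state=0),
}

# Fit each model on the training split and score it on the held-out split
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    scores[name] = np.sqrt(mean_squared_error(y_test, predictions))
    print(f"{name}: test RMSE = {scores[name]:.4f}")
```

Note that these models interpolate well but carry no physical form, so unlike a fitted equation they say little about behaviour outside the measured range.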

Answered by PMende

I would suggest checking out scipy. It has a non-linear optimizer for fitting data to arbitrary functions; see the documentation for scipy.optimize.curve_fit. Be aware that the more complex the function, the longer it will take to fit.
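
As a rough sketch of how curve_fit is used: define the model as a Python function whose first argument is x, then pass it with the data and an initial guess. The sigmoid model, the synthetic data, and the initial guess p0 below are all assumptions made for this example.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, offset):
    # Example model (an assumption): sigmoid with a vertical offset
    return 1.0 / (1.0 + np.exp(-a * (x - b))) + offset

# Synthetic data generated from known parameters plus a little noise
rng = np.random.default_rng(1)
x = np.linspace(-20.0, 20.0, 60)
y = model(x, 0.2, -6.0, -0.35) + rng.normal(0.0, 0.002, size=x.size)

# p0 is a rough initial guess; curve_fit refines it by non-linear least squares
p0 = (0.1, 0.0, 0.0)
popt, pcov = curve_fit(model, x, y, p0=p0)

# One-standard-deviation parameter uncertainties from the covariance matrix
perr = np.sqrt(np.diag(pcov))
fit_rmse = np.sqrt(np.mean((y - model(x, *popt)) ** 2))

print("fitted parameters:", popt)
print("parameter std errors:", perr)
print("RMSE:", fit_rmse)
```

The quality of the initial guess matters for non-linear models; a poor p0 can lead the optimizer to a bad local minimum or to a failure to converge.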

Answered by Tarun Pratap

In scikit-learn, you can use PolynomialFeatures to first transform your training data so that the model has more degrees of freedom. After that, you can use Ridge regression to fit your training data.
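
One way this could look is a scikit-learn pipeline, sketched below. The synthetic data, the polynomial degree, the alpha value, and the extra scaling step are all assumptions made for this example, not part of the original answer.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (an assumption): a noisy sigmoid in one feature
rng = np.random.default_rng(2)
X = np.linspace(-2.0, 20.0, 60).reshape(-1, 1)
y = 1.0 / (1.0 + np.exp(-0.2 * (X.ravel() + 6.0))) - 0.35
y = y + rng.normal(0.0, 0.005, size=y.size)

# Scale x, expand it to [x, x^2, x^3], then fit a ridge-regularised linear model.
# Scaling first keeps the polynomial terms at comparable magnitudes.
model = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=3, include_bias=False),
    Ridge(alpha=1e-3),
)
model.fit(X, y)

r2 = model.score(X, y)  # coefficient of determination on the training data
print("R-squared:", r2)
```

The degree and alpha would normally be chosen by cross-validation rather than fixed by hand as they are here.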

Answered by Kevin Koehler

Try a support vector machine with a polynomial kernel.

With scikit-learn, fitting a model can be as simple as:

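from sklearn.svm import SVR  # SVR is the regression variant; SVC is a classifier and rejects continuous targets
# ... load the data into X (2-D, shape (n_samples, n_features)) and y (1-D)
model = SVR(kernel='poly')
model.fit(X, y)
# plot the model...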