Least squares method in Python?
Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/43616993/
Question by Philipp
I have these values:
T_values = (222, 284, 308.5, 333, 358, 411, 477, 518, 880, 1080, 1259)  (x values)
C/(3Nk)_values = (0.1282, 0.2308, 0.2650, 0.3120, 0.3547, 0.4530, 0.5556, 0.6154, 0.8932, 0.9103, 0.9316)  (y values)
I know they follow the model:
C/(3Nk) = (h*w/(k*T))**2 * exp(h*w/(k*T)) / (exp(h*w/(k*T)) - 1)**2
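For reference, writing x = h*w/(k*T) (a shorthand introduced here, not used in the question), the model can be restated as below. The denominator is taken as (e^x - 1)^2, the standard Einstein heat-capacity form; with it, C/(3Nk) tends to 1 as T grows, which matches the measured values rising toward 1 (0.9316 at T = 1259).

```latex
x = \frac{h\,w}{k\,T}, \qquad
\frac{C}{3Nk} = \frac{x^{2}\,e^{x}}{\left(e^{x}-1\right)^{2}}
             = \left(\frac{x/2}{\sinh(x/2)}\right)^{2}
             \;\longrightarrow\; 1 \quad (T \to \infty)
```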
I also know that k = 1.38*10**(-23) and h = 6.626*10**(-34).
I have to find the w that best describes the measurement data. I'd like to solve this using the least squares method in Python, but I don't really understand how it works. Can anyone help me?
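A minimal sketch of fitting the question's model directly with scipy.optimize.curve_fit, assuming the Einstein form with (exp(x) - 1)**2 in the denominator. To keep the fitted parameter well scaled, it fits the auxiliary quantity theta = h*w/k (a temperature in kelvin) and converts back with w = theta*k/h. That substitution, the starting guess of 1000 K, and the names model and theta are assumptions made for illustration, not part of the question.

```python
import numpy as np
from scipy.optimize import curve_fit

# Constants and data from the question (SI units)
k = 1.38e-23    # Boltzmann constant, J/K
h = 6.626e-34   # Planck constant, J*s
T = np.array([222, 284, 308.5, 333, 358, 411, 477, 518, 880, 1080, 1259])
C = np.array([0.1282, 0.2308, 0.2650, 0.3120, 0.3547, 0.4530,
              0.5556, 0.6154, 0.8932, 0.9103, 0.9316])

def model(T, theta):
    """C/(3Nk) written in terms of theta = h*w/k (assumed substitution, in K)."""
    x = theta / T
    # np.expm1(x) computes exp(x) - 1 accurately
    return x**2 * np.exp(x) / np.expm1(x)**2

# Fitting theta (order 1e3) instead of w (order 1e13) keeps the problem well scaled.
# p0 = 1000 K is a hypothetical rough guess, not a value from the question.
popt, pcov = curve_fit(model, T, C, p0=[1000.0])
theta = popt[0]
w = theta * k / h   # convert back to the parameter the question asks for
print(f"theta = {theta:.0f} K, w = {w:.3e}")
```

Fitting theta rather than w is only a conditioning choice; the best-fit w is the same either way, since w = theta*k/h is a fixed rescaling.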
Answer by pylang
This answer provides a walk-through on using Python to determine fitting parameters for a general exponential pattern.
Data Cleaning
First, let's input and organize the sampling data as numpy arrays, which will later help with computation and clarity.
import matplotlib.pyplot as plt
import scipy.optimize as opt
import numpy as np
# %matplotlib inline  (uncomment when running in a Jupyter notebook)
# DATA ------------------------------------------------------------------------
T_values = np.array([222, 284, 308.5, 333, 358, 411, 477, 518, 880, 1080, 1259])
C_values = np.array([0.1282, 0.2308, 0.2650, 0.3120 , 0.3547, 0.4530, 0.5556, 0.6154, 0.8932, 0.9103, 0.9316])
x_samp = T_values
y_samp = C_values
There are many curve fitting functions in scipy and numpy, and each is used differently, e.g. scipy.optimize.leastsq and scipy.optimize.least_squares. For simplicity, we will use scipy.optimize.curve_fit, but it is difficult to find an optimized regression curve without selecting reasonable starting parameters. A simple technique for selecting starting parameters will be demonstrated later.
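The difference in calling conventions can be seen in a small sketch, assuming the general exponential model A*exp(c*x) + d and the starting guess [-1, -3e-3, 1] used later in this answer: curve_fit takes the model f(x, *params), while leastsq and least_squares take a function that returns the residual vector. Passing method="lm" and x_scale="jac" to least_squares is a choice made here so that all three run MINPACK's Levenberg-Marquardt routine; the helper names model and residuals are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit, least_squares, leastsq

x = np.array([222, 284, 308.5, 333, 358, 411, 477, 518, 880, 1080, 1259])
y = np.array([0.1282, 0.2308, 0.2650, 0.3120, 0.3547, 0.4530,
              0.5556, 0.6154, 0.8932, 0.9103, 0.9316])
p0 = [-1, -3e-3, 1]

def model(x, A, c, d):
    # curve_fit wants the model itself: f(x, *params)
    return A * np.exp(c * x) + d

def residuals(p):
    # leastsq and least_squares want the residual vector: r(params)
    return model(x, *p) - y

p_cf, _ = curve_fit(model, x, y, p0=p0)       # returns (popt, pcov)
p_lsq, _ = leastsq(residuals, p0)             # returns (x, ier) by default
res = least_squares(residuals, p0, method="lm", x_scale="jac")  # returns a result object
print(p_cf, p_lsq, res.x)
```

All three should agree closely here; least_squares additionally supports bounds and robust loss functions through its other methods.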
Review
First, although the OP provided an expected fitting equation, we will approach curve fitting in Python by reviewing the general equation for an exponential function:
y = A*exp(c*x) + d
Now we build this general function, which will be used a few times:
# GENERAL EQUATION ------------------------------------------------------------
def func(x, A, c, d):
    return A*np.exp(c*x) + d
Trends
- amplitude: a small A gives a small amplitude
- shape: a small c controls the shape by flattening the "knee" of the curve
- position: d shifts the curve vertically (the y-intercept is A + d; when c < 0, d is the horizontal asymptote)
- orientation: a negative A flips the curve across a horizontal axis; a negative c flips the curve across a vertical axis
These trends are illustrated below, highlighting a control curve (black line) compared with a curve in which one parameter is varied (red line):
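The trends listed above can also be checked numerically. This sketch re-declares func and uses a hypothetical control curve (A, c, d) = (-1, -3e-3, 1); it checks that the value at x = 0 is A + d, that the curve levels off at d when c < 0, and that negating A or c, or halving A, changes the curve as the list describes.

```python
import numpy as np

def func(x, A, c, d):
    return A * np.exp(c * x) + d

A, c, d = -1.0, -3e-3, 1.0          # hypothetical "control" curve
x = np.linspace(0, 1300, 14)
control = func(x, A, c, d)

intercept = func(0.0, A, c, d)      # equals A + d, so d alone is not the intercept
asymptote = func(1e5, A, c, d)      # with c < 0 the curve levels off at d
flip_A = func(x, -A, c, d)          # mirror image about the horizontal line y = d
flip_c = func(-x, A, -c, d)         # mirror image about the vertical line x = 0
half_A = func(x, A / 2, c, d)       # half the distance from the line y = d

print(intercept, asymptote)
print(np.allclose(flip_A - d, -(control - d)))
print(np.allclose(flip_c, control))
print(np.allclose(half_A - d, (control - d) / 2))
```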
Selecting Initial Parameters
Using these trends, let us next look at the data and try to emulate the curve by adjusting these parameters. For demonstration, we plot several trial equations against our data:
# SURVEY ----------------------------------------------------------------------
# Plotting Sampling Data
plt.plot(x_samp, y_samp, "ko", label="Data")
x_lin = np.linspace(0, x_samp.max(), 50) # a number line: 50 evenly spaced points between 0 and max
# Trials
A, c, d = -1, -1e-2, 1
y_trial1 = func(x_lin, A, c, d)
y_trial2 = func(x_lin, -1, -1e-3, 1)
y_trial3 = func(x_lin, -1, -3e-3, 1)
plt.plot(x_lin, y_trial1, "--", label="Trial 1")
plt.plot(x_lin, y_trial2, "--", label="Trial 2")
plt.plot(x_lin, y_trial3, "--", label="Trial 3")
plt.legend()
From simple trial and error, we can better approximate the shape, amplitude, position and orientation of the curve. For instance, we know the first two parameters (A and c) must be negative. We also have a reasonable guess for the order of magnitude of c.
Computing Estimated Parameters
We will now use the parameters of the best trial for our initial guesses:
# REGRESSION ------------------------------------------------------------------
p0 = [-1, -3e-3, 1] # guessed params
w, _ = opt.curve_fit(func, x_samp, y_samp, p0=p0)
print("Estimated Parameters", w)
# Model
y_model = func(x_lin, *w)
# PLOT ------------------------------------------------------------------------
# Visualize data and fitted curves
plt.plot(x_samp, y_samp, "ko", label="Data")
plt.plot(x_lin, y_model, "k--", label="Fit")
plt.title("Least squares regression")
plt.legend(loc="upper left")
# Estimated Parameters [-1.66301087 -0.0026884 1.00995394]
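curve_fit also returns an estimated covariance matrix, which the regression code above discards as _. This sketch turns it into one-standard-deviation uncertainties with np.sqrt(np.diag(pcov)), assuming the same data, model and starting guess; the name perr is illustrative. By default (absolute_sigma=False) curve_fit scales pcov by the residual variance, so these uncertainties reflect the scatter of the data around the fit.

```python
import numpy as np
import scipy.optimize as opt

def func(x, A, c, d):
    return A * np.exp(c * x) + d

x_samp = np.array([222, 284, 308.5, 333, 358, 411, 477, 518, 880, 1080, 1259])
y_samp = np.array([0.1282, 0.2308, 0.2650, 0.3120, 0.3547, 0.4530,
                   0.5556, 0.6154, 0.8932, 0.9103, 0.9316])
p0 = [-1, -3e-3, 1]

w, pcov = opt.curve_fit(func, x_samp, y_samp, p0=p0)
perr = np.sqrt(np.diag(pcov))   # one-standard-deviation error of each parameter
for name, value, err in zip(("A", "c", "d"), w, perr):
    print(f"{name} = {value:.4g} +/- {err:.2g}")
```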
How Does this Work?
curve_fit is one of many optimization functions offered by scipy. Starting from an initial guess, the estimated parameters are iteratively refined so that the resulting curve minimizes the sum of the squared residuals, i.e. the squared differences between the fitted curve and the sampling data. A better guess reduces the number of iterations, speeds up the result, and makes it more likely that the fit converges to a sensible solution. With these estimated parameters for the fitted curve, one can now calculate the specific coefficients for a particular equation (a final exercise left to the OP).
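To make the minimized quantity concrete, this sketch defines the sum of squared residuals explicitly (the helper sse is hypothetical, not a scipy function) and evaluates it at the starting guess and at the curve_fit result. Nudging any single fitted parameter by 1% should only increase it, as expected at a local minimum.

```python
import numpy as np
import scipy.optimize as opt

def func(x, A, c, d):
    return A * np.exp(c * x) + d

x_samp = np.array([222, 284, 308.5, 333, 358, 411, 477, 518, 880, 1080, 1259])
y_samp = np.array([0.1282, 0.2308, 0.2650, 0.3120, 0.3547, 0.4530,
                   0.5556, 0.6154, 0.8932, 0.9103, 0.9316])
p0 = np.array([-1, -3e-3, 1])

def sse(params):
    """Sum of squared residuals: the quantity least squares minimizes."""
    r = y_samp - func(x_samp, *params)
    return np.sum(r**2)

w, _ = opt.curve_fit(func, x_samp, y_samp, p0=p0)
print("SSE at initial guess:", sse(p0))
print("SSE at fitted params:", sse(w))

# Nudge each fitted parameter by 1%, one at a time, and check the SSE grows
increases = []
for i in range(len(w)):
    nudged = w.copy()
    nudged[i] *= 1.01
    increases.append(sse(nudged) > sse(w))
print(increases)
```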
Answer by Mohammad Athar
You want to use scipy:
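import numpy as np
from scipy.optimize import curve_fit
k, h = 1.38e-23, 6.626e-34  # constants from the question
def my_model(T, w):
    x = h*w/(k*T)
    return x**2*np.exp(x)/(np.exp(x) - 1)**2
w = 1e13  # nonzero initial guess; w = 0 makes the model evaluate 0/0
# T_values and C_values are the data from the question
popt, pcov = curve_fit(my_model, T_values, C_values, p0=[w])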