Python 如何使用 pylab 和 numpy 为我的数据拟合正弦曲线?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/16716302/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How do I fit a sine curve to my data with pylab and numpy?
提问by ChapmIndustries
For a school project I am trying to show that economies follow a relatively sinusoidal growth pattern. Beyond the economics of it, which are admittedly dodgy, I am building a python simulation to show that even when we let some degree of randomness take hold, we can still produce something relatively sinusoidal. I am happy with my data that I'm producing but now Id like to find some way to get a sine graph that pretty closely matches the data. I know you can do polynomial fit, but can you do sine fit?
对于一个学校项目,我试图表明经济遵循相对正弦增长模式。除了它的经济性,这无疑是狡猾的,我正在构建一个 python 模拟,以表明即使我们让某种程度的随机性占据上风,我们仍然可以产生一些相对正弦的东西。我对我正在生成的数据感到满意,但现在我想找到某种方法来获得与数据非常匹配的正弦图。我知道您可以进行多项式拟合,但是您可以进行正弦拟合吗?
Thanks for your help in advance. Let me know if there's any parts of the code you want to see.
提前感谢您的帮助。如果您想查看代码的任何部分,请告诉我。
采纳答案by Dhara
You can use the least-square optimizationfunction in scipy to fit any arbitrary function to another. In case of fitting a sin function, the 3 parameters to fit are the offset ('a'), amplitude ('b') and the phase ('c').
您可以使用scipy 中的最小二乘优化函数将任意函数拟合到另一个函数。在拟合 sin 函数的情况下,要拟合的 3 个参数是偏移 ('a')、幅度 ('b') 和相位 ('c')。
As long as you provide a reasonable first guess of the parameters, the optimization should converge well.Fortunately for a sine function, first estimates of 2 of these are easy: the offset can be estimated by taking the mean of the data and the amplitude via the RMS (3*standard deviation/sqrt(2)).
只要您提供对参数的合理初步猜测,优化就应该很好地收敛。幸运的是,对于正弦函数,对其中 2 个的初步估计很容易:可以通过取数据的平均值和幅度来估计偏移量RMS(3*标准偏差/sqrt(2))。
Note: as a later edit, frequency fitting has also been added. This does not work very well (can lead to extremely poor fits). Thus, use at your discretion, my advise would be to not use frequency fitting unless frequency error is smaller than a few percent.
注意:作为稍后的编辑,还添加了频率拟合。这不能很好地工作(可能导致极差的拟合)。因此,请自行决定使用,我的建议是不要使用频率拟合,除非频率误差小于几个百分点。
This leads to the following code:
这导致以下代码:
import numpy as np
from scipy.optimize import leastsq
import pylab as plt
N = 1000 # number of data points
t = np.linspace(0, 4*np.pi, N)
f = 1.15247 # Optional!! Advised not to use
data = 3.0*np.sin(f*t+0.001) + 0.5 + np.random.randn(N) # create artificial data with noise
guess_mean = np.mean(data)
guess_std = 3*np.std(data)/(2**0.5)/(2**0.5)
guess_phase = 0
guess_freq = 1
guess_amp = 1
# we'll use this to plot our first estimate. This might already be good enough for you
data_first_guess = guess_std*np.sin(t+guess_phase) + guess_mean
# Define the function to optimize, in this case, we want to minimize the difference
# between the actual data and our "guessed" parameters
optimize_func = lambda x: x[0]*np.sin(x[1]*t+x[2]) + x[3] - data
est_amp, est_freq, est_phase, est_mean = leastsq(optimize_func, [guess_amp, guess_freq, guess_phase, guess_mean])[0]
# recreate the fitted curve using the optimized parameters
data_fit = est_amp*np.sin(est_freq*t+est_phase) + est_mean
# recreate the fitted curve using the optimized parameters
fine_t = np.arange(0,max(t),0.1)
data_fit=est_amp*np.sin(est_freq*fine_t+est_phase)+est_mean
plt.plot(t, data, '.')
plt.plot(t, data_first_guess, label='first guess')
plt.plot(fine_t, data_fit, label='after fitting')
plt.legend()
plt.show()


Edit: I assumed that you know the number of periods in the sine-wave. If you don't, it's somewhat trickier to fit. You can try and guess the number of periods by manual plotting and try and optimize it as your 6th parameter.
编辑:我假设您知道正弦波中的周期数。如果您不这样做,则适应起来会有些棘手。您可以尝试通过手动绘图来猜测周期数,并尝试将其优化为您的第 6 个参数。
回答by JJacquelin
The current methods to fit a sin curve to a given data set require a first guess of the parameters, followed by an interative process. This is a non-linear regression problem.
当前将正弦曲线拟合到给定数据集的方法需要首先猜测参数,然后进行交互过程。这是一个非线性回归问题。
A different method consists in transforming the non-linear regression to a linear regression thanks to a convenient integral equation. Then, there is no need for initial guess and no need for iterative process : the fitting is directly obtained.
由于方便的积分方程,另一种方法包括将非线性回归转换为线性回归。然后,不需要初始猜测,不需要迭代过程:直接获得拟合。
In case of the function y = a + r*sin(w*x+phi)or y=a+b*sin(w*x)+c*cos(w*x), see pages 35-36 of the paper "Régression sinusoidale"published on Scribd
如果是函数y = a + r*sin(w*x+phi)or y=a+b*sin(w*x)+c*cos(w*x),请参阅Scribd上"Régression sinusoidale"发表的论文的第 35-36 页
In case of the function y = a + p*x + r*sin(w*x+phi): pages 49-51 of the chapter "Mixed linear and sinusoidal regressions".
在函数的情况下y = a + p*x + r*sin(w*x+phi):“混合线性和正弦回归”一章的第 49-51 页。
In case of more complicated functions, the general process is explained in the chapter "Generalized sinusoidal regression"pages 54-61, followed by a numerical example y = r*sin(w*x+phi)+(b/x)+c*ln(x), pages 62-63
对于更复杂的函数,一般过程在第"Generalized sinusoidal regression"54-61 页中解释,然后是一个数值示例y = r*sin(w*x+phi)+(b/x)+c*ln(x),第 62-63 页
回答by Vasco
More userfriendly to us is the function curvefit. Here an example:
对我们来说更友好的是函数曲线拟合。这里有一个例子:
import numpy as np
from scipy.optimize import curve_fit
import pylab as plt
N = 1000 # number of data points
t = np.linspace(0, 4*np.pi, N)
data = 3.0*np.sin(t+0.001) + 0.5 + np.random.randn(N) # create artificial data with noise
guess_freq = 1
guess_amplitude = 3*np.std(data)/(2**0.5)
guess_phase = 0
guess_offset = np.mean(data)
p0=[guess_freq, guess_amplitude,
guess_phase, guess_offset]
# create the function we want to fit
def my_sin(x, freq, amplitude, phase, offset):
return np.sin(x * freq + phase) * amplitude + offset
# now do the fit
fit = curve_fit(my_sin, t, data, p0=p0)
# we'll use this to plot our first estimate. This might already be good enough for you
data_first_guess = my_sin(t, *p0)
# recreate the fitted curve using the optimized parameters
data_fit = my_sin(t, *fit[0])
plt.plot(data, '.')
plt.plot(data_fit, label='after fitting')
plt.plot(data_first_guess, label='first guess')
plt.legend()
plt.show()
回答by unsym
Here is a parameter-free fitting function fit_sin()that does not require manual guess of frequency:
这是一个fit_sin()不需要手动猜测频率的无参数拟合函数:
import numpy, scipy.optimize
def fit_sin(tt, yy):
'''Fit sin to the input time sequence, and return fitting parameters "amp", "omega", "phase", "offset", "freq", "period" and "fitfunc"'''
tt = numpy.array(tt)
yy = numpy.array(yy)
ff = numpy.fft.fftfreq(len(tt), (tt[1]-tt[0])) # assume uniform spacing
Fyy = abs(numpy.fft.fft(yy))
guess_freq = abs(ff[numpy.argmax(Fyy[1:])+1]) # excluding the zero frequency "peak", which is related to offset
guess_amp = numpy.std(yy) * 2.**0.5
guess_offset = numpy.mean(yy)
guess = numpy.array([guess_amp, 2.*numpy.pi*guess_freq, 0., guess_offset])
def sinfunc(t, A, w, p, c): return A * numpy.sin(w*t + p) + c
popt, pcov = scipy.optimize.curve_fit(sinfunc, tt, yy, p0=guess)
A, w, p, c = popt
f = w/(2.*numpy.pi)
fitfunc = lambda t: A * numpy.sin(w*t + p) + c
return {"amp": A, "omega": w, "phase": p, "offset": c, "freq": f, "period": 1./f, "fitfunc": fitfunc, "maxcov": numpy.max(pcov), "rawres": (guess,popt,pcov)}
The initial frequency guess is given by the peak frequency in the frequency domain using FFT. The fitting result is almost perfect assuming there is only one dominant frequency (other than the zero frequency peak).
初始频率猜测由使用 FFT 的频域中的峰值频率给出。假设只有一个主频率(零频率峰值除外),拟合结果几乎是完美的。
import pylab as plt
N, amp, omega, phase, offset, noise = 500, 1., 2., .5, 4., 3
#N, amp, omega, phase, offset, noise = 50, 1., .4, .5, 4., .2
#N, amp, omega, phase, offset, noise = 200, 1., 20, .5, 4., 1
tt = numpy.linspace(0, 10, N)
tt2 = numpy.linspace(0, 10, 10*N)
yy = amp*numpy.sin(omega*tt + phase) + offset
yynoise = yy + noise*(numpy.random.random(len(tt))-0.5)
res = fit_sin(tt, yynoise)
print( "Amplitude=%(amp)s, Angular freq.=%(omega)s, phase=%(phase)s, offset=%(offset)s, Max. Cov.=%(maxcov)s" % res )
plt.plot(tt, yy, "-k", label="y", linewidth=2)
plt.plot(tt, yynoise, "ok", label="y with noise")
plt.plot(tt2, res["fitfunc"](tt2), "r-", label="y fit curve", linewidth=2)
plt.legend(loc="best")
plt.show()
The result is good even with high noise:
即使噪声很高,结果也很好:
Amplitude=1.00660540618, Angular freq.=2.03370472482, phase=0.360276844224, offset=3.95747467506, Max. Cov.=0.0122923578658
振幅=1.00660540618,角频率=2.03370472482,相位=0.360276844224,偏移=3.95747467506,最大 系数=0.0122923578658






回答by gflavia
The following Python script performs least squares fitting of sinusoids as described in Bloomfield (2000), "Fourier Analysis of Time Series", Wiley. The key steps are the following:
以下 Python 脚本执行正弦曲线的最小二乘拟合,如 Bloomfield (2000), "Fourier Analysis of Time Series", Wiley 中所述。关键步骤如下:
- Define a range of different possible frequencies.
- For each of the frequencies specified at point 1 above, estimate all the parameters of the sinusoid (mean, amplitude and phase) by ordinary least squares (OLS).
- Among all the different sets of parameters estimated at point 2 above, select the one that minimizes the residual sum of squares (RSS).
- 定义一系列不同的可能频率。
- 对于上面第 1 点指定的每个频率,通过普通最小二乘法 (OLS) 估计正弦曲线的所有参数(平均值、幅度和相位)。
- 在上述第 2 点估计的所有不同参数组中,选择最小化残差平方和 (RSS) 的一组。
import pandas as pd
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
######################################################################################
# (1) generate the data
######################################################################################
omega=4.5 # frequency
theta0=0.5 # mean
theta1=1.5 # amplitude
phi=-0.25 # phase
n=1000 # number of observations
sigma=1.25 # error standard deviation
e=np.random.normal(0,sigma,n) # Gaussian error
t=np.linspace(1,n,n)/n # time index
y=theta0+theta1*np.cos(2*np.pi*(omega*t+phi))+e # time series
######################################################################################
# (2) estimate the parameters
######################################################################################
# define the range of different possible frequencies
f=np.linspace(1,12,100)
# create a data frame to store the estimated parameters and the associated
# residual sum of squares for each of the considered frequencies
coefs=pd.DataFrame(data=np.zeros((len(f),5)), columns=["omega","theta0","theta1","phi","RSS"])
for i in range(len(f)): # iterate across the different considered frequencies
x1=np.cos(2*np.pi*f[i]*t) # define the first regressor
x2=np.sin(2*np.pi*f[i]*t) # define the second regressor
X=sm.add_constant(np.transpose(np.vstack((x1,x2)))) # add the intercept
reg=sm.OLS(y,X).fit() # fit the regression by OLS
A=reg.params[1] # estimated coefficient of first regressor
B=reg.params[2] # estimated coefficient of second regressor
# estimated mean
mean=reg.params[0]
# estimated amplitude
amplitude=np.sqrt(A**2+B**2)
# estimated phase
if A>0: phase=np.arctan(-B/A)/(2*np.pi)
if A<0 and B>0: phase=(np.arctan(-B/A)-np.pi)/(2*np.pi)
if A<0 and B<=0: phase=(np.arctan(-B/A)+np.pi)/(2*np.pi)
if A==0 and B>0: phase=-1/4
if A==0 and B<0: phase=1/4
# fitted sinusoid
s=mean+amplitude*np.cos(2*np.pi*(f[i]*t+phase))
# residual sum of squares
RSS=np.sum((y-s)**2)
# save the estimation results
coefs["omega"][i]=f[i]
coefs["theta0"][i]=mean
coefs["theta1"][i]=amplitude
coefs["phi"][i]=phase
coefs["RSS"][i]=RSS
del x1, x2, X, reg, A, B, mean, amplitude, phase, s, RSS
######################################################################################
# (3) analyze the results
######################################################################################
# extract the set of parameters that minimizes the residual sum of squares
coefs=coefs.loc[coefs["RSS"]==coefs["RSS"].min(),]
# calculate the fitted sinusoid
s=coefs["theta0"].values+coefs["theta1"].values*np.cos(2*np.pi*(coefs["omega"].values*t+coefs["phi"].values))
# plot the fitted sinusoid
plt.plot(y,color="black",linewidth=1,label="actual")
plt.plot(s,color="lightgreen",linewidth=3,label="fitted")
plt.ylim(ymax=np.max(y)*1.3)
plt.legend(loc=1)
plt.show()

