Python smoothing data
Original question: http://stackoverflow.com/questions/28855928/
This content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and keep it under the same license.
Asked by Alex
I have a dataset that I want smoothed. I have two variables, y and x, that are not evenly spaced. y is the dependent variable. However, I do not know what formula relates x to y.
I read all about interpolation, but interpolation requires me to know the formula that relates x to y. I also looked at other smoothing functions, but these cause problems in the start and endpoints.
Does anyone know how to either:
- Obtain a formula that relates x to y
- Smooth the data points without messing up the endpoints
My data looks as follows:
import matplotlib.pyplot as plt
x = [0.0, 2.4343476531707129, 3.606959459205791, 3.9619355597454664, 4.3503348239356558, 4.6651002761894667, 4.9360228447915109, 5.1839565805565826, 5.5418099660513596, 5.7321342976055165,5.9841050994671106, 6.0478709402949216, 6.3525180590674513, 6.5181245134579893, 6.6627517592933767, 6.9217136972938444,7.103121623408132, 7.2477706136047413, 7.4502723880766748, 7.6174503055171137, 7.7451599936721376, 7.9813193157205191, 8.115292520850506,8.3312689109403202, 8.5648187916197998, 8.6728478860287623, 8.9629327234023926, 8.9974662723308612, 9.1532523634107257, 9.369326186780814, 9.5143785756455479, 9.5732694726297893, 9.8274813411538613, 10.088572892445802, 10.097305715988142, 10.229215999264703, 10.408589988296546, 10.525354763219688, 10.574678982757082, 10.885039893236041, 11.076574204171795, 11.091570626351352, 11.223859812944436, 11.391634940142225, 11.747328449715521, 11.799186895037078, 11.947711314893802, 12.240901223703657, 12.50151825769724, 12.811712563174883, 13.153496854155087, 13.978408296586579, 17.0, 25.0]
y = [0.0, 4.0, 6.0, 18.0, 30.0, 42.0, 54.0, 66.0, 78.0, 90.0, 102.0, 114.0, 126.0, 138.0, 150.0, 162.0, 174.0, 186.0, 198.0, 210.0, 222.0, 234.0, 246.0, 258.0, 270.0, 282.0, 294.0, 306.0, 318.0, 330.0, 342.0, 354.0, 366.0, 378.0, 390.0, 402.0, 414.0, 426.0, 438.0, 450.0, 462.0, 474.0, 486.0, 498.0, 510.0, 522.0, 534.0, 546.0, 558.0, 570.0, 582.0, 594.0, 600.0, 600.0]
#Smoothing here
fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(x, y, color='red', label= 'Unsmoothed curve')
Accepted answer by rth
I think there is some confusion here between smoothing (i.e. filtering), interpolation and curve fitting:
- Filtering / smoothing: we apply an operator to the data that modifies the original y points so as to remove high-frequency oscillations. This can be achieved with, for instance, scipy.signal.convolve, scipy.signal.medfilt, scipy.signal.savgol_filter, or FFT-based approaches.
- Interpolation: we create a continuous local representation of the data from the available data points. Interpolation defines how the function behaves between the data points, but does not modify the data points themselves. See, for instance, scipy.interpolate.interp1d. To make things more complicated, though, spline interpolation actually also does some smoothing.
- Curve fitting: we fit the data points with some analytical function. This makes it possible to determine a global relationship between x and y in our data, but requires some prior insight into a suitable fitting function. See scipy.optimize.curve_fit.
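To make the difference between interpolation and smoothing concrete, here is a minimal sketch. The noisy sine data is made up for illustration and is not the data from the question; the window length and polynomial order are arbitrary choices.

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import savgol_filter

# Synthetic, evenly spaced samples (made up for this illustration)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 51)
y = np.sin(x) + 0.2 * rng.standard_normal(x.size)

# Interpolation: defines values between the samples, leaves the samples unchanged
itp = interp1d(x, y, kind="linear")
passes_through = np.allclose(itp(x), y)

# Smoothing: replaces each sample with a filtered value
y_smooth = savgol_filter(y, window_length=11, polyorder=3)
max_change = np.max(np.abs(y_smooth - y))

print("interpolant reproduces the samples:", passes_through)
print("largest change made by the filter:", max_change)
```

The interpolant returns the original y values at the original x positions, while the filtered array differs from the input at the sample positions themselves.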
In this particular case, one approach is to first interpolate onto a uniform grid (as in @agomcas's answer) and then apply a Savitzky-Golay filter to smooth the data. Alternatively, the data can be fitted to some analytical expression, say one based on the tanh function, but this needs to be tuned further:
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.interpolate import interp1d
from scipy.signal import savgol_filter
import numpy as np
x = np.array([0.0, 2.4343476531707129, 3.606959459205791, 3.9619355597454664, 4.3503348239356558, 4.6651002761894667, 4.9360228447915109, 5.1839565805565826, 5.5418099660513596, 5.7321342976055165,5.9841050994671106, 6.0478709402949216, 6.3525180590674513, 6.5181245134579893, 6.6627517592933767, 6.9217136972938444,7.103121623408132, 7.2477706136047413, 7.4502723880766748, 7.6174503055171137, 7.7451599936721376, 7.9813193157205191, 8.115292520850506,8.3312689109403202, 8.5648187916197998, 8.6728478860287623, 8.9629327234023926, 8.9974662723308612, 9.1532523634107257, 9.369326186780814, 9.5143785756455479, 9.5732694726297893, 9.8274813411538613, 10.088572892445802, 10.097305715988142, 10.229215999264703, 10.408589988296546, 10.525354763219688, 10.574678982757082, 10.885039893236041, 11.076574204171795, 11.091570626351352, 11.223859812944436, 11.391634940142225, 11.747328449715521, 11.799186895037078, 11.947711314893802, 12.240901223703657, 12.50151825769724, 12.811712563174883, 13.153496854155087, 13.978408296586579, 17.0, 25.0])
y = np.array([0.0, 4.0, 6.0, 18.0, 30.0, 42.0, 54.0, 66.0, 78.0, 90.0, 102.0, 114.0, 126.0, 138.0, 150.0, 162.0, 174.0, 186.0, 198.0, 210.0, 222.0, 234.0, 246.0, 258.0, 270.0, 282.0, 294.0, 306.0, 318.0, 330.0, 342.0, 354.0, 366.0, 378.0, 390.0, 402.0, 414.0, 426.0, 438.0, 450.0, 462.0, 474.0, 486.0, 498.0, 510.0, 522.0, 534.0, 546.0, 558.0, 570.0, 582.0, 594.0, 600.0, 600.0])
xx = np.linspace(x.min(),x.max(), 1000)
# interpolate + smooth
itp = interp1d(x,y, kind='linear')
window_size, poly_order = 101, 3
yy_sg = savgol_filter(itp(xx), window_size, poly_order)
# or fit to a global function
def func(x, A, B, x0, sigma):
    return A + B*np.tanh((x - x0)/sigma)
fit, _ = curve_fit(func, x, y)
yy_fit = func(xx, *fit)
fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(x, y, 'r.', label= 'Unsmoothed curve')
ax.plot(xx, yy_fit, 'b--', label=r"$f(x) = A + B \tanh\left(\frac{x-x_0}{\sigma}\right)$")
ax.plot(xx, yy_sg, 'k', label= "Smoothed curve")
plt.legend(loc='best')
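One possible way to tune the tanh fit is to give curve_fit explicit starting values through its p0 argument; when p0 is omitted, all parameters start at 1. The sketch below uses synthetic data with known parameters (not the data from the question), and the rules used to guess p0 from the data are assumptions chosen for illustration, not part of the original answer.

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, A, B, x0, sigma):
    return A + B * np.tanh((x - x0) / sigma)

# Synthetic, unevenly spaced data with known parameters (illustration only)
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 25.0, 60))
true_params = (300.0, 300.0, 8.0, 3.0)
y = func(x, *true_params) + rng.normal(0.0, 5.0, x.size)

# Rough initial guesses read off the data (heuristics assumed for this sketch)
A0 = 0.5 * (y.max() + y.min())        # vertical midpoint
B0 = 0.5 * (y.max() - y.min())        # half of the total rise
x0_0 = x[np.argmin(np.abs(y - A0))]   # x where y is closest to the midpoint
sigma0 = 0.25 * (x.max() - x.min())   # a fraction of the x range
p0 = [A0, B0, x0_0, sigma0]

params, cov = curve_fit(func, x, y, p0=p0)
print("fitted A, B, x0, sigma:", params)
```

Starting the optimizer near the right scale tends to matter here because the default start of 1 for every parameter is far from amplitudes in the hundreds.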
Answer by agomcas
Interpolation does not require you to know the formula relating x and y.
import matplotlib.pyplot as plt
from scipy import interpolate
import numpy as np
x = [0.0, 2.4343476531707129, 3.606959459205791, 3.9619355597454664, 4.3503348239356558, 4.6651002761894667, 4.9360228447915109, 5.1839565805565826, 5.5418099660513596, 5.7321342976055165,5.9841050994671106, 6.0478709402949216, 6.3525180590674513, 6.5181245134579893, 6.6627517592933767, 6.9217136972938444,7.103121623408132, 7.2477706136047413, 7.4502723880766748, 7.6174503055171137, 7.7451599936721376, 7.9813193157205191, 8.115292520850506,8.3312689109403202, 8.5648187916197998, 8.6728478860287623, 8.9629327234023926, 8.9974662723308612, 9.1532523634107257, 9.369326186780814, 9.5143785756455479, 9.5732694726297893, 9.8274813411538613, 10.088572892445802, 10.097305715988142, 10.229215999264703, 10.408589988296546, 10.525354763219688, 10.574678982757082, 10.885039893236041, 11.076574204171795, 11.091570626351352, 11.223859812944436, 11.391634940142225, 11.747328449715521, 11.799186895037078, 11.947711314893802, 12.240901223703657, 12.50151825769724, 12.811712563174883, 13.153496854155087, 13.978408296586579, 17.0, 25.0]
y = [0.0, 4.0, 6.0, 18.0, 30.0, 42.0, 54.0, 66.0, 78.0, 90.0, 102.0, 114.0, 126.0, 138.0, 150.0, 162.0, 174.0, 186.0, 198.0, 210.0, 222.0, 234.0, 246.0, 258.0, 270.0, 282.0, 294.0, 306.0, 318.0, 330.0, 342.0, 354.0, 366.0, 378.0, 390.0, 402.0, 414.0, 426.0, 438.0, 450.0, 462.0, 474.0, 486.0, 498.0, 510.0, 522.0, 534.0, 546.0, 558.0, 570.0, 582.0, 594.0, 600.0, 600.0]
f = interpolate.interp1d(x, y, kind="linear")
x_int = np.linspace(x[0],x[-1], 20)
y_int = f(x_int)
#Smoothing here
fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(x, y, color='red', label= 'Unsmoothed curve')
ax.plot(x_int, y_int, color="blue", label= "Interpolated curve")
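Once the data sits on a uniform grid, a smoothing filter can be applied to it. The endpoint problems mentioned in the question can come from how a filter pads the signal at its edges. The sketch below illustrates one such case on a made-up straight line (not the data from the question): a moving average computed with np.convolve in "same" mode effectively pads with zeros, while scipy.signal.savgol_filter with mode="interp" (its default) fits a polynomial to the edge samples instead. The window length of 11 is an arbitrary choice.

```python
import numpy as np
from scipy.signal import savgol_filter

# Made-up test signal: a straight line on a uniform grid
x_int = np.linspace(0.0, 10.0, 101)
y_int = 2.0 * x_int + 1.0

window = 11  # odd number of samples

# Moving average; mode="same" behaves as if the signal were zero outside its range
kernel = np.ones(window) / window
y_ma = np.convolve(y_int, kernel, mode="same")

# Savitzky-Golay; mode="interp" fits a polynomial to the first and last windows
y_sg = savgol_filter(y_int, window_length=window, polyorder=3, mode="interp")

print("true last value:      ", y_int[-1])
print("moving average, last: ", y_ma[-1])
print("Savitzky-Golay, last: ", y_sg[-1])
```

On this line the zero-padded moving average drops sharply at the right edge, whereas the Savitzky-Golay output follows the line up to the last point. Real data will not be reproduced exactly, but the edge handling is the same.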