Python smoothing data

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/28855928/

Date: 2020-08-19 03:50:23  Source: igfitidea

Tags: python, smoothing

Asked by Alex

I have a dataset that I want to smooth. I have two variables, y and x, that are not evenly spaced; y is the dependent variable. However, I do not know what formula relates x to y.

I read all about interpolation, but interpolation requires me to know the formula that relates x to y. I also looked at other smoothing functions, but these cause problems at the start and end points.

Does anyone know how to do either of the following?

  • Obtain a formula that relates x to y

  • Smooth the data points without messing up the endpoints

My data looks as follows:

import matplotlib.pyplot as plt

x = [0.0, 2.4343476531707129, 3.606959459205791, 3.9619355597454664, 4.3503348239356558, 4.6651002761894667, 4.9360228447915109, 5.1839565805565826, 5.5418099660513596, 5.7321342976055165,5.9841050994671106, 6.0478709402949216, 6.3525180590674513, 6.5181245134579893, 6.6627517592933767, 6.9217136972938444,7.103121623408132, 7.2477706136047413, 7.4502723880766748, 7.6174503055171137, 7.7451599936721376, 7.9813193157205191, 8.115292520850506,8.3312689109403202, 8.5648187916197998, 8.6728478860287623, 8.9629327234023926, 8.9974662723308612, 9.1532523634107257, 9.369326186780814, 9.5143785756455479, 9.5732694726297893, 9.8274813411538613, 10.088572892445802, 10.097305715988142, 10.229215999264703, 10.408589988296546, 10.525354763219688, 10.574678982757082, 10.885039893236041, 11.076574204171795, 11.091570626351352, 11.223859812944436, 11.391634940142225, 11.747328449715521, 11.799186895037078, 11.947711314893802, 12.240901223703657, 12.50151825769724, 12.811712563174883, 13.153496854155087, 13.978408296586579, 17.0, 25.0]
y = [0.0, 4.0, 6.0, 18.0, 30.0, 42.0, 54.0, 66.0, 78.0, 90.0, 102.0, 114.0, 126.0, 138.0, 150.0, 162.0, 174.0, 186.0, 198.0, 210.0, 222.0, 234.0, 246.0, 258.0, 270.0, 282.0, 294.0, 306.0, 318.0, 330.0, 342.0, 354.0, 366.0, 378.0, 390.0, 402.0, 414.0, 426.0, 438.0, 450.0, 462.0, 474.0, 486.0, 498.0, 510.0, 522.0, 534.0, 546.0, 558.0, 570.0, 582.0, 594.0, 600.0, 600.0]

#Smoothing here

fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(x, y, color='red', label= 'Unsmoothed curve')

Accepted answer by rth

I think there is some confusion here between smoothing (i.e. filtering), interpolation and curve fitting:

  • Filtering / smoothing: we apply an operator to the data that modifies the original y points so as to remove high-frequency oscillations. This can be achieved with, for instance, scipy.signal.convolve, scipy.signal.medfilt, scipy.signal.savgol_filter or FFT-based approaches.

  • Interpolation: we create a continuous local representation of the data from the available data points. Interpolation defines how the function behaves between the data points, but does not modify the data points themselves. See for instance scipy.interpolate.interp1d. To make things more complicated, though, spline interpolation actually also does some smoothing.

  • Curve fitting: we fit the data points with some analytical function. This lets us determine a global relationship between x and y in our data, but requires some prior insight into which fitting function is suitable. See scipy.optimize.curve_fit.
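The difference between the first two items can be checked numerically. The following is only a minimal sketch on hypothetical, evenly spaced noisy samples (not the question's data); the window length and polynomial order are arbitrary choices for illustration.

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import savgol_filter

# Hypothetical, evenly spaced noisy samples (not the question's data).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 51)
y = np.sin(x) + 0.2 * rng.standard_normal(x.size)

# Interpolation: the interpolant passes through every original point.
itp = interp1d(x, y, kind="linear")
reproduces_data = np.allclose(itp(x), y)

# Filtering: the output has the same length, but the values are modified.
# window_length=11 and polyorder=3 are assumed values, not tuned.
y_smooth = savgol_filter(y, window_length=11, polyorder=3)
modifies_data = not np.allclose(y_smooth, y)

print("interpolant reproduces the data points:", reproduces_data)
print("filter modifies the data points:", modifies_data)
```

In other words, interpolation only fills in values between samples, while a filter replaces the samples themselves.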

In this particular case, one approach is to first interpolate onto a uniform grid (as in @agomcas's answer) and then apply a Savitzky-Golay filter to smooth the data. Alternatively, the data can be fitted to some analytical expression, say one based on the tanh function, but this needs to be tuned further:

import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.interpolate import interp1d
from scipy.signal import savgol_filter
import numpy as np

x = np.array([0.0, 2.4343476531707129, 3.606959459205791, 3.9619355597454664, 4.3503348239356558, 4.6651002761894667, 4.9360228447915109, 5.1839565805565826, 5.5418099660513596, 5.7321342976055165,5.9841050994671106, 6.0478709402949216, 6.3525180590674513, 6.5181245134579893, 6.6627517592933767, 6.9217136972938444,7.103121623408132, 7.2477706136047413, 7.4502723880766748, 7.6174503055171137, 7.7451599936721376, 7.9813193157205191, 8.115292520850506,8.3312689109403202, 8.5648187916197998, 8.6728478860287623, 8.9629327234023926, 8.9974662723308612, 9.1532523634107257, 9.369326186780814, 9.5143785756455479, 9.5732694726297893, 9.8274813411538613, 10.088572892445802, 10.097305715988142, 10.229215999264703, 10.408589988296546, 10.525354763219688, 10.574678982757082, 10.885039893236041, 11.076574204171795, 11.091570626351352, 11.223859812944436, 11.391634940142225, 11.747328449715521, 11.799186895037078, 11.947711314893802, 12.240901223703657, 12.50151825769724, 12.811712563174883, 13.153496854155087, 13.978408296586579, 17.0, 25.0])
y = np.array([0.0, 4.0, 6.0, 18.0, 30.0, 42.0, 54.0, 66.0, 78.0, 90.0, 102.0, 114.0, 126.0, 138.0, 150.0, 162.0, 174.0, 186.0, 198.0, 210.0, 222.0, 234.0, 246.0, 258.0, 270.0, 282.0, 294.0, 306.0, 318.0, 330.0, 342.0, 354.0, 366.0, 378.0, 390.0, 402.0, 414.0, 426.0, 438.0, 450.0, 462.0, 474.0, 486.0, 498.0, 510.0, 522.0, 534.0, 546.0, 558.0, 570.0, 582.0, 594.0, 600.0, 600.0])


xx = np.linspace(x.min(),x.max(), 1000)

# interpolate + smooth
itp = interp1d(x,y, kind='linear')
window_size, poly_order = 101, 3
yy_sg = savgol_filter(itp(xx), window_size, poly_order)


# or fit to a global function
def func(x, A, B, x0, sigma):
    return A+B*np.tanh((x-x0)/sigma)

fit, _ = curve_fit(func, x, y)
yy_fit = func(xx, *fit)

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(x, y, 'r.', label= 'Unsmoothed curve')
ax.plot(xx, yy_fit, 'b--', label=r"$f(x) = A + B \tanh\left(\frac{x-x_0}{\sigma}\right)$")
ax.plot(xx, yy_sg, 'k', label= "Smoothed curve")
plt.legend(loc='best')

[Figure: smoothing method]
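One possible way to tune the tanh fit is to give curve_fit an initial guess through its p0 argument; without it, every parameter starts at 1, which is far from the scale of this data. The sketch below uses a rounded subset of the question's data to keep it short, and the heuristics for the starting values are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
from scipy.optimize import curve_fit

# Rounded subset of the question's data, shortened for readability.
x = np.array([0.0, 2.43, 3.96, 4.94, 5.98, 6.92, 7.75, 8.56,
              9.51, 10.41, 11.22, 12.24, 13.98, 17.0, 25.0])
y = np.array([0.0, 4.0, 18.0, 54.0, 102.0, 162.0, 222.0, 270.0,
              342.0, 414.0, 486.0, 546.0, 594.0, 600.0, 600.0])

def func(x, A, B, x0, sigma):
    return A + B * np.tanh((x - x0) / sigma)

# Assumed heuristics for the starting values:
A0 = (y.max() + y.min()) / 2           # midpoint of the y range
B0 = (y.max() - y.min()) / 2           # half of the y range
x00 = x[np.argmin(np.abs(y - A0))]     # x where y is closest to the midpoint
sigma0 = (x.max() - x.min()) / 8       # rough width of the transition
p0 = [A0, B0, x00, sigma0]

fit, _ = curve_fit(func, x, y, p0=p0)

def rms(params):
    # Root-mean-square residual of the model for the given parameters.
    return np.sqrt(np.mean((func(x, *params) - y) ** 2))

rms_guess, rms_fit = rms(p0), rms(fit)
print("fitted A, B, x0, sigma:", fit)
print("RMS residual, initial guess:", rms_guess)
print("RMS residual, after fitting:", rms_fit)
```

Comparing the two residuals gives a quick indication of whether the optimizer improved on the guess; a tanh may still be a poor model for the flat tail, so the result should be checked against a plot.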

Answer by agomcas

Interpolation does not require you to know the formula relating x and y.

import matplotlib.pyplot as plt
from scipy import interpolate
import numpy as np

x = [0.0, 2.4343476531707129, 3.606959459205791, 3.9619355597454664, 4.3503348239356558, 4.6651002761894667, 4.9360228447915109, 5.1839565805565826, 5.5418099660513596, 5.7321342976055165,5.9841050994671106, 6.0478709402949216, 6.3525180590674513, 6.5181245134579893, 6.6627517592933767, 6.9217136972938444,7.103121623408132, 7.2477706136047413, 7.4502723880766748, 7.6174503055171137, 7.7451599936721376, 7.9813193157205191, 8.115292520850506,8.3312689109403202, 8.5648187916197998, 8.6728478860287623, 8.9629327234023926, 8.9974662723308612, 9.1532523634107257, 9.369326186780814, 9.5143785756455479, 9.5732694726297893, 9.8274813411538613, 10.088572892445802, 10.097305715988142, 10.229215999264703, 10.408589988296546, 10.525354763219688, 10.574678982757082, 10.885039893236041, 11.076574204171795, 11.091570626351352, 11.223859812944436, 11.391634940142225, 11.747328449715521, 11.799186895037078, 11.947711314893802, 12.240901223703657, 12.50151825769724, 12.811712563174883, 13.153496854155087, 13.978408296586579, 17.0, 25.0]
y = [0.0, 4.0, 6.0, 18.0, 30.0, 42.0, 54.0, 66.0, 78.0, 90.0, 102.0, 114.0, 126.0, 138.0, 150.0, 162.0, 174.0, 186.0, 198.0, 210.0, 222.0, 234.0, 246.0, 258.0, 270.0, 282.0, 294.0, 306.0, 318.0, 330.0, 342.0, 354.0, 366.0, 378.0, 390.0, 402.0, 414.0, 426.0, 438.0, 450.0, 462.0, 474.0, 486.0, 498.0, 510.0, 522.0, 534.0, 546.0, 558.0, 570.0, 582.0, 594.0, 600.0, 600.0]


f = interpolate.interp1d(x, y, kind="linear")
x_int = np.linspace(x[0],x[-1], 20)
y_int = f(x_int)

#Smoothing here

fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(x, y, color='red', label= 'Unsmoothed curve')
ax.plot(x_int, y_int, color="blue", label= "Interpolated curve")
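As for the endpoint problem raised in the question, the sketch below compares two filters on the interpolated, evenly spaced data. It is an illustration under assumptions: it uses a rounded subset of the question's data, and the grid size, window length and polynomial order are arbitrary. A plain moving average with np.convolve and mode="same" pads with zeros, which drags the end values down, whereas savgol_filter with its default mode="interp" fits a polynomial to the edge windows instead of padding.

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import savgol_filter

# Rounded subset of the question's data, shortened for readability.
x = np.array([0.0, 2.43, 3.96, 4.94, 5.98, 6.92, 7.75, 8.56,
              9.51, 10.41, 11.22, 12.24, 13.98, 17.0, 25.0])
y = np.array([0.0, 4.0, 18.0, 54.0, 102.0, 162.0, 222.0, 270.0,
              342.0, 414.0, 486.0, 546.0, 594.0, 600.0, 600.0])

# Resample onto a uniform grid, since both filters assume even spacing.
xx = np.linspace(x[0], x[-1], 200)
yy = interp1d(x, y, kind="linear")(xx)

window = 21  # assumed odd window length, chosen for illustration

# Moving average: mode="same" implicitly pads with zeros at both ends.
yy_avg = np.convolve(yy, np.ones(window) / window, mode="same")

# Savitzky-Golay: default mode="interp" fits the edge windows directly.
yy_sg = savgol_filter(yy, window, polyorder=3)

print("last point, interpolated:", yy[-1])
print("last point, moving average:", yy_avg[-1])
print("last point, Savitzky-Golay:", yy_sg[-1])
```

The moving average ends well below the plateau at 600, while the Savitzky-Golay output stays on it, which is one reason the accepted answer's combination of interpolation and savgol_filter is a reasonable fit for this data.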