How to apply piecewise linear fit in Python?
Original URL: http://stackoverflow.com/questions/29382903/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Tom Kurushingal
I am trying to apply a piecewise linear fit, as shown in fig. 1, to a data set.
This figure was obtained by setting the lines by hand. I attempted to apply a piecewise linear fit using the code:
from scipy import optimize
import matplotlib.pyplot as plt
import numpy as np
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12, 13, 14, 15])
y = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59, 84.47, 98.36, 112.25, 126.14, 140.03])
def linear_fit(x, a, b):
    return a * x + b
fit_a, fit_b = optimize.curve_fit(linear_fit, x[0:5], y[0:5])[0]
y_fit = fit_a * x[0:7] + fit_b
fit_a, fit_b = optimize.curve_fit(linear_fit, x[6:14], y[6:14])[0]
y_fit = np.append(y_fit, fit_a * x[6:14] + fit_b)
figure = plt.figure(figsize=(5.15, 5.15))
figure.clf()
plot = plt.subplot(111)
ax1 = plt.gca()
plot.plot(x, y, linestyle = '', linewidth = 0.25, markeredgecolor='none', marker = 'o', label = r'\textit{y_a}')
plot.plot(x, y_fit, linestyle = ':', linewidth = 0.25, markeredgecolor='none', marker = '', label = r'\textit{y_b}')
plot.set_ylabel('Y', labelpad = 6)
plot.set_xlabel('X', labelpad = 6)
figure.savefig('test.pdf', bbox_inches='tight')  # note: the keyword is bbox_inches, not box_inches
plt.close()
But this gave me a fit of the form shown in fig. 2. I tried playing with the values, but nothing changed, and I can't get the fit of the upper line right. The most important requirement for me is how I can get Python to find the gradient change point. In essence, I want Python to recognize and fit two linear fits in the appropriate ranges. How can this be done in Python?
Accepted answer by HYRY
You can use numpy.piecewise() to create the piecewise function and then use curve_fit(). Here is the code:
from scipy import optimize
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12, 13, 14, 15], dtype=float)
y = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59, 84.47, 98.36, 112.25, 126.14, 140.03])
def piecewise_linear(x, x0, y0, k1, k2):
    # (x0, y0) is the breakpoint; k1 and k2 are the slopes of the two segments
    return np.piecewise(x, [x < x0], [lambda x: k1*x + y0 - k1*x0, lambda x: k2*x + y0 - k2*x0])
p , e = optimize.curve_fit(piecewise_linear, x, y)
xd = np.linspace(0, 15, 100)
plt.plot(x, y, "o")
plt.plot(xd, piecewise_linear(xd, *p))
The output: a plot of the data points with the fitted two-segment line.
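For reference (not part of the original answer): the optimized parameter vector p holds the estimated breakpoint and the two slopes, so the gradient change point the question asks about can be read off directly. A small sketch; the exact numbers depend on the fit, but with this data set the breakpoint should come out near x = 6:

x0, y0, k1, k2 = p
# k1 should be close to 2 and k2 close to 13.9 for this data set
print("breakpoint x0 = %.3f, slopes k1 = %.3f and k2 = %.3f" % (x0, k1, k2))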
Answer by hakanc
You could do a spline interpolation scheme to both perform piecewise linear interpolation and find the turning point of the curve. The second derivative will be highest at the turning point (for a monotonically increasing curve), and can be calculated with a spline interpolation of order > 2.
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12, 13, 14, 15])
y = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59, 84.47, 98.36, 112.25, 126.14, 140.03])
tck = interpolate.splrep(x, y, k=2, s=0)
xnew = np.linspace(0, 15)
fig, axes = plt.subplots(3)
axes[0].plot(x, y, 'x', label = 'data')
axes[0].plot(xnew, interpolate.splev(xnew, tck, der=0), label = 'Fit')
axes[1].plot(x, interpolate.splev(x, tck, der=1), label = '1st dev')
dev_2 = interpolate.splev(x, tck, der=2)
axes[2].plot(x, dev_2, label = '2nd dev')
turning_point_mask = dev_2 == np.amax(dev_2)
axes[2].plot(x[turning_point_mask], dev_2[turning_point_mask], 'rx',
             label = 'Turning point')
for ax in axes:
    ax.legend(loc = 'best')
plt.show()
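If you want the turning point at a finer resolution than the original sample points, one option is to evaluate the second derivative on a dense grid and take its argmax; a small sketch that reuses the tck spline from above:

x_fine = np.linspace(x.min(), x.max(), 1000)
dev_2_fine = interpolate.splev(x_fine, tck, der=2)
print(x_fine[np.argmax(dev_2_fine)])  # approximate x location of the turning point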
Answer by Binoy Pilakkat
Use numpy.interp, which returns the one-dimensional piecewise linear interpolant to a function with given values at discrete data-points.
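A minimal sketch of what that looks like, evaluating the interpolant at new query points between the original samples (the variable names here are only illustrative):

import numpy as np

x = np.arange(1, 16, dtype=float)
y = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59,
              84.47, 98.36, 112.25, 126.14, 140.03])

x_new = np.linspace(1, 15, 57)   # query points between the data points
y_new = np.interp(x_new, x, y)   # piecewise linear interpolant evaluated at x_new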
Answer by vhcandido
Extending @binoy-pilakkat's answer.
You should use numpy.interp:
import numpy as np
import matplotlib.pyplot as plt
x = np.array(range(1,16), dtype=float)
y = np.array([5, 7, 9, 11, 13, 15, 28.92,
42.81, 56.7, 70.59, 84.47,
98.36, 112.25, 126.14, 140.03], dtype=float)
yinterp = np.interp(x, x, y) # simple as that
plt.plot(x, y, 'bo')
plt.plot(x, yinterp, 'g-')
plt.show()
Answer by pinseng
Here is an example with two change points. If you want, just test more change points based on this example.
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize

np.random.seed(9999)
x = np.random.normal(0, 1, 1000) * 10
y = np.where(x < -15, -2 * x + 3, np.where(x < 10, x + 48, -4 * x + 98)) + np.random.normal(0, 3, 1000)
plt.scatter(x, y, s = 5, color = u'b', marker = '.', label = 'scatter plt')

def piecewise_linear(x, x0, x1, b, k1, k2, k3):
    # x0 and x1 are the change points; the slope changes by k2 at x0 and by k3 at x1
    condlist = [x < x0, (x >= x0) & (x < x1), x >= x1]
    funclist = [lambda x: k1*x + b, lambda x: k1*x + b + k2*(x-x0), lambda x: k1*x + b + k2*(x-x0) + k3*(x - x1)]
    return np.piecewise(x, condlist, funclist)

p, e = optimize.curve_fit(piecewise_linear, x, y)
xd = np.linspace(-30, 30, 1000)
plt.plot(x, y, "o")
plt.plot(xd, piecewise_linear(xd, *p))
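As a quick check (a sketch, not part of the original answer): the first two entries of p are the fitted change points, which should land close to the -15 and 10 used to generate the data, assuming the optimizer converges from its default starting values:

x0_hat, x1_hat = p[0], p[1]
print("estimated change points: %.2f and %.2f" % (x0_hat, x1_hat))  # expected near -15 and 10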
Answer by Charles Jekel
You can use pwlf to perform continuous piecewise linear regression in Python. This library can be installed using pip.
There are two approaches in pwlf to perform your fit:
- You can fit for a specified number of line segments.
- You can specify the x locations where the continuous piecewise lines should terminate (a sketch of this approach appears at the end of this answer).
Let's go with approach 1 since it's easier, and will recognize the 'gradient change point' that you are interested in.
I notice two distinct regions when looking at the data. Thus it makes sense to find the best possible continuous piecewise line using two line segments. This is approach 1.
import numpy as np
import matplotlib.pyplot as plt
import pwlf
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
y = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59,
84.47, 98.36, 112.25, 126.14, 140.03])
my_pwlf = pwlf.PiecewiseLinFit(x, y)
breaks = my_pwlf.fit(2)
print(breaks)
[ 1. 5.99819559 15. ]
The first line segment runs from [1., 5.99819559], while the second line segment runs from [5.99819559, 15.]. Thus the gradient change point you asked for would be 5.99819559.
We can plot these results using the predict function.
x_hat = np.linspace(x.min(), x.max(), 100)
y_hat = my_pwlf.predict(x_hat)
plt.figure()
plt.plot(x, y, 'o')
plt.plot(x_hat, y_hat, '-')
plt.show()
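For completeness, here is a sketch of approach 2, where you fix the break locations yourself instead of searching for them. This assumes pwlf's fit_with_breaks method, which takes the break locations including the two end points:

# Approach 2 (sketch): force a break at x = 6 instead of letting pwlf search for it
my_pwlf2 = pwlf.PiecewiseLinFit(x, y)
my_pwlf2.fit_with_breaks([x.min(), 6.0, x.max()])
y_hat2 = my_pwlf2.predict(x_hat)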
Answer by Kevin Zhu
piecewise works too
from piecewise.regressor import piecewise
import numpy as np
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12, 13, 14, 15,16,17,18], dtype=float)
y = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59, 84.47, 98.36, 112.25, 126.14, 140.03,120,112,110])
model = piecewise(x, y)
Evaluate 'model':
FittedModel with segments:
* FittedSegment(start_t=1.0, end_t=7.0, coeffs=(2.9999999999999996, 2.0000000000000004))
* FittedSegment(start_t=7.0, end_t=16.0, coeffs=(-68.2972222222222, 13.888333333333332))
* FittedSegment(start_t=16.0, end_t=18.0, coeffs=(198.99999999999997, -5.000000000000001))
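If you only need the detected change points, they can be read back from the fitted model; a hedged sketch, assuming the segments attribute that the printed representation above suggests:

# Assumption: FittedModel exposes a .segments list of FittedSegment tuples, as the repr indicates
for seg in model.segments:
    print(seg.start_t, seg.end_t, seg.coeffs)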
Answer by Markus Dutschke
This approach uses Scikit-Learn to apply segmented linear regression. You can use this if your points are subject to noise. It is way faster, significantly more robust and more generic than performing a giant optimization task (anything from scipy.optimize like curve_fit with more than 3 parameters).
import numpy as np
import matplotlib.pylab as plt
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
# parameters for setup
n_data = 20
# segmented linear regression parameters
n_seg = 3
np.random.seed(0)
fig, (ax0, ax1) = plt.subplots(1, 2)
# example 1
#xs = np.sort(np.random.rand(n_data))
#ys = np.random.rand(n_data) * .3 + np.tanh(5* (xs -.5))
# example 2
xs = np.linspace(-1, 1, 20)
ys = np.random.rand(n_data) * .3 + np.tanh(3*xs)
dys = np.gradient(ys, xs)
rgr = DecisionTreeRegressor(max_leaf_nodes=n_seg)
rgr.fit(xs.reshape(-1, 1), dys.reshape(-1, 1))
dys_dt = rgr.predict(xs.reshape(-1, 1)).flatten()
ys_sl = np.ones(len(xs)) * np.nan
for y in np.unique(dys_dt):
    msk = dys_dt == y
    lin_reg = LinearRegression()
    lin_reg.fit(xs[msk].reshape(-1, 1), ys[msk].reshape(-1, 1))
    ys_sl[msk] = lin_reg.predict(xs[msk].reshape(-1, 1)).flatten()
    ax0.plot([xs[msk][0], xs[msk][-1]],
             [ys_sl[msk][0], ys_sl[msk][-1]],
             color='r', zorder=1)
ax0.set_title('values')
ax0.scatter(xs, ys, label='data')
ax0.scatter(xs, ys_sl, s=3**2, label='seg lin reg', color='g', zorder=5)
ax0.legend()
ax1.set_title('slope')
ax1.scatter(xs, dys, label='data')
ax1.scatter(xs, dys_dt, label='DecisionTree', s=2**2)
ax1.legend()
plt.show()