How to apply piecewise linear fit in Python?
Original URL: http://stackoverflow.com/questions/29382903/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Tom Kurushingal
I am trying to apply a piecewise linear fit, as shown in fig. 1, to a data set.
This figure was obtained by setting the lines by hand. I attempted to apply a piecewise linear fit using the code:
from scipy import optimize
import matplotlib.pyplot as plt
import numpy as np
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12, 13, 14, 15])
y = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59, 84.47, 98.36, 112.25, 126.14, 140.03])
def linear_fit(x, a, b):
    return a * x + b
fit_a, fit_b = optimize.curve_fit(linear_fit, x[0:5], y[0:5])[0]
y_fit = fit_a * x[0:7] + fit_b
fit_a, fit_b = optimize.curve_fit(linear_fit, x[6:14], y[6:14])[0]
y_fit = np.append(y_fit, fit_a * x[6:14] + fit_b)
figure = plt.figure(figsize=(5.15, 5.15))
figure.clf()
plot = plt.subplot(111)
ax1 = plt.gca()
plot.plot(x, y, linestyle = '', linewidth = 0.25, markeredgecolor='none', marker = 'o', label = r'\textit{y_a}')
plot.plot(x, y_fit, linestyle = ':', linewidth = 0.25, markeredgecolor='none', marker = '', label = r'\textit{y_b}')
plot.set_ylabel('Y', labelpad = 6)
plot.set_xlabel('X', labelpad = 6)
figure.savefig('test.pdf', bbox_inches='tight')  # note: the keyword is bbox_inches, not box_inches
plt.close()
But this gave me a fit of the form shown in fig. 2. I tried playing with the values, but nothing changed, and I can't get the fit of the upper line right. The most important requirement for me is how I can get Python to find the gradient change point. In essence, I want Python to recognize and fit two linear fits in the appropriate ranges. How can this be done in Python?
Accepted answer by HYRY
You can use numpy.piecewise() to create the piecewise function and then use curve_fit(). Here is the code:
from scipy import optimize
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12, 13, 14, 15], dtype=float)
y = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59, 84.47, 98.36, 112.25, 126.14, 140.03])
def piecewise_linear(x, x0, y0, k1, k2):
    # (x0, y0) is the breakpoint; k1 and k2 are the slopes of the two segments
    return np.piecewise(x, [x < x0], [lambda x: k1*x + y0 - k1*x0, lambda x: k2*x + y0 - k2*x0])
p , e = optimize.curve_fit(piecewise_linear, x, y)
xd = np.linspace(0, 15, 100)
plt.plot(x, y, "o")
plt.plot(xd, piecewise_linear(xd, *p))
The output: a plot of the data points with the fitted two-segment line.
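For reference (not part of the original answer): the optimized parameter vector p holds the estimated breakpoint and the two slopes, so the gradient change point the question asks about can be read off directly. A small sketch; the exact numbers depend on the fit, but with this data set the breakpoint should come out near x = 6:

x0, y0, k1, k2 = p
# k1 should be close to 2 and k2 close to 13.9 for this data set
print("breakpoint x0 = %.3f, slopes k1 = %.3f and k2 = %.3f" % (x0, k1, k2))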
Answer by hakanc
You could do a spline interpolation scheme to both perform piecewise linear interpolation and find the turning point of the curve. The second derivative will be highest at the turning point (for a monotonically increasing curve), and can be calculated with a spline interpolation of order > 2.
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12, 13, 14, 15])
y = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59, 84.47, 98.36, 112.25, 126.14, 140.03])
tck = interpolate.splrep(x, y, k=2, s=0)
xnew = np.linspace(0, 15)
fig, axes = plt.subplots(3)
axes[0].plot(x, y, 'x', label = 'data')
axes[0].plot(xnew, interpolate.splev(xnew, tck, der=0), label = 'Fit')
axes[1].plot(x, interpolate.splev(x, tck, der=1), label = '1st dev')
dev_2 = interpolate.splev(x, tck, der=2)
axes[2].plot(x, dev_2, label = '2nd dev')
turning_point_mask = dev_2 == np.amax(dev_2)
axes[2].plot(x[turning_point_mask], dev_2[turning_point_mask], 'rx',
             label = 'Turning point')
for ax in axes:
    ax.legend(loc = 'best')
plt.show()
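If you want the turning point at a finer resolution than the original sample points, one option is to evaluate the second derivative on a dense grid and take its argmax; a small sketch that reuses the tck spline from above:

x_fine = np.linspace(x.min(), x.max(), 1000)
dev_2_fine = interpolate.splev(x_fine, tck, der=2)
print(x_fine[np.argmax(dev_2_fine)])  # approximate x location of the turning point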
Answer by Binoy Pilakkat
Use numpy.interp, which returns the one-dimensional piecewise linear interpolant to a function with given values at discrete data-points.
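A minimal sketch of what that looks like, evaluating the interpolant at new query points between the original samples (the variable names here are only illustrative):

import numpy as np

x = np.arange(1, 16, dtype=float)
y = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59,
              84.47, 98.36, 112.25, 126.14, 140.03])

x_new = np.linspace(1, 15, 57)   # query points between the data points
y_new = np.interp(x_new, x, y)   # piecewise linear interpolant evaluated at x_new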
Answer by vhcandido
Extending @binoy-pilakkat's answer.
You should use numpy.interp:
import numpy as np
import matplotlib.pyplot as plt
x = np.array(range(1,16), dtype=float)
y = np.array([5, 7, 9, 11, 13, 15, 28.92,
42.81, 56.7, 70.59, 84.47,
98.36, 112.25, 126.14, 140.03], dtype=float)
yinterp = np.interp(x, x, y) # simple as that
plt.plot(x, y, 'bo')
plt.plot(x, yinterp, 'g-')
plt.show()
Answer by pinseng
Here is an example with two change points. If you want, just test more change points based on this example.
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize

np.random.seed(9999)
x = np.random.normal(0, 1, 1000) * 10
y = np.where(x < -15, -2 * x + 3, np.where(x < 10, x + 48, -4 * x + 98)) + np.random.normal(0, 3, 1000)
plt.scatter(x, y, s = 5, color = u'b', marker = '.', label = 'scatter plt')

def piecewise_linear(x, x0, x1, b, k1, k2, k3):
    # x0 and x1 are the change points; the slope changes by k2 at x0 and by k3 at x1
    condlist = [x < x0, (x >= x0) & (x < x1), x >= x1]
    funclist = [lambda x: k1*x + b, lambda x: k1*x + b + k2*(x-x0), lambda x: k1*x + b + k2*(x-x0) + k3*(x - x1)]
    return np.piecewise(x, condlist, funclist)

p, e = optimize.curve_fit(piecewise_linear, x, y)
xd = np.linspace(-30, 30, 1000)
plt.plot(x, y, "o")
plt.plot(xd, piecewise_linear(xd, *p))
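As a quick check (a sketch, not part of the original answer): the first two entries of p are the fitted change points, which should land close to the -15 and 10 used to generate the data, assuming the optimizer converges from its default starting values:

x0_hat, x1_hat = p[0], p[1]
print("estimated change points: %.2f and %.2f" % (x0_hat, x1_hat))  # expected near -15 and 10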
Answer by Charles Jekel
You can use pwlf to perform continuous piecewise linear regression in Python. This library can be installed using pip.
There are two approaches in pwlf to perform your fit:
- You can fit for a specified number of line segments.
- You can specify the x locations where the continuous piecewise lines should terminate (a sketch of this approach appears at the end of this answer).
Let's go with approach 1 since it's easier, and will recognize the 'gradient change point' that you are interested in.
I notice two distinct regions when looking at the data. Thus it makes sense to find the best possible continuous piecewise line using two line segments. This is approach 1.
import numpy as np
import matplotlib.pyplot as plt
import pwlf
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
y = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59,
84.47, 98.36, 112.25, 126.14, 140.03])
my_pwlf = pwlf.PiecewiseLinFit(x, y)
breaks = my_pwlf.fit(2)
print(breaks)
[ 1. 5.99819559 15. ]
The first line segment runs from [1., 5.99819559], while the second line segment runs from [5.99819559, 15.]. Thus the gradient change point you asked for would be 5.99819559.
We can plot these results using the predict function.
x_hat = np.linspace(x.min(), x.max(), 100)
y_hat = my_pwlf.predict(x_hat)
plt.figure()
plt.plot(x, y, 'o')
plt.plot(x_hat, y_hat, '-')
plt.show()
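For completeness, here is a sketch of approach 2, where you fix the break locations yourself instead of searching for them. This assumes pwlf's fit_with_breaks method, which takes the break locations including the two end points:

# Approach 2 (sketch): force a break at x = 6 instead of letting pwlf search for it
my_pwlf2 = pwlf.PiecewiseLinFit(x, y)
my_pwlf2.fit_with_breaks([x.min(), 6.0, x.max()])
y_hat2 = my_pwlf2.predict(x_hat)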
Answer by Kevin Zhu
piecewise works too
from piecewise.regressor import piecewise
import numpy as np
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12, 13, 14, 15,16,17,18], dtype=float)
y = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59, 84.47, 98.36, 112.25, 126.14, 140.03,120,112,110])
model = piecewise(x, y)
Evaluate 'model':
FittedModel with segments:
* FittedSegment(start_t=1.0, end_t=7.0, coeffs=(2.9999999999999996, 2.0000000000000004))
* FittedSegment(start_t=7.0, end_t=16.0, coeffs=(-68.2972222222222, 13.888333333333332))
* FittedSegment(start_t=16.0, end_t=18.0, coeffs=(198.99999999999997, -5.000000000000001))
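If you only need the detected change points, they can be read back from the fitted model; a hedged sketch, assuming the segments attribute that the printed representation above suggests:

# Assumption: FittedModel exposes a .segments list of FittedSegment tuples, as the repr indicates
for seg in model.segments:
    print(seg.start_t, seg.end_t, seg.coeffs)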
Answer by Markus Dutschke
This approach uses Scikit-Learn to apply segmented linear regression. You can use this if your points are subject to noise. It is way faster, significantly more robust and more generic than performing a giant optimization task (anything from scipy.optimize like curve_fit with more than 3 parameters).
import numpy as np
import matplotlib.pylab as plt
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
# parameters for setup
n_data = 20
# segmented linear regression parameters
n_seg = 3
np.random.seed(0)
fig, (ax0, ax1) = plt.subplots(1, 2)
# example 1
#xs = np.sort(np.random.rand(n_data))
#ys = np.random.rand(n_data) * .3 + np.tanh(5* (xs -.5))
# example 2
xs = np.linspace(-1, 1, 20)
ys = np.random.rand(n_data) * .3 + np.tanh(3*xs)
dys = np.gradient(ys, xs)
rgr = DecisionTreeRegressor(max_leaf_nodes=n_seg)
rgr.fit(xs.reshape(-1, 1), dys.reshape(-1, 1))
dys_dt = rgr.predict(xs.reshape(-1, 1)).flatten()
ys_sl = np.ones(len(xs)) * np.nan
for y in np.unique(dys_dt):
    msk = dys_dt == y
    lin_reg = LinearRegression()
    lin_reg.fit(xs[msk].reshape(-1, 1), ys[msk].reshape(-1, 1))
    ys_sl[msk] = lin_reg.predict(xs[msk].reshape(-1, 1)).flatten()
    ax0.plot([xs[msk][0], xs[msk][-1]],
             [ys_sl[msk][0], ys_sl[msk][-1]],
             color='r', zorder=1)
ax0.set_title('values')
ax0.scatter(xs, ys, label='data')
ax0.scatter(xs, ys_sl, s=3**2, label='seg lin reg', color='g', zorder=5)
ax0.legend()
ax1.set_title('slope')
ax1.scatter(xs, dys, label='data')
ax1.scatter(xs, dys_dt, label='DecisionTree', s=2**2)
ax1.legend()
plt.show()