Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/19406049/

Time: 2020-08-19 13:39:52  Source: igfitidea

Extrapolating data with numpy/python

Tags: python, python-2.7, numpy, scipy

Asked by corvid

Let's say I have a simple data set. Perhaps in dictionary form, it would look like this:

{1:5, 2:10, 3:15, 4:20, 5:25}

(the order is always ascending). What I want to do is logically figure out what the next data point is most likely to be. In this case, for example, it would be {6: 30}

What would be the best way to do this?

Accepted answer by OldTinfoil

After discussing with you in the Python chat, here is a fit of your data to an exponential. This should give a relatively good indicator, since you're not looking for long-term extrapolation.

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

def exponential_fit(x, a, b, c):
    return a*np.exp(-b*x) + c

if __name__ == "__main__":
    x = np.array([0, 1, 2, 3, 4, 5])
    y = np.array([30, 50, 80, 160, 300, 580])
    fitting_parameters, covariance = curve_fit(exponential_fit, x, y)
    a, b, c = fitting_parameters

    next_x = 6
    next_y = exponential_fit(next_x, a, b, c)

    plt.plot(y)
    plt.plot(np.append(y, next_y), 'ro')
    plt.show()

The red dot on the far right of the plot shows the next "predicted" point.

Answer by tom10

Since your data is approximately linear, you can do a linear regression and then use the results of that regression to calculate the next point, using y = w[0]*x + w[1] (keeping the notation from the linked example for y = mx + b).
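
The linked example is not reproduced on this page, so the following is only a minimal sketch of the idea, assuming the regression is done with numpy.linalg.lstsq. The names A, w, next_x and next_y are illustrative choices, not taken from the original answer.

```python
import numpy as np

# Data from the question: keys are x, values are y.
d = {1: 5, 2: 10, 3: 15, 4: 20, 5: 25}
keys = sorted(d)
x = np.array(keys, dtype=float)
y = np.array([d[k] for k in keys], dtype=float)

# Design matrix with columns [x, 1], so that y is modelled as w[0]*x + w[1].
A = np.vstack([x, np.ones_like(x)]).T
w, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)

# Evaluate the fitted line one step past the known data.
next_x = 6
next_y = w[0] * next_x + w[1]
print(next_y)  # approximately 30.0
```

Here w[0] plays the role of the slope m and w[1] the intercept b.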

If your data is not approximately linear and you don't have some other theoretical form for a regression, then general extrapolations (using, say, polynomials or splines) are much less reliable, as they can go a bit crazy beyond the known data points. For example, see the accepted answer here.
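
To illustrate how a flexible fit can misbehave outside the data, here is a small sketch. It uses hypothetical data (the line y = 5x plus small hand-picked offsets, not from the question) and compares a degree-1 fit with a degree-5 fit from numpy.polyfit.

```python
import numpy as np

# Hypothetical data: the line y = 5x plus small, hand-picked "noise" values.
x = np.arange(1, 7, dtype=float)
noise = np.array([0.3, -0.4, 0.5, -0.2, 0.4, -0.5])
y = 5 * x + noise

linear = np.poly1d(np.polyfit(x, y, 1))   # degree-1 least-squares fit
quintic = np.poly1d(np.polyfit(x, y, 5))  # degree-5 fit through all 6 points

x_new = 10.0  # well outside the known range 1..6
linear_pred = linear(x_new)
quintic_pred = quintic(x_new)
print(linear_pred, quintic_pred)
```

The linear fit stays close to the underlying trend (about 50 at x = 10), while the degree-5 polynomial, which reproduces the noisy points exactly, lands hundreds of units away.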

Answer by falsetru

Using scipy.interpolate.splrep:

>>> from scipy.interpolate import splrep, splev
>>> d = {1:5, 2:10, 3:15, 4:20, 5:25}
>>> x, y = zip(*d.items())
>>> spl = splrep(x, y, k=1, s=0)
>>> splev(6, spl)
array(30.0)
>>> splev(7, spl)
array(35.0)
>>> int(splev(7, spl))
35
>>> splev(10000000000, spl)
array(50000000000.0)
>>> int(splev(10000000000, spl))
50000000000L

See How to make scipy.interpolate give an extrapolated result beyond the input range?

Answer by Daniel

You can also use numpy's polyfit:

import numpy as np

data = np.array([[1,5], [2,10], [3,15], [4,20], [5,25]])
fit = np.polyfit(data[:,0], data[:,1], 1)  # The 1 signifies a linear (degree-1) fit.

fit
[  5.00000000e+00   1.58882186e-15]  #y = 5x + 0

line = np.poly1d(fit)
new_points = np.arange(5)+6

new_points
[ 6, 7, 8, 9, 10]

line(new_points)
[ 30.  35.  40.  45.  50.]

This allows you to alter the degree of the polynomial fit quite easily, as the function polyfit takes the following arguments: np.polyfit(x data, y data, degree). Shown is a linear fit; for any degree n, the returned array holds the coefficients of fit[0]*x^n + fit[1]*x^(n-1) + ... + fit[n]*x^0. The poly1d function allows you to turn this array into a function that returns the value of the polynomial at any given value x.
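
As a sketch of changing the degree, the following fits a quadratic instead of a line. The data (y = x^2) is hypothetical, chosen only so the result is easy to check by eye.

```python
import numpy as np

# Hypothetical data lying exactly on y = x^2.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = x ** 2

# Degree 2 returns three coefficients, highest power first:
# fit[0]*x^2 + fit[1]*x + fit[2]
fit = np.polyfit(x, y, 2)
curve = np.poly1d(fit)

next_y = curve(6)
print(fit)
print(next_y)  # approximately 36.0
```

Only the last argument of polyfit changes between the linear and the quadratic case. poly1d handles the evaluation in both.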

In general, extrapolation without a well-understood model will give erratic results at best.

Exponential curve fitting.

import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a * np.exp(-b * x) + c

x = np.linspace(0,4,5)
y = func(x, 2.5, 1.3, 0.5)
yn = y + 0.2*np.random.normal(size=len(x))

fit, cov = curve_fit(func, x, yn)
fit
[ 2.67217435  1.21470107  0.52942728]         #Variables

y
[ 3.          1.18132948  0.68568395  0.55060478  0.51379141]  #Original data

func(x,*fit)
[ 3.20160163  1.32252521  0.76481773  0.59929086  0.5501627 ]  #Fit to original + noise

Answer by Noyer282

As pointed out by this answer to a related question, as of version 0.17.0 of scipy, there is an option in scipy.interpolate.interp1d that allows linear extrapolation. In your case, you could do:

>>> import numpy as np
>>> from scipy import interpolate

>>> x = [1, 2, 3, 4, 5]
>>> y = [5, 10, 15, 20, 25]
>>> f = interpolate.interp1d(x, y, fill_value = "extrapolate")
>>> print(f(6))
30.0