Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/1599754/
Is there an easy way in Python to extrapolate data points to the future?
Asked by maplpro
I have a simple numpy array; for every date there is a data point. Something like this:
>>> import numpy as np
>>> from datetime import date
>>> x = np.array([(date(2008,3,5), 4800), (date(2008,3,15), 4000), (date(2008,3,20), 3500), (date(2008,4,5), 3000)])
Is there an easy way to extrapolate data points to the future: date(2008,5,1), date(2008,5,20) etc.? I understand it can be done with mathematical algorithms, but here I am looking for some low-hanging fruit. Actually I like what numpy.linalg.solve does, but it does not look applicable to extrapolation. Maybe I am absolutely wrong.
Actually, to be more specific, I am building a burn-down chart (XP term): 'x=date and y=volume of work to be done'. I have the sprints that are already done, and I want to visualise how the future sprints will go if the current situation persists. Finally, I want to predict the release date. The nature of 'volume of work to be done' is that it always goes down on burn-down charts. I also want to get the extrapolated release date: the date when the volume becomes zero.
This is all for showing the dev team how things are going. Precision is not so important here :) The motivation of the dev team is the main factor. That means I am absolutely fine with a very approximate extrapolation technique.
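For the burn-down case described above, one very approximate option might be a straight-line least-squares fit. The sketch below is not from the original thread: it assumes the trend is linear, uses numpy.polyfit on day offsets, and the helper names fit_burndown, predict and release_date are hypothetical.

```python
from datetime import date
import numpy as np

# Data points from the question: (date, remaining volume of work).
points = [(date(2008, 3, 5), 4800), (date(2008, 3, 15), 4000),
          (date(2008, 3, 20), 3500), (date(2008, 4, 5), 3000)]

def fit_burndown(points):
    """Fit volume = slope * day + intercept, with day 0 at the first date."""
    origin = points[0][0].toordinal()
    days = np.array([d.toordinal() - origin for d, _ in points], dtype=float)
    volume = np.array([v for _, v in points], dtype=float)
    slope, intercept = np.polyfit(days, volume, 1)  # degree 1 = straight line
    return slope, intercept, origin

def predict(points, when):
    """Extrapolated volume of work on the given date (assumes a linear trend)."""
    slope, intercept, origin = fit_burndown(points)
    return slope * (when.toordinal() - origin) + intercept

def release_date(points):
    """First date on which the fitted line is at or below zero, or None."""
    slope, intercept, origin = fit_burndown(points)
    if slope >= 0:
        return None  # work is not decreasing, so the line never reaches zero
    zero_day = -intercept / slope
    return date.fromordinal(origin + int(np.ceil(zero_day)))

for when in (date(2008, 5, 1), date(2008, 5, 20)):
    print(when, round(predict(points, when)))
print("estimated release date:", release_date(points))
```

A straight line cannot tell a slowdown from noise, so the estimate is only as good as the assumption that the current pace persists.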
Answered by denis
It's all too easy for extrapolation to generate garbage; try this. Many different extrapolations are of course possible; some produce obvious garbage, some non-obvious garbage, many are ill-defined.
""" extrapolate y,m,d data with scipy UnivariateSpline """
import numpy as np
from scipy.interpolate import UnivariateSpline
# pydoc scipy.interpolate.UnivariateSpline -- fitpack, unclear
from datetime import date
import matplotlib.pyplot as plt

__version__ = "denis 23oct"

def daynumber( y,m,d ):
    """ 2005,1,1 -> 0  2006,1,1 -> 365 ... """
    return date( y,m,d ).toordinal() - date( 2005,1,1 ).toordinal()

# known data: (day number, value) pairs, split into two arrays
days, values = np.array([
    (daynumber(2005,1,1), 1.2 ),
    (daynumber(2005,4,1), 1.8 ),
    (daynumber(2005,9,1), 5.3 ),
    (daynumber(2005,10,1), 5.3 )
    ]).T
# first day of every month in 2005 and 2006
dayswanted = np.array([ daynumber( year, month, 1 )
        for year in range( 2005, 2006+1 )
        for month in range( 1, 12+1 )])

np.set_printoptions( precision=1 )  # .1f
print( "days:", days )
print( "values:", values )
print( "dayswanted:", dayswanted )

plt.title( "extrapolation with scipy.interpolate.UnivariateSpline" )
plt.plot( days, values, "o" )
for k in (1,2,3):  # line parabola cubicspline
    extrapolator = UnivariateSpline( days, values, k=k )
    y = extrapolator( dayswanted )
    label = "k=%d" % k
    print( label, y )
    plt.plot( dayswanted, y, label=label )
plt.legend( loc="lower left" )
plt.grid(True)
plt.savefig( "extrapolate-UnivariateSpline.png", dpi=50 )
plt.show()
Added: a SciPy ticket says, "The behavior of the FITPACK classes in scipy.interpolate is much more complex than the docs would lead one to believe" -- imho true of other software docs too.
Answered by Eric O Lebigot
A simple way of doing extrapolations is to use interpolating polynomials or splines: there are many routines for this in scipy.interpolate, and they are quite easy to use (just give the (x, y) points, and you get a function [a callable, to be precise]).
Now, as has been pointed out in this thread, you cannot expect the extrapolation to always be meaningful (especially when you are far from your data points) if you don't have a model for your data. However, I encourage you to play with the polynomial or spline interpolations from scipy.interpolate to see whether the results you obtain suit you.
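As one possible illustration of the "give the points, get a callable" idea, here is a minimal sketch using scipy.interpolate.InterpolatedUnivariateSpline on the data from the question. The choice of k=1 (piecewise linear) and the helper name extrapolate are assumptions made for this example, not something the answer prescribes.

```python
from datetime import date
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

# Data from the question, split into dates and volumes.
dates = [date(2008, 3, 5), date(2008, 3, 15), date(2008, 3, 20), date(2008, 4, 5)]
volumes = np.array([4800.0, 4000.0, 3500.0, 3000.0])

# Splines need numeric x values: use days since the first date.
origin = dates[0].toordinal()
days = np.array([d.toordinal() - origin for d in dates], dtype=float)

# k=1 gives a piecewise-linear curve through every point; with the default
# ext=0, values outside the data range continue the nearest end segment.
spline = InterpolatedUnivariateSpline(days, volumes, k=1)

def extrapolate(when):
    """Evaluate the spline at a date, inside or outside the data range."""
    return float(spline(when.toordinal() - origin))

for when in (date(2008, 5, 1), date(2008, 5, 20)):
    print(when, round(extrapolate(when), 1))
```

With k=1 the forecast depends only on the last two points, so it is worth trying k=2 or k=3 as well and comparing how quickly the curves diverge.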
Answered by ty812
Mathematical models are the way to go in this case. For instance, if you have only three data points, you have absolutely no indication of how the trend will unfold (it could be either of two parabolas).
Get some statistics courses and try to implement the algorithms. Try Wikibooks.
Answered by Luka Rahne
You have to specify which kind of function you want to extrapolate with. Then you can use regression http://en.wikipedia.org/wiki/Regression_analysis to find the parameters of the function, and extrapolate it into the future.
For instance: translate dates into x values, using the first day as x=0. For your problem the values should be approximately (0,1.2), (400,1.8), (900,5.3).
Now you decide that these points lie on a function of type a+bx+cx^2.
Use the method of least squares to find a, b and c: http://en.wikipedia.org/wiki/Linear_least_squares (I will provide full source, but later, because I do not have time for this now)
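The least-squares step could be sketched roughly as follows. This is not the full source the answerer promised; it is a minimal example that assumes the three approximate points given above and uses numpy.linalg.lstsq, with the helper name model being hypothetical. With three points and three parameters the fit passes exactly through the data; with more points it would become a true least-squares compromise.

```python
import numpy as np

# Approximate points from the answer: x = days since the first date.
x = np.array([0.0, 400.0, 900.0])
y = np.array([1.2, 1.8, 5.3])

# Design matrix with columns 1, x, x^2 for the model a + b*x + c*x^2.
A = np.column_stack([np.ones_like(x), x, x**2])

# Solve A @ [a, b, c] ~= y in the least-squares sense.
(a, b, c), *_ = np.linalg.lstsq(A, y, rcond=None)

def model(x_new):
    """Evaluate the fitted quadratic at new x values (extrapolation included)."""
    return a + b * x_new + c * x_new**2

print("a, b, c =", a, b, c)
print("value at x=1000:", model(1000.0))
```

numpy.polyfit(x, y, 2) should give the same coefficients in the opposite order (highest power first) if a one-liner is preferred.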