pandas: predicting on new data using locally weighted regression (LOESS/LOWESS)
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow.
Original URL: http://stackoverflow.com/questions/36252434/
Predicting on new data using locally weighted regression (LOESS/LOWESS)
Asked by max
How do I fit a locally weighted regression in Python so that it can be used to predict on new data?
There is statsmodels.nonparametric.smoothers_lowess.lowess, but it returns the estimates only for the original data set; so it seems to only do fit and predict together, rather than separately as I expected.
scikit-learn always has a fit method that allows the object to be used later on new data with predict; but it doesn't implement lowess.
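For concreteness, here is a minimal sketch of the behavior the question describes, on made-up data: lowess returns only the smoothed values at the input points, with no separate predict step for unseen x.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 100))
y = np.sin(x) + rng.normal(0, 0.3, size=100)

# returns an (n, 2) array of (sorted x, fitted y) for the input points only;
# there is no fitted lowess object with a predict method for new x values
smoothed = sm.nonparametric.lowess(y, x, frac=0.3)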
Answered by Daniel Hitchcock
Lowess works great for predicting (when combined with interpolation)! I think the code is pretty straightforward -- let me know if you have any questions! (Matplotlib figure)
import matplotlib.pyplot as plt
%matplotlib inline  # notebook magic; remove this line when running as a plain script
from scipy.interpolate import interp1d
import statsmodels.api as sm
# introduce some floats in our x-values
x = list(range(3, 33)) + [3.2, 6.2]
y = [1,2,1,2,1,1,3,4,5,4,5,6,5,6,7,8,9,10,11,11,12,11,11,10,12,11,11,10,9,8,2,13]
# lowess will return our "smoothed" data with a y value at every x-value
lowess = sm.nonparametric.lowess(y, x, frac=.3)
# unpack the lowess smoothed points to their values
lowess_x = list(zip(*lowess))[0]
lowess_y = list(zip(*lowess))[1]
# run scipy's interpolation. There is also extrapolation, I believe
f = interp1d(lowess_x, lowess_y, bounds_error=False)
xnew = [i/10. for i in range(400)]
# this generates y values for our x-values via the interpolator;
# it will MISS values outside of the x window (less than 3, greater than 33).
# There might be a better approach, but you can run a for loop
# and if the value is out of the range, use f(min(lowess_x)) or f(max(lowess_x))
ynew = f(xnew)
plt.plot(x, y, 'o')
plt.plot(lowess_x, lowess_y, '*')
plt.plot(xnew, ynew, '-')
plt.show()
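As a follow-up to the out-of-window comment above: newer SciPy versions can handle this directly in interp1d via fill_value, so the for loop is not needed. A minimal sketch, reusing lowess_x, lowess_y, and xnew from the code above:

from scipy.interpolate import interp1d

# linearly extrapolate beyond the fitted x range (available since SciPy 0.17)
f_extrap = interp1d(lowess_x, lowess_y, bounds_error=False, fill_value="extrapolate")
# or clamp out-of-range queries to the edge fits, as the comment suggests
# (lowess output is sorted by x, so index 0 is the left edge and -1 the right)
f_clamp = interp1d(lowess_x, lowess_y, bounds_error=False,
                   fill_value=(lowess_y[0], lowess_y[-1]))
ynew = f_extrap(xnew)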
Answered by David R
Consider using Kernel Regression instead.
statsmodels has an implementation.
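For reference, a minimal sketch of that statsmodels API (KernelReg), which, unlike lowess, supports predicting at new points; the data here are made up:

import numpy as np
from statsmodels.nonparametric.kernel_regression import KernelReg

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 100))
y = np.sin(x) + rng.normal(0, 0.2, size=100)

# 'c' marks the single regressor as continuous
kr = KernelReg(endog=y, exog=x, var_type='c')
# unlike lowess, you can ask for fitted values at new x values
x_new = np.linspace(0, 10, 50)
y_mean, y_marginal = kr.fit(data_predict=x_new)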
If you have too many data points, why not use scikit-learn's RadiusNeighborsRegressor and specify a tricube weighting function?
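A rough sketch of that idea (the radius and data are made up; note that classic LOESS rescales the tricube kernel by the distance to the farthest point in each window, whereas this fixes the bandwidth at the radius, so it is only an approximation):

import numpy as np
from sklearn.neighbors import RadiusNeighborsRegressor

RADIUS = 1.5  # hypothetical bandwidth; tune for your data

def tricube(distances):
    # neighbors are within RADIUS, so distances / RADIUS lies in [0, 1]
    return (1 - (distances / RADIUS) ** 3) ** 3

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 10, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.2, size=200)

model = RadiusNeighborsRegressor(radius=RADIUS, weights=tricube)
model.fit(X, y)
# unlike lowess, this estimator can predict at arbitrary new points
y_new = model.predict(np.array([[2.5], [7.25]]))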
Answered by Sarah
I would use SAS PROC LOESS, and then use PROC SCORE to make predictions. Or I would use R. Python is great and fantastic for tons of other stuff, but it is not fully developed for statistical analysis.