auto.arima() 等效于 python

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/22770352/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-19 01:38:28  来源:igfitidea点击:

auto.arima() equivalent for python

pythonrtime-seriesstatsmodelsforecasting

提问by Ajax

I am trying to predict weekly sales using ARMA ARIMA models. I could not find a function for tuning the order(p,d,q) in statsmodels. Currently R has a function forecast::auto.arima()which will tune the (p,d,q) parameters.

我正在尝试使用ARMAARIMA 模型预测每周销售额。我在statsmodels. 目前 R 有一个函数forecast::auto.arima()可以调整 (p,d,q) 参数。

How do I go about choosing the right order for my model? Are there any libraries available in python for this purpose?

我如何为我的模型选择正确的顺序?为此,python 中是否有可用的库?

采纳答案by behzad.nouri

You can implement a number of approaches:

您可以实施多种方法:

  1. ARIMAResultsinclude aicand bic. By their definition, (see hereand here), these criteria penalize for the number of parameters in the model. So you may use these numbers to compare the models. Also scipy has optimize.brutewhich does grid search on the specified parameters space. So a workflow like this should work:

    def objfunc(order, exog, endog):
        from statsmodels.tsa.arima_model import ARIMA
        fit = ARIMA(endog, order, exog).fit()
        return fit.aic()
    
    from scipy.optimize import brute
    grid = (slice(1, 3, 1), slice(1, 3, 1), slice(1, 3, 1))
    brute(objfunc, grid, args=(exog, endog), finish=None)
    

    Make sure you call brutewith finish=None.

  2. You may obtain pvaluesfrom ARIMAResults. So a sort of step-forward algorithm is easy to implement where the degree of the model is increased across the dimension which obtains lowest p-value for the added parameter.

  3. Use ARIMAResults.predictto cross-validate alternative models. The best approach would be to keep the tail of the time series (say most recent 5% of data) out of sample, and use these points to obtain the test errorof the fitted models.

  1. ARIMAResults包括aicbic。根据它们的定义(参见此处此处),这些标准会惩罚模型中的参数数量。因此,您可以使用这些数字来比较模型。scipy 还optimize.brute可以在指定的参数空间上进行网格搜索。所以像这样的工作流程应该可以工作:

    def objfunc(order, exog, endog):
        from statsmodels.tsa.arima_model import ARIMA
        fit = ARIMA(endog, order, exog).fit()
        return fit.aic()
    
    from scipy.optimize import brute
    grid = (slice(1, 3, 1), slice(1, 3, 1), slice(1, 3, 1))
    brute(objfunc, grid, args=(exog, endog), finish=None)
    

    确保您brute使用finish=None.

  2. 您可以pvaluesARIMAResults. 因此,一种渐进算法很容易实现,其中模型的度数在维度上增加,从而获得添加参数的最低 p 值。

  3. 使用ARIMAResults.predict交叉验证的替代机型。最好的方法是将时间序列的尾部(比如最近 5% 的数据)保留在样本之外,并使用这些点来获得拟合模型的测试误差

回答by LUO

actually

实际上

def objfunc(order,*params ):    
    from statsmodels.tsa.arima_model import ARIMA   
    p,d,q = order   
    fit = ARIMA(endog, order, exog).fit()  
    return fit.aic()    
from scipy.optimize import brute
grid = (slice(1, 3, 1), slice(1, 3, 1), slice(1, 3, 1))
brute(objfunc, grid, args=params, finish=None)

回答by prateek khandelwal

I wrote these utility functions to directly calculate pdq values get_PDQ_parallelrequire three inputs data which is series with timestamp(datetime) as index. n_jobs will provide number of parallel processor. output will be dataframe with aic and bic value with order=(P,D,Q) in index p and q range is [0,12] while d is [0,1]

我编写了这些实用程序函数来直接计算 pdq 值 get_PDQ_parallel需要三个输入数据,这些数据以时间戳(日期时间)为索引。n_jobs 将提供并行处理器的数量。输出将是具有 aic 和 bic 值的数据帧,索引 p 中的 order=(P,D,Q) 和 q 范围是 [0,12] 而 d 是 [0,1]

import statsmodels 
from statsmodels import api as sm
from sklearn.metrics import r2_score,mean_squared_error
from sklearn.utils import check_array
from functools import partial
from multiprocessing import Pool
def get_aic_bic(order,series):
    aic=np.nan
    bic=np.nan
    #print(series.shape,order)
    try:
        arima_mod=statsmodels.tsa.arima_model.ARIMA(series,order=order,freq='H').fit(transparams=True,method='css')
        aic=arima_mod.aic
        bic=arima_mod.bic
        print(order,aic,bic)
    except:
        pass
    return aic,bic

def get_PDQ_parallel(data,n_jobs=7):
    p_val=13
    q_val=13
    d_vals=2
    pdq_vals=[ (p,d,q) for p in range(p_val) for d in range(d_vals) for q in range(q_val)]
    get_aic_bic_partial=partial(get_aic_bic,series=data)
    p = Pool(n_jobs)
    res=p.map(get_aic_bic_partial, pdq_vals)  
    p.close()
    return pd.DataFrame(res,index=pdq_vals,columns=['aic','bic']) 

回答by Ajay Ohri

possible solution

可能的解决方案

df=pd.read_csv("http://vincentarelbundock.github.io/Rdatasets/csv/datasets/AirPassengers.csv")

# Define the p, d and q parameters to take any value between 0 and 2
p = d = q = range(0, 2)
print(p)


import itertools
import warnings

# Generate all different combinations of p, q and q triplets
pdq = list(itertools.product(p, d, q))
print(pdq)

# Generate all different combinations of seasonal p, q and q triplets
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, d, q))]

print('Examples of parameter combinations for Seasonal ARIMA...')
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[1]))
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[2]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[3]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[4]))
Examples of parameter combinations for Seasonal ARIMA...
SARIMAX: (0, 0, 1) x (0, 0, 1, 12)
SARIMAX: (0, 0, 1) x (0, 1, 0, 12)
SARIMAX: (0, 1, 0) x (0, 1, 1, 12)
SARIMAX: (0, 1, 0) x (1, 0, 0, 12)

y=df

#warnings.filterwarnings("ignore") # specify to ignore warning messages

for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX(y,
                                            order=param,
                                            seasonal_order=param_seasonal,
                                            enforce_stationarity=False,
                                            enforce_invertibility=False)

            results = mod.fit()

            print('ARIMA{}x{}12 - AIC:{}'.format(param, param_seasonal, results.aic))
        except:
            continue
ARIMA(0, 0, 0)x(0, 0, 1, 12)12 - AIC:3618.0303991426763
ARIMA(0, 0, 0)x(0, 1, 1, 12)12 - AIC:2824.7439963684233
ARIMA(0, 0, 0)x(1, 0, 0, 12)12 - AIC:2942.2733127230185
ARIMA(0, 0, 0)x(1, 0, 1, 12)12 - AIC:2922.178151133141
ARIMA(0, 0, 0)x(1, 1, 0, 12)12 - AIC:2767.105066400224
ARIMA(0, 0, 0)x(1, 1, 1, 12)12 - AIC:2691.233398643673
ARIMA(0, 0, 1)x(0, 0, 0, 12)12 - AIC:3890.816777796087
ARIMA(0, 0, 1)x(0, 0, 1, 12)12 - AIC:3541.1171286722
ARIMA(0, 0, 1)x(0, 1, 0, 12)12 - AIC:3028.8377323188824
ARIMA(0, 0, 1)x(0, 1, 1, 12)12 - AIC:2746.77973129136
ARIMA(0, 0, 1)x(1, 0, 0, 12)12 - AIC:3583.523640623017
ARIMA(0, 0, 1)x(1, 0, 1, 12)12 - AIC:3531.2937768990187
ARIMA(0, 0, 1)x(1, 1, 0, 12)12 - AIC:2781.198675746594
ARIMA(0, 0, 1)x(1, 1, 1, 12)12 - AIC:2720.7023088205974
ARIMA(0, 1, 0)x(0, 0, 1, 12)12 - AIC:3029.089945668332
ARIMA(0, 1, 0)x(0, 1, 1, 12)12 - AIC:2568.2832251221016
ARIMA(0, 1, 0)x(1, 0, 0, 12)12 - AIC:2841.315781459511
ARIMA(0, 1, 0)x(1, 0, 1, 12)12 - AIC:2815.4011044132576
ARIMA(0, 1, 0)x(1, 1, 0, 12)12 - AIC:2588.533386513587
ARIMA(0, 1, 0)x(1, 1, 1, 12)12 - AIC:2569.9453272483315
ARIMA(0, 1, 1)x(0, 0, 0, 12)12 - AIC:3327.5177587522303
ARIMA(0, 1, 1)x(0, 0, 1, 12)12 - AIC:2984.716706112334
ARIMA(0, 1, 1)x(0, 1, 0, 12)12 - AIC:2789.128542154043
ARIMA(0, 1, 1)x(0, 1, 1, 12)12 - AIC:2537.0293659293943
ARIMA(0, 1, 1)x(1, 0, 0, 12)12 - AIC:2984.4555708516436
ARIMA(0, 1, 1)x(1, 0, 1, 12)12 - AIC:2939.460958374472
ARIMA(0, 1, 1)x(1, 1, 0, 12)12 - AIC:2578.7862352774437
ARIMA(0, 1, 1)x(1, 1, 1, 12)12 - AIC:2537.771484229265
ARIMA(1, 0, 0)x(0, 0, 0, 12)12 - AIC:3391.5248913820797
ARIMA(1, 0, 0)x(0, 0, 1, 12)12 - AIC:3038.142074281268
C:\Users\Dell\Anaconda3\lib\site-packages\statsmodels\base\model.py:496: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  "Check mle_retvals", ConvergenceWarning)
ARIMA(1, 0, 0)x(0, 1, 0, 12)12 - AIC:2839.809192263449
ARIMA(1, 0, 0)x(0, 1, 1, 12)12 - AIC:2588.50367175184
ARIMA(1, 0, 0)x(1, 0, 0, 12)12 - AIC:2993.4630440139595
ARIMA(1, 0, 0)x(1, 0, 1, 12)12 - AIC:2995.049216326931
ARIMA(1, 0, 0)x(1, 1, 0, 12)12 - AIC:2588.2463284315304
ARIMA(1, 0, 0)x(1, 1, 1, 12)12 - AIC:2592.80110502723
ARIMA(1, 0, 1)x(0, 0, 0, 12)12 - AIC:3352.0350133621478
ARIMA(1, 0, 1)x(0, 0, 1, 12)12 - AIC:3006.5493366627807
ARIMA(1, 0, 1)x(0, 1, 0, 12)12 - AIC:2810.6423724894516
ARIMA(1, 0, 1)x(0, 1, 1, 12)12 - AIC:2559.584031948852
ARIMA(1, 0, 1)x(1, 0, 0, 12)12 - AIC:2981.2250436794675
ARIMA(1, 0, 1)x(1, 0, 1, 12)12 - AIC:2959.3142304724834
ARIMA(1, 0, 1)x(1, 1, 0, 12)12 - AIC:2579.8245645892207
ARIMA(1, 0, 1)x(1, 1, 1, 12)12 - AIC:2563.13922589258
ARIMA(1, 1, 0)x(0, 0, 0, 12)12 - AIC:3354.7462930846423
ARIMA(1, 1, 0)x(0, 0, 1, 12)12 - AIC:3006.702997636003
ARIMA(1, 1, 0)x(0, 1, 0, 12)12 - AIC:2809.3844175191666
ARIMA(1, 1, 0)x(0, 1, 1, 12)12 - AIC:2558.484602766447
ARIMA(1, 1, 0)x(1, 0, 0, 12)12 - AIC:2959.885810636943
ARIMA(1, 1, 0)x(1, 0, 1, 12)12 - AIC:2960.712709764296
ARIMA(1, 1, 0)x(1, 1, 0, 12)12 - AIC:2557.945907092698
ARIMA(1, 1, 0)x(1, 1, 1, 12)12 - AIC:2559.274166458508
ARIMA(1, 1, 1)x(0, 0, 0, 12)12 - AIC:3326.3285511700374
ARIMA(1, 1, 1)x(0, 0, 1, 12)12 - AIC:2985.868532151721
ARIMA(1, 1, 1)x(0, 1, 0, 12)12 - AIC:2790.7677149967103
ARIMA(1, 1, 1)x(0, 1, 1, 12)12 - AIC:2538.820635541546
ARIMA(1, 1, 1)x(1, 0, 0, 12)12 - AIC:2963.2789505804294
ARIMA(1, 1, 1)x(1, 0, 1, 12)12 - AIC:2941.2436984747465
ARIMA(1, 1, 1)x(1, 1, 0, 12)12 - AIC:2559.8258191422606
ARIMA(1, 1, 1)x(1, 1, 1, 12)12 - AIC:2539.712354465328

from https://www.digitalocean.com/community/tutorials/a-guide-to-time-series-forecasting-with-arima-in-python-3

来自https://www.digitalocean.com/community/tutorials/a-guide-to-time-series-forecasting-with-arima-in-python-3

also see https://github.com/decisionstats/pythonfordatascience/blob/master/time%2Bseries%20(1).ipynb

另见https://github.com/decisionstats/pythonfordatascience/blob/master/time%2Bseries%20(1).ipynb

回答by Satyajit Pattnaik

def evaluate_arima_model(X, arima_order):
    # prepare training dataset
    train_size = int(len(X) * 0.90)
    train, test = X[0:train_size], X[train_size:]
    history = [x for x in train]
    # make predictions
    predictions = list()
    for t in range(len(test)):
        model = ARIMA(history, order=arima_order)
        model_fit = model.fit(disp=0)
        yhat = model_fit.forecast()[0]
        predictions.append(yhat)
        history.append(test[t])
    # calculate out of sample error
    error = mean_squared_error(test, predictions)
    return error

# evaluate combinations of p, d and q values for an ARIMA model
def evaluate_models(dataset, p_values, d_values, q_values):
    dataset = dataset.astype('float32')
    best_score, best_cfg = float("inf"), None
    for p in p_values:
        for d in d_values:
            for q in q_values:
                order = (p,d,q)
                try:
                    mse = evaluate_arima_model(dataset, order)
                    if mse < best_score:
                        best_score, best_cfg = mse, order
                    print('ARIMA%s MSE=%.3f' % (order,mse))
                except:
                    continue
    print('Best ARIMA%s MSE=%.3f' % (best_cfg, best_score))

# load dataset
def parser(x):
    return datetime.strptime('190'+x, '%Y-%m')



import datetime
p_values = [4,5,6,7,8]
d_values = [0,1,2]
q_values = [2,3,4,5,6]
warnings.filterwarnings("ignore")
evaluate_models(train, p_values, d_values, q_values)

This will give you the p,d,q values, then use the values for your ARIMA model

这将为您提供 p、d、q 值,然后将这些值用于您的 ARIMA 模型

回答by Abhishek

As of now, we can directly use pyramid-arima package from pypi

截至目前,我们可以直接使用pypi中的 pyramid-arima 包

Check https://pypi.org/project/pyramid-arima/

检查 https://pypi.org/project/pyramid-arima/

回答by Boodhayana

In conda, use conda install -c saravji pmdarimato install.

在 conda 中,使用conda install -c saravji pmdarima来安装。

The user saravjihas put it in anaconda cloud.

用户saravji已将其放入 anaconda 云中。

then to use,

然后使用,

from pmdarima.arima import auto_arima

(Note that the name pyramid-arimais changed to pmdarima).

(请注意,名称pyramid-arima已更改为pmdarima)。