Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/45051018/

Date: 2020-09-14 03:58:45  Source: igfitidea

'Cannot add integral value to Timestamp without freq' error for ARIMA model although re-indexed with frequency

python pandas statsmodels arima

Question by E. Aly

I'm trying to do a time series prediction using an ARIMA model on this series:

1960-01-01    12.7
1961-01-01    12.1
1962-01-01    12.7
1963-01-01    12.8
1964-01-01    12.3
1965-01-01    13.0
1966-01-01    12.5
1967-01-01    12.9
1968-01-01    12.9
1969-01-01    13.3
1970-01-01    13.2
1971-01-01    13.0
1972-01-01    12.6
1973-01-01    12.2
1974-01-01    12.4
1975-01-01    12.7
1976-01-01    12.6
1977-01-01    12.2
1978-01-01    12.5
1979-01-01    12.2
1980-01-01    12.2
1981-01-01    12.2
1982-01-01    12.1
1983-01-01    12.3
1984-01-01    11.7
1985-01-01    11.8
1986-01-01    11.5
1987-01-01    11.2
1988-01-01    11.0
1989-01-01    10.9
1990-01-01    10.8
1991-01-01    10.8
1992-01-01    10.6
1993-01-01    10.4
1994-01-01    10.2
1995-01-01    10.2
1996-01-01    10.2
1997-01-01    10.0
1998-01-01     9.8
1999-01-01     9.8
2000-01-01     9.6
2001-01-01     9.3
2002-01-01     9.4
2003-01-01     9.5
2004-01-01     9.1
2005-01-01     9.1
2006-01-01     9.0
2007-01-01     9.0
2008-01-01     9.0
2009-01-01     9.3
2010-01-01     9.2
2011-01-01     9.1
2012-01-01     9.4
2013-01-01     9.4
2014-01-01     9.2
2015-01-01     9.6
Name: Death rate, crude (per 1,000 people), dtype: float64
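For reference, the listing above can be rebuilt as a pandas Series whose index carries an explicit year-start frequency. This is a minimal sketch that assumes a recent pandas, where the year-start alias is `'YS'`. Older pandas versions, like the one the question used, spelled the same frequency `'AS'`, which pandas 2.2 deprecated.

```python
import pandas as pd

# Annual crude death rate (per 1,000 people), 1960-2015, copied from the listing above
values = [
    12.7, 12.1, 12.7, 12.8, 12.3, 13.0, 12.5, 12.9, 12.9, 13.3,  # 1960-1969
    13.2, 13.0, 12.6, 12.2, 12.4, 12.7, 12.6, 12.2, 12.5, 12.2,  # 1970-1979
    12.2, 12.2, 12.1, 12.3, 11.7, 11.8, 11.5, 11.2, 11.0, 10.9,  # 1980-1989
    10.8, 10.8, 10.6, 10.4, 10.2, 10.2, 10.2, 10.0, 9.8, 9.8,    # 1990-1999
    9.6, 9.3, 9.4, 9.5, 9.1, 9.1, 9.0, 9.0, 9.0, 9.3,            # 2000-2009
    9.2, 9.1, 9.4, 9.4, 9.2, 9.6,                                # 2010-2015
]

# 'YS' = year start (January 1); older pandas used the alias 'AS'
index = pd.date_range(start="1960-01-01", periods=len(values), freq="YS")
cmortS = pd.Series(values, index=index, name="Death rate, crude (per 1,000 people)")

# The index now carries a frequency object instead of None
print(cmortS.index.freq)
```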

I use the following code to generate different (p, d, q) combinations, fit a model for each one, record the corresponding AIC, and choose the order with the lowest AIC. I then use that (p, d, q) order for prediction.

import itertools
import warnings

import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.metrics import mean_squared_error as mse

# cmortS is the annual death-rate Series shown above (DatetimeIndex reindexed with freq='AS').
# This uses the legacy statsmodels ARIMA API (statsmodels < 0.13), whose predict() accepts typ.

def MAPE(A, F):
    """Mean absolute percentage error of forecasts F against actuals A, in percent."""
    Av = np.asarray(A.values)
    Fv = np.asarray(F.values)
    mape = np.mean(np.abs((Av - Fv) / Av)) * 100
    return np.around(mape, decimals=2)

# Generate all (p, d, q) combinations, each term in 0..6
p = d = q = range(7)
pdq = list(itertools.product(p, d, q))

# Fit every order and record its AIC; skip orders that fail to fit
warnings.filterwarnings('ignore')
param_aic = {}
for param in pdq:
    try:
        mod = sm.tsa.ARIMA(cmortS, order=param)
        result = mod.fit()
        param_aic[param] = result.aic
    except Exception:
        continue

# Choose the (p, d, q) order with the lowest AIC
min_param = min(param_aic, key=param_aic.get)

# Refit the model with the selected order
model = sm.tsa.ARIMA(cmortS, order=min_param)
results = model.fit()

# Forecast validation: predict in levels when the model is differenced (d > 0)
tp = 'levels' if min_param[1] > 0 else 'linear'

train_sz = int(len(cmortS) * 0.66)
train = cmortS[:train_sz]
tst = cmortS[train_sz:]
pred_strt = tst.index[0]
tst_pred = results.predict(start=pred_strt, typ=tp)
mserror = np.round(mse(tst, tst_pred), decimals=5)
mp = MAPE(tst, tst_pred)
print('Model order: {}, MAPE: {}%, mse: {}'.format(min_param, mp, mserror))

# Prediction from 2014 through 2050
end_dt = pd.to_datetime('2050', format='%Y')
strt_dt = pd.to_datetime('2014', format='%Y')
Var_pred = results.predict(start=strt_dt, end=end_dt, typ=tp)

Var_pred
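The order-selection step and the MAPE formula can be checked on their own, without fitting any models. The following is a minimal sketch that assumes a `param_aic` dict mapping (p, d, q) tuples to AIC values, like the one built above. The helper names are hypothetical, and the AIC numbers in the example are made up for illustration. They are not results from this series.

```python
import numpy as np

def best_order(param_aic):
    """Return the (p, d, q) key whose AIC value is smallest."""
    return min(param_aic, key=param_aic.get)

def mape(actual, forecast):
    """Mean absolute percentage error in percent, rounded to 2 decimals."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    return float(np.round(np.mean(np.abs((a - f) / a)) * 100, 2))

# Illustrative AIC values only (not fitted on the series above)
example_aic = {(1, 0, 0): -20.5, (0, 1, 1): -31.2, (2, 1, 0): -28.9}
print(best_order(example_aic))          # (0, 1, 1)
print(mape([10.0, 20.0], [9.0, 22.0]))  # 10.0
```

With `min(..., key=param_aic.get)`, you don't need a second loop to find the key that matches the minimum value.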

When I run it, I get the following error:

ValueError: Cannot add integral value to Timestamp without freq.
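The message itself comes from pandas. It is raised when an integer is added to a `Timestamp` that has no frequency attached. Presumably statsmodels performs such a step when it extends dates past the end of the sample. The following is a minimal sketch of the same operation in isolation. Old pandas raised the `ValueError` shown above, and pandas 1.0 and later raise a `TypeError` instead. Stepping by an explicit offset works in either version.

```python
import pandas as pd

ts = pd.Timestamp("2014-01-01")

# Integer arithmetic on a bare Timestamp fails: old pandas raised
# ValueError("Cannot add integral value to Timestamp without freq."),
# pandas >= 1.0 raises TypeError instead.
raised = False
try:
    ts + 1
except (TypeError, ValueError) as exc:
    raised = True
    print(type(exc).__name__, exc)

# Stepping by an explicit calendar offset works regardless of version
next_year = ts + pd.DateOffset(years=1)
print(next_year)  # 2015-01-01 00:00:00
```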

Although I reindexed the series with a date range using freq='AS', I still get the same error.
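After reindexing, two checks are worth doing. First, confirm that `index.freq` is not `None` on the exact object passed to `sm.tsa.ARIMA`. Second, confirm that reindexing did not silently turn values into NaN, which happens when the original timestamps do not match the new grid exactly. The following is a minimal sketch with a short made-up series that uses values from the question. It assumes a recent pandas, where year start is spelled `'YS'` (`'AS'` in older versions).

```python
import pandas as pd

# Yearly data whose DatetimeIndex was parsed from strings: no freq attached
s = pd.Series([12.7, 12.1, 12.7],
              index=pd.to_datetime(["1960-01-01", "1961-01-01", "1962-01-01"]))
print(s.index.freq)           # None
print(s.index.inferred_freq)  # pandas can infer a frequency but does not set it

# Option 1: reindex onto a date_range that carries the frequency
full = pd.date_range(s.index[0], s.index[-1], freq="YS")
s1 = s.reindex(full)

# Option 2: asfreq builds a regular index with the frequency attached
s2 = s.asfreq("YS")
print(s1.index.freq, s2.index.freq)

# Pitfall: year-end stamps never match year-start stamps, so reindex yields NaN
s_end = pd.Series([1.0, 2.0], index=pd.to_datetime(["1960-12-31", "1961-12-31"]))
bad = s_end.reindex(pd.date_range("1960-01-01", periods=2, freq="YS"))
print(bad.isna().all())  # True
```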

How can I solve that?

Answer by Halee

Changing the final few lines of your code to this format should resolve the error message:

# Prediction
strt_date = pd.to_datetime('2014-01-01 01:00:00')
end_date = pd.to_datetime('2050-01-01 01:00:00')
Var_pred = results.predict(start = strt_date, end = end_date, typ = tp) 
Var_pred
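An alternative that avoids date arithmetic altogether is a suggestion of mine, not part of the answer above. The ARIMA `predict` method in statsmodels also accepts zero-based integer positions for `start` and `end`. The following is a minimal sketch that converts forecast dates into such positions on a regular yearly grid. The helper `date_to_position` is hypothetical and assumes the index has a frequency set.

```python
import pandas as pd

def date_to_position(index, date):
    """Zero-based position of `date` on the regular grid defined by `index`.

    Hypothetical helper: assumes `index` is a DatetimeIndex with freq set,
    and `date` falls on or after index[0] and on that frequency grid.
    """
    date = pd.Timestamp(date)
    grid = pd.date_range(index[0], date, freq=index.freq)
    if len(grid) == 0 or grid[-1] != date:
        raise ValueError("date does not fall on the index frequency grid")
    return len(grid) - 1

# Yearly index matching the question's data, 1960-2015
index = pd.date_range("1960-01-01", "2015-01-01", freq="YS")
start = date_to_position(index, "2014-01-01")
end = date_to_position(index, "2050-01-01")
print(start, end)  # 54 90
```

Using these positions, the call would look like `results.predict(start=start, end=end, typ=tp)`. This is untested here against a specific statsmodels version.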