pandas: using ARMA with statsmodels
Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/15515019/
Statsmodel using ARMA
Asked by user2189221
A bit new here, but I'm trying to get a statsmodels ARMA prediction tool to work. I've imported some stock data from Yahoo and gotten the ARMA model to give me fitted parameters. However, when I call predict, all I receive is a list of errors that I can't figure out. Not quite sure what I'm doing wrong here:
import pandas
import statsmodels.tsa.api as tsa
from pandas.io.data import DataReader

start = pandas.datetime(2013, 1, 1)
end = pandas.datetime.today()
data = DataReader('GOOG', 'yahoo')
arma = tsa.ARMA(data['Close'], order=(2, 2))
results = arma.fit()
results.predict(start=start, end=end)
The errors are:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
C:\Windows\system32\<ipython-input-84-25a9b6bc631d> in <module>()
13 results= arma.fit()
14 results.summary()
---> 15 results.predict(start=start,end=end)
D:\Python27\lib\site-packages\statsmodels-0.5.0-py2.7.egg\statsmodels\base\wrapper.pyc in wrapper(self, *args, **kwargs)
88 results = object.__getattribute__(self, '_results')
89 data = results.model.data
---> 90 return data.wrap_output(func(results, *args, **kwargs), how)
91
92 argspec = inspect.getargspec(func)
D:\Python27\lib\site-packages\statsmodels-0.5.0-py2.7.egg\statsmodels\tsa\arima_model.pyc in predict(self, start, end, exog, dynamic)
1265
1266 """
-> 1267 return self.model.predict(self.params, start, end, exog, dynamic)
1268
1269 def forecast(self, steps=1, exog=None, alpha=.05):
D:\Python27\lib\site-packages\statsmodels-0.5.0-py2.7.egg\statsmodels\tsa\arima_model.pyc in predict(self, params, start, end, exog, dynamic)
497
498 # will return an index of a date
--> 499 start = self._get_predict_start(start, dynamic)
500 end, out_of_sample = self._get_predict_end(end, dynamic)
501 if out_of_sample and (exog is None and self.k_exog > 0):
D:\Python27\lib\site-packages\statsmodels-0.5.0-py2.7.egg\statsmodels\tsa\arima_model.pyc in _get_predict_start(self, start, dynamic)
404 #elif 'mle' not in method or dynamic: # should be on a date
405 start = _validate(start, k_ar, k_diff, self.data.dates,
--> 406 method)
407 start = super(ARMA, self)._get_predict_start(start)
408 _check_arima_start(start, k_ar, k_diff, method, dynamic)
D:\Python27\lib\site-packages\statsmodels-0.5.0-py2.7.egg\statsmodels\tsa\arima_model.pyc in _validate(start, k_ar, k_diff, dates, method)
160 if isinstance(start, (basestring, datetime)):
161 start_date = start
--> 162 start = _index_date(start, dates)
163 start -= k_diff
164 if 'mle' not in method and start < k_ar - k_diff:
D:\Python27\lib\site-packages\statsmodels-0.5.0-py2.7.egg\statsmodels\tsa\base\datetools.pyc in _index_date(date, dates)
37 freq = _infer_freq(dates)
38 # we can start prediction at the end of endog
---> 39 if _idx_from_dates(dates[-1], date, freq) == 1:
40 return len(dates)
41
D:\Python27\lib\site-packages\statsmodels-0.5.0-py2.7.egg\statsmodels\tsa\base\datetools.pyc in _idx_from_dates(d1, d2, freq)
70 from pandas import DatetimeIndex
71 return len(DatetimeIndex(start=d1, end=d2,
---> 72 freq = _freq_to_pandas[freq])) - 1
73 except ImportError, err:
74 from pandas import DateRange
D:\Python27\lib\site-packages\statsmodels-0.5.0-py2.7.egg\statsmodels\tsa\base\datetools.pyc in __getitem__(self, key)
11 # being lazy, don't want to replace dictionary below
12 def __getitem__(self, key):
---> 13 return get_offset(key)
14 _freq_to_pandas = _freq_to_pandas_class()
15 except ImportError, err:
D:\Python27\lib\site-packages\pandas\tseries\frequencies.pyc in get_offset(name)
484 """
485 if name not in _dont_uppercase:
--> 486 name = name.upper()
487
488 if name in _rule_aliases:
AttributeError: 'NoneType' object has no attribute 'upper'
Answered by jseabold
Looks like a bug to me. I'll look into it.
https://github.com/statsmodels/statsmodels/issues/712
Edit: As a workaround, you can drop the DatetimeIndex and pass the model a plain numpy array instead of the date-indexed column. That makes prediction a little trickier date-wise, but using dates for prediction is already tricky when the index has no frequency; in that case, having only the starting and ending dates is essentially meaningless.
import pandas
import statsmodels.tsa.api as tsa
from pandas.io.data import DataReader

data = DataReader('GOOG', 'yahoo')
dates = data.index
# start at a date that is on the index
start = dates.get_loc(pandas.datetools.parse("1-2-2013"))
end = start + 30  # "steps"
# NOTE THE .values: pass a plain numpy array, not the date-indexed Series
arma = tsa.ARMA(data['Close'].values, order=(2, 2))
results = arma.fit()
results.predict(start, end)
Answered by Troy D
When I run your code, I get:
"ValueError: There is no frequency for these dates and date 2013-01-01 00:00:00 is not in dates index. Try giving a date that is in the dates index or use an integer"
Since trading days occur at an uneven frequency (markets are closed on weekends and holidays), the model is not smart enough to work out the frequency it needs for its date calculations.
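A quick way to see the missing frequency for yourself is pandas' `infer_freq`. The sketch below uses a made-up calendar rather than real Yahoo data: a full run of business days is recognized as a business-day frequency, but once a single weekday is removed (the `holiday` date is only an illustrative choice) nothing can be inferred. That is similar in spirit to the `None` frequency that shows up in the question's traceback.

```python
import pandas as pd

# Every weekday in the first quarter of 2013: a regular business-day calendar.
full = pd.bdate_range("2013-01-01", "2013-03-29")

# Drop one weekday to mimic a market holiday (date chosen for illustration).
holiday = pd.Timestamp("2013-01-21")
trading = full[full != holiday]

print(pd.infer_freq(full))     # regular calendar: a frequency is found
print(pd.infer_freq(trading))  # irregular calendar: no frequency
```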
If you replace the dates with their integer location in the index, then you get your predictions. Then you can simply put the original index back on the results.
prediction = results.predict(start=0, end=len(data) - 1)
prediction.index = data.index
print(prediction)
2010-01-04 689.507451
2010-01-05 627.085986
2010-01-06 624.256331
2010-01-07 608.133481
...
2017-05-09 933.700555
2017-05-10 931.290023
2017-05-11 927.781427
2017-05-12 929.661014
As an aside, you may want to run a model like this on the daily returns rather than on the raw prices. Running it on the raw prices isn't going to capture momentum and mean reversion the way you probably expect. The model is being fit to the price levels, not to the changes in price, momentum, moving averages, or the other factors you probably want to use. The predictions you're creating will look pretty good because each one only looks one step ahead, so they don't show the compounding error of a multi-step forecast. This confuses a lot of people: the errors look small relative to the level of the stock price, but the model won't be very predictive.
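To make the point about levels versus returns concrete, here is a small sketch on simulated data (a random walk generated with NumPy, not real GOOG prices; `prices`, `naive`, and the 1% daily volatility are assumptions for illustration). A forecast that simply repeats yesterday's price involves no model at all, yet its one-step error looks tiny next to the price level, while the daily returns a useful model would have to predict show almost no lag-1 correlation.

```python
import numpy as np
import pandas as pd

# Simulated prices: a random walk in log space with roughly 1% daily moves.
rng = np.random.default_rng(42)
log_returns = rng.normal(loc=0.0, scale=0.01, size=1000)
prices = pd.Series(100.0 * np.exp(np.cumsum(log_returns)))

# "Forecast" each day with the previous day's price (no model at all).
naive = prices.shift(1)
rel_error = ((prices - naive).abs() / prices).mean()
level_corr = prices.corr(naive)

# Daily returns are what a model has to predict to be useful.
returns = prices.pct_change().dropna()
return_autocorr = returns.autocorr(lag=1)

print(f"mean one-step error relative to price: {rel_error:.4f}")
print(f"correlation of price with naive forecast: {level_corr:.4f}")
print(f"lag-1 autocorrelation of daily returns: {return_autocorr:.4f}")
```

On real data the exact numbers will differ; the takeaway is only that a good-looking one-step fit on price levels says little about predictive power.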
I'd suggest reading through this walkthrough for a starter:
http://www.johnwittenauer.net/a-simple-time-series-analysis-of-the-sp-500-index/

