pandas.DatetimeIndex frequency is None and can't be set

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/46217529/

Time: 2020-09-14 04:26:56  Source: igfitidea

python pandas indexing time-series

Asked by clstaudt

I created a DatetimeIndex from a "date" column:

sales.index = pd.DatetimeIndex(sales["date"])

Now the index looks as follows:

DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-04', '2003-01-06',
                   '2003-01-07', '2003-01-08', '2003-01-09', '2003-01-10',
                   '2003-01-11', '2003-01-13',
                   ...
                   '2016-07-22', '2016-07-23', '2016-07-24', '2016-07-25',
                   '2016-07-26', '2016-07-27', '2016-07-28', '2016-07-29',
                   '2016-07-30', '2016-07-31'],
                  dtype='datetime64[ns]', name='date', length=4393, freq=None)

As you can see, the freq attribute is None. I suspect that errors down the road are caused by the missing freq. However, if I try to set the frequency explicitly:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-148-30857144de81> in <module>()
      1 #### DEBUG
----> 2 sales_train = disentangle(df_train)
      3 sales_holdout = disentangle(df_holdout)
      4 result = sarima_fit_predict(sales_train.loc[5002, 9990]["amount_sold"], sales_holdout.loc[5002, 9990]["amount_sold"])

<ipython-input-147-08b4c4ecdea3> in disentangle(df_train)
      2     # transform sales table to disentangle sales time series
      3     sales = df_train[["date", "store_id", "article_id", "amount_sold"]]
----> 4     sales.index = pd.DatetimeIndex(sales["date"], freq="d")
      5     sales = sales.pivot_table(index=["store_id", "article_id", "date"])
      6     return sales

/usr/local/lib/python3.6/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
     89                 else:
     90                     kwargs[new_arg_name] = new_arg_value
---> 91             return func(*args, **kwargs)
     92         return wrapper
     93     return _deprecate_kwarg

/usr/local/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, closed, ambiguous, dtype, **kwargs)
    399                                          'dates does not conform to passed '
    400                                          'frequency {1}'
--> 401                                          .format(inferred, freq.freqstr))
    402 
    403         if freq_infer:

ValueError: Inferred frequency None from passed dates does not conform to passed frequency D

So apparently a frequency has been inferred, but it is stored in neither the freq nor the inferred_freq attribute of the DatetimeIndex; both are None. Can someone clear up the confusion?

Accepted answer by Brad Solomon

You have a couple of options here:

  • pd.infer_freq
  • pd.tseries.frequencies.to_offset

"I suspect that errors down the road are caused by the missing freq."

You are absolutely right. Here's what I use often:

import pandas as pd

def add_freq(idx, freq=None):
    """Add a frequency attribute to idx, through inference or directly.

    Returns a copy.  If `freq` is None, it is inferred.
    """

    idx = idx.copy()
    if freq is None:
        if idx.freq is None:
            # No frequency given or stored: try to infer one from the dates.
            freq = pd.infer_freq(idx)
        else:
            # The index already carries a frequency; nothing to do.
            return idx
    idx.freq = pd.tseries.frequencies.to_offset(freq)
    if idx.freq is None:
        raise AttributeError('no discernible frequency found for `idx`.  Specify'
                             ' a frequency string with `freq`.')
    return idx

An example:

idx=pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06'])  # freq=None

print(add_freq(idx))  # inferred
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], dtype='datetime64[ns]', freq='B')

print(add_freq(idx, freq='D'))  # explicit
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], dtype='datetime64[ns]', freq='D')
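
Before forcing a frequency onto an index, it can help to check whether the dates actually fit it. The following is only a minimal sketch: `conforms_to` is a hypothetical helper (not part of pandas) and it assumes a sorted, non-empty index. The second index mimics the first dates from the question, which include a Saturday but skip the Sunday.

```python
import pandas as pd

def conforms_to(idx, freq):
    """Hypothetical helper: True if idx equals the gap-free range of `freq`
    between its first and last timestamp (assumes idx is sorted, non-empty)."""
    expected = pd.date_range(start=idx[0], end=idx[-1], freq=freq)
    return idx.equals(expected)

# Three consecutive business days (Thu, Fri, Mon).
idx = pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06'])
print(conforms_to(idx, 'B'))
print(conforms_to(idx, 'D'))

# Dates like those in the question: Saturday present, Sunday missing.
q_idx = pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-04', '2003-01-06'])
print(pd.infer_freq(q_idx))   # no regular frequency fits these dates
print(conforms_to(q_idx, 'D'))
```

If no frequency fits, the gaps have to be dealt with first (for example by reindexing), rather than by setting freq.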

Using asfreq will actually reindex (fill) missing dates, so be careful if that's not what you're looking for.

The primary function for changing frequencies is the asfreq function. For a DatetimeIndex, this is basically just a thin but convenient wrapper around reindex, which generates a date_range and calls reindex.
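
To make that description concrete, here is a small sketch comparing the two routes on made-up data. It assumes that spanning the range from the first to the last existing date is what asfreq does for a plain daily frequency; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 4]},
                  index=pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06']))

# Build the full daily range spanning the existing index, then reindex onto it.
full_range = pd.date_range(start=df.index.min(), end=df.index.max(), freq='D')
via_reindex = df.reindex(full_range)

# Let asfreq do the same thing in one call.
via_asfreq = df.asfreq('D')

# Compare the two results (NaN rows for the missing dates included).
print(via_reindex.equals(via_asfreq))
```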

Answer by JohnE

It seems to relate to missing dates, as 3kt notes. You might be able to "fix" it with asfreq('D') as EdChum suggests, but that gives you a continuous index with missing data values. It works fine for some sample data I made up:

df=pd.DataFrame({ 'x':[1,2,4] }, 
   index=pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06']) )

df
Out[756]: 
            x
2003-01-02  1
2003-01-03  2
2003-01-06  4

df.index
Out[757]: DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], 
          dtype='datetime64[ns]', freq=None)

Note that freq=None. If you apply asfreq('D'), this changes to freq='D':

df.asfreq('D')
Out[758]: 
              x
2003-01-02  1.0
2003-01-03  2.0
2003-01-04  NaN
2003-01-05  NaN
2003-01-06  4.0

df.asfreq('d').index
Out[759]: 
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-04', '2003-01-05',
               '2003-01-06'],
              dtype='datetime64[ns]', freq='D')
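
If the NaN rows are a problem, asfreq also accepts the `method` and `fill_value` parameters. The following is a rough sketch on the same sample data; which fill (if any) makes sense depends on the data, so treat the choices here as assumptions.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 4]},
                  index=pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06']))

# Carry the last known value forward into the newly created dates.
filled_ffill = df.asfreq('D', method='ffill')

# Or put a constant into the newly created dates instead.
filled_zero = df.asfreq('D', fill_value=0)

print(filled_ffill)
print(filled_zero)
```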

More generally, and depending on what exactly you are trying to do, you might want to look at other options such as reindex and resample; see the related question "Add missing dates to pandas dataframe".
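
As one illustration of the resample route, here is a sketch that aggregates the same sample data to daily bins. Summing is an assumption that suits something like sales amounts, where a day without rows means nothing was sold; other aggregations may fit other data better.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 4]},
                  index=pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06']))

# One row per calendar day; days with no observations sum to 0.
daily = df.resample('D').sum()

print(daily)
print(daily.index.freq)
```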

Answer by mrbTT

I'm not sure if earlier versions of Python had this, but 3.6 has this simple solution:

# 'b' stands for business days
# 'w' for weekly, 'd' for daily, and you get the idea...
df.index.freq = 'b' 
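
A slightly more cautious variant, sketched here on a standalone index with made-up dates, is to infer the frequency first and only assign it when one was found. This assumes the index has at least three dates, which pd.infer_freq needs.

```python
import pandas as pd

idx = pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06'])

# infer_freq returns None when no regular frequency fits the dates.
inferred = pd.infer_freq(idx)
if inferred is not None:
    idx.freq = inferred

print(idx.freq)
```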

Answer by Riz.Khan

I am not sure, but I was getting the same error. I was not able to resolve my issue with the suggestions posted above, but solved it using the solution below.

Pandas DatetimeIndex + seasonal_decompose = missing frequency.

Best Regards
