pandas: check for any missing dates in the index

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/52044348/

Date: 2020-09-14 05:58:53  Source: igfitidea

check for any missing dates in the index

python pandas

Asked by user_6396

Is there any way to check for missing dates in a dataframe directly? I want to check whether there are any missing dates between 2013-01-19 and 2018-01-29.

              GWA_BTC   GWA_ETH  GWA_LTC  GWA_XLM  GWA_XRP
Date
2013-01-19  11,826.36  1,068.45   195.00     0.51     1.82
2013-01-20  13,062.68  1,158.71   207.58     0.52     1.75
...
2018-01-28  12,326.23  1,108.90   197.36     0.48     1.55
2018-01-29  11,397.52  1,038.21   184.92     0.47     1.43

I tried to check it manually but it took a lot of time.

Answered by Vaishali

You can use DatetimeIndex.difference(other)

pd.date_range(start='2013-01-19', end='2018-01-29').difference(df.index)

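It returns the elements of the calling index that are not present in the other one; here, the dates in the full range that are missing from df.index.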
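
A minimal sketch of this approach on made-up data, in case it helps. The values and the shortened date range are hypothetical, and it assumes the index really is a DatetimeIndex; if the dates were read in as strings, converting them with pd.to_datetime first would likely be needed, otherwise every date in the range may be reported as missing.

```python
import pandas as pd

# Hypothetical sample data: 2013-01-21 and 2013-01-22 are deliberately left out.
df = pd.DataFrame(
    {"GWA_BTC": [11826.36, 13062.68, 12326.23]},
    index=pd.to_datetime(["2013-01-19", "2013-01-20", "2013-01-23"]),
)

# Every calendar day in the range of interest.
full_range = pd.date_range(start="2013-01-19", end="2013-01-23")

# Dates that are in the full range but not in the dataframe's index.
missing = full_range.difference(df.index)
print(missing)

# A simple yes/no answer to "are any dates missing?"
print(len(missing) > 0)
```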

Answered by sacuL

Example:

As a minimal example, take this:

>>> df
              GWA_BTC   GWA_ETH  GWA_LTC  GWA_XLM  GWA_XRP
Date                                                      
2013-01-19  11,826.36  1,068.45   195.00     0.51     1.82
2013-01-20  13,062.68  1,158.71   207.58     0.52     1.75
2013-01-28  12,326.23  1,108.90   197.36     0.48     1.55
2013-01-29  11,397.52  1,038.21   184.92     0.47     1.43

And we can find the missing dates between 2013-01-19 and 2013-01-29.

Method 1:

See @Vaishali's answer

Use .difference to find the difference between your datetime index and the set of all dates within your range:

pd.date_range('2013-01-19', '2013-01-29').difference(df.index)

Which returns:

DatetimeIndex(['2013-01-21', '2013-01-22', '2013-01-23', '2013-01-24',
               '2013-01-25', '2013-01-26', '2013-01-27'],
              dtype='datetime64[ns]', freq=None)

Method 2:

You can re-index your dataframe using all dates within your desired date range, and find where reindex has inserted NaNs.

To find the missing dates between 2013-01-19 and 2013-01-29:

>>> df.reindex(pd.date_range('2013-01-19', '2013-01-29')).isnull().all(axis=1)

2013-01-19    False
2013-01-20    False
2013-01-21     True
2013-01-22     True
2013-01-23     True
2013-01-24     True
2013-01-25     True
2013-01-26     True
2013-01-27     True
2013-01-28    False
2013-01-29    False
Freq: D, dtype: bool

The rows marked True are the dates missing from your original dataframe.
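
As a rough sketch, the boolean result can also be used to pull out just the missing dates. The single-column data below is hypothetical. One caveat worth keeping in mind: a date that exists in the original dataframe but has NaN in every column would be flagged too, so this is closer to "dates with no data" than "dates absent from the index".

```python
import pandas as pd

# Hypothetical sample data with a gap between 2013-01-20 and 2013-01-28.
df = pd.DataFrame(
    {"GWA_BTC": [11826.36, 13062.68, 12326.23, 11397.52]},
    index=pd.to_datetime(
        ["2013-01-19", "2013-01-20", "2013-01-28", "2013-01-29"]
    ),
)

full_range = pd.date_range("2013-01-19", "2013-01-29")

# True where reindex had to insert a row made entirely of NaNs.
mask = df.reindex(full_range).isnull().all(axis=1)

# Keep only the True entries and read the dates off the index.
missing_dates = mask[mask].index
print(missing_dates)
```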

Answered by Yuca

Assuming the data is daily (calendar days, not business days):

df.index.to_series().diff().dt.days > 1
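
A small sketch of how this expression might be used, on hypothetical dates. Note that it flags the first date after each gap rather than the missing dates themselves, and it assumes the index is sorted in ascending order.

```python
import pandas as pd

# Hypothetical sorted daily index with one gap (2013-01-21 to 2013-01-27).
idx = pd.to_datetime(["2013-01-19", "2013-01-20", "2013-01-28", "2013-01-29"])
dates = idx.to_series()

# Difference to the previous row in days; the first row has no predecessor (NaN).
gap_days = dates.diff().dt.days

# True where more than one day has passed since the previous row.
has_gap_before = gap_days > 1

# The dates that come right after a gap.
after_gap = dates[has_gap_before].index
print(after_gap)
```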

Answered by Vaibhav Sharma

I can't post a comment, but you could probably traverse the values, add 24 hours to the previous value, and check whether the result matches the current date.

import pandas as pd

a = [1,2,3,4,5]
b = [1,0.4,0.3,0.5,0.2]

df = pd.DataFrame({'a':a , 'b': b})

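for i in range(1, len(df)):  # start at 1: the first row has no previous value
    prev = df.loc[i - 1, 'a']
    curr = df.loc[i, 'a']
    # Integers stand in for dates here; with real dates, compare against
    # prev + pd.Timedelta(days=1) instead of prev + 1
    if curr != prev + 1:
        print(f'gap between {prev} and {curr}')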
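
The traversal idea above could look roughly like this with real timestamps. This is only a sketch on hypothetical dates, assuming the index is sorted and holds one entry per day; the vectorised approaches in the other answers are likely faster on large data.

```python
import pandas as pd

# Hypothetical sorted dates with one gap after 2013-01-20.
dates = pd.to_datetime(["2013-01-19", "2013-01-20", "2013-01-28", "2013-01-29"])
one_day = pd.Timedelta(days=1)

# Walk over consecutive pairs and record each pair that is not one day apart.
gaps = []
for prev, curr in zip(dates[:-1], dates[1:]):
    if curr != prev + one_day:
        gaps.append((prev, curr))

print(gaps)
```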