pandas: check for any missing dates in the index
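Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise comply with CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow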
Original URL: http://stackoverflow.com/questions/52044348/
Asked by user_6396
Is there any way to check for missing dates in a DataFrame directly? I want to check whether any dates are missing between 2013-01-19 and 2018-01-29.
GWA_BTC GWA_ETH GWA_LTC GWA_XLM GWA_XRP
Date
2013-01-19 11,826.36 1,068.45 195.00 0.51 1.82
2013-01-20 13,062.68 1,158.71 207.58 0.52 1.75
...
2018-01-28 12,326.23 1,108.90 197.36 0.48 1.55
2018-01-29 11,397.52 1,038.21 184.92 0.47 1.43
I tried to check it manually but it took a lot of time.
Answer by Vaishali
You can use DatetimeIndex.difference(other)
pd.date_range(start = '2013-01-19', end = '2018-01-29' ).difference(df.index)
It returns the dates in the range that are not present in other (here, df.index).
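A minimal, self-contained sketch of this approach, assuming df.index is already a DatetimeIndex of calendar days (convert it with pd.to_datetime first if it is not). The three-row frame below is made up for illustration; the variable names full_range, missing and has_gaps are hypothetical.

```python
import pandas as pd

# Illustrative frame: 2013-01-21 and 2013-01-22 are absent from the index
idx = pd.to_datetime(['2013-01-19', '2013-01-20', '2013-01-23'])
df = pd.DataFrame({'GWA_BTC': [1.0, 2.0, 3.0]}, index=idx)

# Every calendar day in the range of interest (daily frequency is the default)
full_range = pd.date_range(start='2013-01-19', end='2013-01-23')

# Dates that appear in full_range but not in df.index
missing = full_range.difference(df.index)
print(missing)

# A simple yes/no answer to "are any dates missing?"
has_gaps = not missing.empty
print(has_gaps)
```

For the question's data, the same call with start='2013-01-19' and end='2018-01-29' gives the answer directly; an empty result means no dates are missing.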
Answer by sacuL
Example:
As a minimal example, take this:
>>> df
GWA_BTC GWA_ETH GWA_LTC GWA_XLM GWA_XRP
Date
2013-01-19 11,826.36 1,068.45 195.00 0.51 1.82
2013-01-20 13,062.68 1,158.71 207.58 0.52 1.75
2013-01-28 12,326.23 1,108.90 197.36 0.48 1.55
2013-01-29 11,397.52 1,038.21 184.92 0.47 1.43
We can then find the missing dates between 2013-01-19 and 2013-01-29.
Method 1:
See @Vaishali's answer
Use .difference to find which dates in the full range are absent from your datetime index:
pd.date_range('2013-01-19', '2013-01-29').difference(df.index)
Which returns:
DatetimeIndex(['2013-01-21', '2013-01-22', '2013-01-23', '2013-01-24',
'2013-01-25', '2013-01-26', '2013-01-27'],
dtype='datetime64[ns]', freq=None)
Method 2:
You can reindex your DataFrame using all dates within your desired date range, and find where reindex has inserted NaNs.
To find the missing dates between 2013-01-19 and 2013-01-29:
>>> df.reindex(pd.date_range('2013-01-19', '2013-01-29')).isnull().all(1)
2013-01-19 False
2013-01-20 False
2013-01-21 True
2013-01-22 True
2013-01-23 True
2013-01-24 True
2013-01-25 True
2013-01-26 True
2013-01-27 True
2013-01-28 False
2013-01-29 False
Freq: D, dtype: bool
The dates marked True are the ones missing from your original DataFrame.
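To turn that boolean Series into the list of missing dates, filter it on its own True values. A minimal sketch, using the four example rows above (two columns only, values without thousands separators); the names full_range, is_missing and missing_dates are hypothetical.

```python
import pandas as pd

# Illustrative frame with a gap from 2013-01-21 to 2013-01-27
idx = pd.to_datetime(['2013-01-19', '2013-01-20', '2013-01-28', '2013-01-29'])
df = pd.DataFrame({'GWA_BTC': [11826.36, 13062.68, 12326.23, 11397.52],
                   'GWA_ETH': [1068.45, 1158.71, 1108.90, 1038.21]}, index=idx)

full_range = pd.date_range('2013-01-19', '2013-01-29')

# Rows added by reindex are entirely NaN; existing rows keep their values
is_missing = df.reindex(full_range).isnull().all(axis=1)

# Keep only the index labels flagged True
missing_dates = is_missing[is_missing].index
print(missing_dates)
```

One caveat of this method: a date that is present in the index but whose row is entirely NaN will also be reported as missing, which the .difference approach does not do.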
Answer by Yuca
Assuming the data is daily (calendar days, not business days), flag each row that follows a gap of more than one day:
df.index.to_series().diff().dt.days > 1
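A minimal sketch of what that expression returns, assuming the index is sorted in ascending order. It reuses the four example rows from the previous answer; the names gaps, after_gap and missing_counts are hypothetical. It also counts how many days are missing before each flagged row.

```python
import pandas as pd

idx = pd.to_datetime(['2013-01-19', '2013-01-20', '2013-01-28', '2013-01-29'])
df = pd.DataFrame({'GWA_BTC': [11826.36, 13062.68, 12326.23, 11397.52]}, index=idx)

# Days between each date and the previous one (NaN for the first row)
gaps = df.index.to_series().diff().dt.days

# True on rows that come right after one or more missing days
after_gap = gaps > 1
print(after_gap)

# Number of missing days immediately before each flagged row
missing_counts = (gaps[after_gap] - 1).astype(int)
print(missing_counts)
```

Because it only compares neighbouring rows, this check cannot detect dates missing before the first or after the last entry of a desired range (e.g. if the data started on 2013-01-20); the date_range based answers above do cover that case.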
Answer by Vaibhav Sharma
I can't post a comment, but you could iterate over the values and add 24 hours to each previous value to see whether it matches the next date.
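import pandas as pd

# Column 'a' holds illustrative dates (2013-01-21 and 2013-01-22 are missing)
a = pd.to_datetime(['2013-01-19', '2013-01-20', '2013-01-23', '2013-01-24', '2013-01-25'])
b = [1, 0.4, 0.3, 0.5, 0.2]
df = pd.DataFrame({'a': a, 'b': b})
for i in range(len(df)):
    if i == 0:  # compare integers with ==, not "is"
        continue  # the first row has no previous value to compare against
    prev = df.loc[i - 1, 'a']
    # Add 1 day to the previous value and check it against the current value
    if prev + pd.Timedelta(days=1) != df.loc[i, 'a']:
        print('gap between', prev.date(), 'and', df.loc[i, 'a'].date())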