Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow, original URL: http://stackoverflow.com/questions/42264848/

Pandas DataFrame: how to query the closest datetime index?

pandas dataframe

Asked by Bryan Fok

How do I query for the closest index in a Pandas DataFrame? The index is a DatetimeIndex:

2016-11-13 20:00:10.617989120   7.0 132.0
2016-11-13 22:00:00.022737152   1.0 128.0
2016-11-13 22:00:28.417561344   1.0 132.0

I tried this:

df.index.get_loc(df.index[0], method='nearest')

but it gives me InvalidIndexError: Reindexing only valid with uniquely valued Index objects

I get the same error if I try this:

dt = datetime.datetime.strptime("2016-11-13 22:01:25", "%Y-%m-%d %H:%M:%S")
df.index.get_loc(dt, method='nearest')

If I remove method='nearest' it works, but that is not what I want. I want to find the index closest to my query datetime.

Answered by jezrael

It seems you first need to get the position with get_loc and then select with []:

dt = pd.to_datetime("2016-11-13 22:01:25.450")
print (dt)
2016-11-13 22:01:25.450000

print (df.index.get_loc(dt, method='nearest'))
2

idx = df.index[df.index.get_loc(dt, method='nearest')]
print (idx)
2016-11-13 22:00:28.417561344
# to select the row as a Series, use iloc
s = df.iloc[df.index.get_loc(dt, method='nearest')]
print (s)
b      1.0
c    132.0
Name: 2016-11-13 22:00:28.417561344, dtype: float64
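
A note for newer pandas versions: the method argument of get_loc was deprecated and then removed in pandas 2.0, so the calls above may fail there. The sketch below shows the same position lookup with Index.get_indexer instead. The frame is rebuilt from the three rows in the question, and the column names b and c are taken from the printed output above. Like get_loc, get_indexer expects the index to contain unique values.

```python
import pandas as pd

# Rebuild the small frame from the question (column names taken from the output above).
index = pd.DatetimeIndex([
    "2016-11-13 20:00:10.617989120",
    "2016-11-13 22:00:00.022737152",
    "2016-11-13 22:00:28.417561344",
])
df = pd.DataFrame({"b": [7.0, 1.0, 1.0], "c": [132.0, 128.0, 132.0]}, index=index)

dt = pd.to_datetime("2016-11-13 22:01:25.450")

# get_indexer takes a list-like of targets and returns an array of positions,
# so pass a one-element list and take the first result.
pos = df.index.get_indexer([dt], method="nearest")[0]

idx = df.index[pos]   # the closest timestamp in the index
row = df.iloc[pos]    # the matching row as a Series
print(pos, idx)
```

As with the get_loc version, the returned value is an integer position, so it is used with df.index[...] or df.iloc[...] rather than df.loc[...].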

Answered by Bryan Fok

I believe jezrael's solution works, but not on my dataframe (I have no clue why). This is the solution I came up with.

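from bisect import bisect  # treat the sorted index as a sorted container
import numpy as np
np_dt64 = np.datetime64(dt)  # assumed: the query time (dt from the question) as numpy.datetime64
timestamps = np.array(df.index)  # the index must be sorted for bisect to work
# upper neighbour of the query time, clamped to the last valid position
upper_index = bisect(timestamps, np_dt64, hi=len(timestamps)-1)
# choose the closer of the upper and lower neighbours, then look up its position
df_index = df.index.get_loc(min(timestamps[upper_index], timestamps[upper_index-1], key=lambda x: abs(x - np_dt64)))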
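
The InvalidIndexError in the question usually means the index contains duplicate timestamps, which would explain why the get_loc approach fails on some frames. The following is a minimal sketch of a lookup that tolerates duplicates and does not need a sorted index: it computes the absolute time difference to every index entry and takes the smallest. The helper name nearest_position and the extra duplicated row are made up for illustration and are not from the original thread.

```python
import numpy as np
import pandas as pd

def nearest_position(index, when):
    """Return the integer position of the index entry closest to `when`.

    Hypothetical helper: scans the whole index (O(n)), so it works with
    duplicate or unsorted timestamps. Ties resolve to the first match.
    """
    deltas = np.abs((index - pd.Timestamp(when)).to_numpy())
    return int(deltas.argmin())

# Illustrative frame with a duplicated timestamp (the duplicate row is invented).
index = pd.DatetimeIndex([
    "2016-11-13 20:00:10.617989120",
    "2016-11-13 22:00:00.022737152",
    "2016-11-13 22:00:00.022737152",
    "2016-11-13 22:00:28.417561344",
])
df = pd.DataFrame({"b": [7.0, 1.0, 2.0, 1.0], "c": [132.0, 128.0, 129.0, 132.0]}, index=index)

pos = nearest_position(df.index, "2016-11-13 22:00:10")
row = df.iloc[pos]  # select by position, since the label itself is ambiguous
print(pos, df.index[pos])
```

For a large sorted index the bisect approach above does less work per query. The full scan trades speed for not having to assume uniqueness or ordering.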