Original question: http://stackoverflow.com/questions/21512042/
License: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Fast selection of a time interval in a pandas DataFrame/Series
Asked by Mannaggia
My problem is that I want to filter a DataFrame to include only the times within the interval [start, end). If I do not care about the day, I would like to filter only by the start and end time of each day. I have a solution for this, but it is slow. So my question is whether there is a faster way to do time-based filtering.
Example
import pandas as pd
import time
index = pd.date_range(start='2012-11-05 01:00:00', end='2012-11-05 23:00:00', freq='1s').tz_localize('UTC')
df = pd.DataFrame(range(len(index)), index=index, columns=['Number'])
# select from 1 to 2 am, include day
# (.loc replaces .ix, which was removed from pandas; label slices include both endpoints)
now = time.time()
df2 = df.loc['2012-11-05 01:00:00':'2012-11-05 02:00:00']
print('Took %s seconds' % (time.time() - now))  # 0.0368609428406
# select from 1 to 2 am, for every day
now = time.time()
selector = (df.index.hour >= 1) & (df.index.hour < 2)
df3 = df[selector]
print('Took %s seconds' % (time.time() - now))  # Took 0.0699911117554
As you can see, if I drop the day (second case) it takes almost twice as long. The computation time increases rapidly when the index spans several days, e.g. from 5 to 7 Nov:
index = pd.date_range(start='2012-11-05 01:00:00', end='2012-11-07 23:00:00', freq='1s').tz_localize('UTC')
So, to summarize: is there a faster way to filter by time of day across many days?
Thx
Answered by Nipun Batra
You need the between_time method.
In [14]: %timeit df.between_time(start_time='01:00', end_time='02:00')
100 loops, best of 3: 10.2 ms per loop
In [15]: %timeit selector=(df.index.hour>=1) & (df.index.hour<2); df[selector]
100 loops, best of 3: 18.2 ms per loop
I ran these tests with the 5th to 7th November index.
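One detail worth checking when swapping the hour-based mask for between_time: by default between_time includes both endpoints, whereas the question asks for the half-open interval [start, end). The following is a minimal sketch, not part of the original answer, that rebuilds the 5th to 7th November index and compares the two selections. It assumes a pandas version in which DatetimeIndex.indexer_between_time accepts include_start and include_end; the variable names are illustrative.

```python
import pandas as pd

# Rebuild the multi-day, one-second index from the question (UTC-localized).
index = pd.date_range(
    start="2012-11-05 01:00:00",
    end="2012-11-07 23:00:00",
    freq=pd.Timedelta(seconds=1),
    tz="UTC",
)
df = pd.DataFrame({"Number": range(len(index))}, index=index)

# Hour-based boolean mask from the question: selects [01:00, 02:00) on every day.
mask = (df.index.hour >= 1) & (df.index.hour < 2)
by_mask = df[mask]

# between_time with default arguments: both endpoints are included,
# so the 02:00:00 row of each day is selected as well.
by_between = df.between_time("01:00", "02:00")

# Half-open selection via integer positions (assumption: include_end is
# supported by indexer_between_time in the installed pandas version).
positions = df.index.indexer_between_time(
    "01:00", "02:00", include_start=True, include_end=False
)
half_open = df.iloc[positions]

print(len(by_mask), len(by_between), len(half_open))
```

The closed selection contains one extra row per day (the 02:00:00 sample), while the half-open selection matches the boolean mask exactly.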
Documentation
Definition: df.between_time(self, start_time, end_time, include_start=True, include_end=True)
Docstring:
Select values between particular times of the day (e.g., 9:00-9:30 AM)

Parameters
----------
start_time : datetime.time or string
end_time : datetime.time or string
include_start : boolean, default True
include_end : boolean, default True

Returns
-------
values_between_time : type of caller
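As the docstring notes, the bounds may be given either as strings or as datetime.time objects, and the method returns the type of the caller, so it also works on a Series. The sketch below is an illustration under assumed data (a hypothetical three-day, one-minute Series), using the 9:00-9:30 window from the docstring. It relies only on the default endpoint handling, because the include_start and include_end keywords shown above were replaced by a single inclusive argument in later pandas releases.

```python
import datetime as dt

import pandas as pd

# Hypothetical data: three full days sampled once per minute.
idx = pd.date_range(
    "2012-11-05", periods=3 * 24 * 60, freq=pd.Timedelta(minutes=1)
)
s = pd.Series(range(len(idx)), index=idx)

# Bounds given as datetime.time objects; both endpoints are included by default.
morning = s.between_time(dt.time(9, 0), dt.time(9, 30))

# The same selection with string bounds.
morning_str = s.between_time("09:00", "09:30")

print(type(morning).__name__, len(morning))
```

Each day contributes the 31 samples from 09:00 through 09:30, and the result is a Series because the caller was a Series.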

