Note: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors on Stack Overflow.
Original source: http://stackoverflow.com/questions/21625850/
How to subset pandas time series by time of day
Asked by Rahul Savani
I am trying to subset a pandas time series that spans multiple days by time of day. E.g., I only want times between 12:00 and 13:00.
I know how to do this for a specific date, e.g.,
In [44]: type(test)
Out[44]: pandas.core.frame.DataFrame
In [23]: test
Out[23]:
                           col1
timestamp
2012-01-14 11:59:56+00:00     3
2012-01-14 11:59:57+00:00     3
2012-01-14 11:59:58+00:00     3
2012-01-14 11:59:59+00:00     3
2012-01-14 12:00:00+00:00     3
2012-01-14 12:00:01+00:00     3
2012-01-14 12:00:02+00:00     3
In [30]: test['2012-01-14 12:00:00' : '2012-01-14 13:00']
Out[30]:
                           col1
timestamp 
2012-01-14 12:00:00+00:00     3
2012-01-14 12:00:01+00:00     3
2012-01-14 12:00:02+00:00     3
But I have failed to do it for any date using test.index.hour or test.index.indexer_between_time(), which were both suggested as answers to similar questions. I tried the following:
In [44]: type(test)
Out[44]: pandas.core.frame.DataFrame
In [34]: test[(test.index.hour >= 12) & (test.index.hour < 13)]
Out[34]:
Empty DataFrame
Columns: [col1]
Index: []
In [36]: import datetime as dt
In [37]: test.index.indexer_between_time(dt.time(12),dt.time(13))
Out[37]: array([], dtype=int64)
For the first approach, I have no idea what test.index.hour or test.index.minute are actually returning:
In [41]: test.index
Out[41]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-01-14 11:59:56, ..., 2012-01-14 12:00:02]
Length: 7, Freq: None, Timezone: tzlocal()
In [42]: test.index.hour
Out[42]: array([11, 23,  0,  0,  0,  0,  0], dtype=int32)
In [43]: test.index.minute
Out[43]: array([59, 50,  0,  0, 50, 50,  0], dtype=int32)
What are they returning? How can I do the desired subsetting? Ideally, how can I get both approaches above to work?
Edit: The problem turned out to be that the index was invalid, as evidenced by Timezone: tzlocal() above; tzlocal() should not be allowed as a timezone. When I changed my method of generating the index to pd.to_datetime(), following the final part of the accepted answer, everything worked as expected.
Answered by David Hagan
Assuming the index holds valid pandas timestamps (a DatetimeIndex), the following will work:
test.index.hour returns an array containing the hour of each row in your dataframe. For example:
import numpy as np; import pandas as pd
df = pd.DataFrame(np.random.randn(100000, 1), columns=['A'], index=pd.date_range('20130101', periods=100000, freq='min'))  # one row per minute
df.index.year returns array([2013, 2013, 2013, ..., 2013, 2013, 2013])
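As a small illustration of what these attributes hold, here is a minimal sketch of the first approach from the question, using index.hour as a boolean mask. The two-day UTC index and the names demo, hours and mask are made up for the example, and it assumes the index is a proper DatetimeIndex.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row every 15 minutes over two days, UTC-aware.
idx = pd.date_range("2012-01-14", periods=192, freq="15min", tz="UTC")
demo = pd.DataFrame({"col1": np.arange(len(idx))}, index=idx)

# index.hour holds one integer (0-23) per row.
hours = demo.index.hour

# Boolean mask: keep rows with 12:00 <= time < 13:00 on any day.
mask = (hours >= 12) & (hours < 13)
subset = demo[mask]

print(subset)
```

Unlike between_time, this mask excludes 13:00:00 itself, because it only tests the hour component.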
To grab all rows where the time is between 12:00 and 13:00, use
df.between_time('12:00','13:00')
This selects that time window on every day in the index, however many days or years it spans. If the index is not a valid timestamp index, convert it first using pd.to_datetime().
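The conversion and the selection can be sketched together as follows. This is only an illustration under stated assumptions: the frame raw, its string index and its values are hypothetical, and utc=True is just one way to obtain a timezone-aware index.

```python
import datetime as dt
import pandas as pd

# Hypothetical raw data whose index is plain strings, not timestamps.
raw = pd.DataFrame(
    {"col1": [1, 2, 3, 4]},
    index=["2012-01-14 11:59:59", "2012-01-14 12:00:00",
           "2012-01-15 12:30:00", "2012-01-15 13:00:01"],
)

# Convert the index to a DatetimeIndex; utc=True makes it timezone-aware.
raw.index = pd.to_datetime(raw.index, utc=True)

# between_time includes both endpoints by default and ignores the date.
noon_hour = raw.between_time("12:00", "13:00")

# indexer_between_time returns integer positions for the same selection.
positions = raw.index.indexer_between_time(dt.time(12), dt.time(13))
same_rows = raw.iloc[positions]

print(noon_hour)
```

Both calls compare against the index's own wall-clock time, so with a timezone-aware index the window is interpreted in that index's timezone.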

