Original URL: http://stackoverflow.com/questions/17559885/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Pandas Dataframe Mask based on index
Question by BrandonAGr
I have the following dataframe:
import pandas as pd
index = pd.date_range('2013-1-1',periods=10,freq='15Min')
data = pd.DataFrame(data=[1,2,3,4,5,6,7,8,9,0], columns=['value'], index=index)
How can I generate a mask based on the index value? I know I can do something like:
data['value'] > 3
Out[40]:
2013-01-01 00:00:00 False
2013-01-01 00:15:00 False
2013-01-01 00:30:00 False
2013-01-01 00:45:00 True
2013-01-01 01:00:00 True
2013-01-01 01:15:00 True
2013-01-01 01:30:00 True
2013-01-01 01:45:00 True
2013-01-01 02:00:00 True
2013-01-01 02:15:00 False
Freq: 15T, Name: value, dtype: bool
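For context, a boolean Series like the one above is normally passed straight back into the frame to keep only the True rows. A minimal sketch, rebuilding the same frame (using the '15min' frequency alias, which current pandas accepts):

```python
import pandas as pd

# Rebuild the frame from the question: ten rows at 15-minute intervals.
index = pd.date_range('2013-01-01', periods=10, freq='15min')
data = pd.DataFrame(data=[1, 2, 3, 4, 5, 6, 7, 8, 9, 0], columns=['value'], index=index)

# A boolean Series aligned on the index; True where the column value exceeds 3.
mask = data['value'] > 3

# Indexing the frame with the mask keeps only the rows where it is True.
filtered = data[mask]
print(filtered)
```

The question below asks how to build the same kind of mask from the index instead of from a column.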
I want to generate a mask that only considers rows whose index falls within a certain range. I was thinking of doing something like data['index'].time() > datetime.time(1,15) to generate a mask, except that data['index'] of course fails because index is not the name of a column. How can you reference the index value for a row in a mask?
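The asker's idea works once the index is reached through the .index attribute rather than data['index']. A minimal sketch, assuming a pandas version where DatetimeIndex.time is available (it is in current releases), builds the mask directly from the time of day of each row:

```python
import datetime

import pandas as pd

index = pd.date_range('2013-01-01', periods=10, freq='15min')
data = pd.DataFrame(data=[1, 2, 3, 4, 5, 6, 7, 8, 9, 0], columns=['value'], index=index)

# The index is an attribute of the frame, not a column.
# DatetimeIndex.time yields an array of datetime.time objects, one per row.
times = data.index.time

# Element-wise comparison gives a boolean NumPy array usable as a mask.
mask = times > datetime.time(1, 15)

print(data[mask])
```

Because the comparison is strict, the 01:15 row is excluded and the rows from 01:30 to 02:15 are kept.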
Answer by Andy Hayden
You can mask using indexer_between_time:
In [11]: data.index.indexer_between_time(start='01:15', end='02:00')
Out[11]: array([5, 6, 7, 8])
In [12]: data.iloc[data.index.indexer_between_time(start='1:15', end='02:00')]
Out[12]:
value
2013-01-01 01:15:00 6
2013-01-01 01:30:00 7
2013-01-01 01:45:00 8
2013-01-01 02:00:00 9
As you can see, you access the index by the attribute .index.
Note: by default, indexer_between_time has both include_start and include_end set to True. It also offers a tz argument to compare the time in a different timezone.
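To illustrate the include_start/include_end flags, here is a small sketch on the same frame. It passes the times positionally so it does not depend on the keyword names, which changed across pandas versions:

```python
import pandas as pd

index = pd.date_range('2013-01-01', periods=10, freq='15min')
data = pd.DataFrame(data=[1, 2, 3, 4, 5, 6, 7, 8, 9, 0], columns=['value'], index=index)

# Default: both the start (01:15) and end (02:00) times are included.
both_ends = data.index.indexer_between_time('01:15', '02:00')

# Excluding the start time drops the 01:15 row.
no_start = data.index.indexer_between_time('01:15', '02:00', include_start=False)

print(both_ends)  # integer positions, suitable for .iloc
print(data.iloc[no_start])
```

The result is an array of integer positions rather than a boolean mask, which is why the answer selects with .iloc.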
Answer by John Saraceno
The 'start' and 'end' keywords are deprecated. With pandas > 0.17.1, I had to use the following positional syntax instead:
data.iloc[data.index.indexer_between_time('1:15', '02:00')]
Out[90]:
value
2013-01-01 01:15:00 6
2013-01-01 01:30:00 7
2013-01-01 01:45:00 8
2013-01-01 02:00:00 9
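Since the question asks for a mask specifically, the positions returned by indexer_between_time can be scattered into a boolean array. The sketch below does that with NumPy and, as a cross-check, compares the result with DataFrame.between_time. That method is a related pandas API that neither answer mentions:

```python
import numpy as np
import pandas as pd

index = pd.date_range('2013-01-01', periods=10, freq='15min')
data = pd.DataFrame(data=[1, 2, 3, 4, 5, 6, 7, 8, 9, 0], columns=['value'], index=index)

# indexer_between_time returns positions; scatter them into a boolean mask
# with one entry per row.
positions = data.index.indexer_between_time('1:15', '02:00')
mask = np.zeros(len(data), dtype=bool)
mask[positions] = True

selected = data[mask]

# DataFrame.between_time performs the same time-of-day selection in one call.
same = data.between_time('1:15', '02:00')

print(selected)
```

An explicit mask is useful when it has to be combined with other conditions, for example mask & (data['value'] > 6).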

