Original question: http://stackoverflow.com/questions/35108199/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Python: which is a fast way to find index in pandas dataframe?
Asked by emax
I have a dataframe like the following
df =
   a        ID1        ID2  Proximity
0  0  900000498        NaN   0.000000
1  1  900000498  900004585   3.900000
2  2  900000498  900005562   3.900000
3  3  900000498  900008613   0.000000
4  4  900000498  900012333   0.000000
5  5  900000498  900019524   3.900000
6  6  900000498  900019877   0.000000
7  7  900000498  900020141   3.900000
8  8  900000498  900022133   3.900000
9  9  900000498  900022919   0.000000
For a given ID1-ID2 pair, I want to find the corresponding Proximity value. For instance, given the input [900000498, 900022133], I want 3.900000 as the output.
Answered by EdChum
If this is a common operation, I'd set the index to those columns. You can then perform the index lookup using loc, passing a tuple of the column values:
In [60]:
df1 = df.set_index(['ID1','ID2'])
In [61]:
%timeit df1.loc[(900000498,900022133), 'Proximity']
%timeit df.loc[(df['ID1']==900000498)&(df['ID2']==900022133), 'Proximity']
1000 loops, best of 3: 565 μs per loop
100 loops, best of 3: 1.69 ms per loop
You can see that once the columns form the index, the lookup is roughly 3x faster than the filter operation in this timing (565 μs versus 1.69 ms).
The output is pretty much the same, except that the index lookup returns a scalar while the boolean filter returns a one-row Series:
In [63]:
print(df1.loc[(900000498,900022133), 'Proximity'])
print(df.loc[(df['ID1']==900000498)&(df['ID2']==900022133), 'Proximity'])
3.9
8 3.9
Name: Proximity, dtype: float64
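For anyone who wants to try this outside IPython, here is a minimal self-contained sketch of the same approach. It assumes the sample data from the question. The helper `lookup_proximity` and its `default` parameter are hypothetical names added for illustration. They are not part of pandas or of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the question's sample frame (values copied from the table in the question).
df = pd.DataFrame({
    "a": range(10),
    "ID1": [900000498] * 10,
    "ID2": [np.nan, 900004585, 900005562, 900008613, 900012333,
            900019524, 900019877, 900020141, 900022133, 900022919],
    "Proximity": [0.0, 3.9, 3.9, 0.0, 0.0, 3.9, 0.0, 3.9, 3.9, 0.0],
})

# Build the two-level index once, then reuse df1 for every lookup.
df1 = df.set_index(["ID1", "ID2"])


def lookup_proximity(indexed, id1, id2, default=None):
    """Hypothetical helper: Proximity for (id1, id2), or `default` if absent."""
    try:
        return indexed.loc[(id1, id2), "Proximity"]
    except KeyError:
        # .loc raises KeyError when the pair is not in the index.
        return default


# Index lookup: a scalar, because the (ID1, ID2) pairs are unique here.
by_index = lookup_proximity(df1, 900000498, 900022133)

# Boolean filter on the original frame: a one-row Series.
by_mask = df.loc[(df["ID1"] == 900000498) & (df["ID2"] == 900022133), "Proximity"]

print(by_index)
print(by_mask)
```

Building the index has its own cost, so this only pays off when the same frame is queried many times, which is the "common operation" case the answer describes. If the ID pairs were not unique, the index lookup would return several rows rather than a single scalar.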