Python pandas .at versus .loc
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/37216485/
pandas .at versus .loc
Asked by piRSquared
I've been exploring how to optimize my code and ran across the pandas .at method. Per the documentation:
Fast label-based scalar accessor
Similarly to loc, at provides label based scalar lookups. You can also set using these indexers.
So I ran some samples:
Setup
import pandas as pd
import numpy as np
from string import ascii_letters, ascii_lowercase, ascii_uppercase

lt = list(ascii_letters)
lc = list(ascii_lowercase)
uc = list(ascii_uppercase)

def gdf(rows, cols, seed=None):
    """rows and cols are what you'd pass
    to pd.MultiIndex.from_product()"""
    gmi = pd.MultiIndex.from_product
    df = pd.DataFrame(index=gmi(rows), columns=gmi(cols))
    np.random.seed(seed)
    # fill every cell with a uniform random number in [0, 1)
    df.iloc[:, :] = np.random.rand(*df.shape)
    return df

seed = [3, 1415]
df = gdf([lc, uc], [lc, uc], seed)
# show only the top-left 5x5 corner of the frame
print(df.head().T.head().T)
df looks like:
            a
            A         B         C         D         E
a A  0.444939  0.407554  0.460148  0.465239  0.462691
  B  0.032746  0.485650  0.503892  0.351520  0.061569
  C  0.777350  0.047677  0.250667  0.602878  0.570528
  D  0.927783  0.653868  0.381103  0.959544  0.033253
  E  0.191985  0.304597  0.195106  0.370921  0.631576
Let's use .at and .loc and ensure I get the same thing:
print("using .loc", df.loc[('a', 'A'), ('c', 'C')])
print("using .at ", df.at[('a', 'A'), ('c', 'C')])
using .loc 0.37374090276
using .at 0.37374090276
Test speed using .loc
%%timeit
df.loc[('a', 'A'), ('c', 'C')]
10000 loops, best of 3: 180 μs per loop
Test speed using .at
%%timeit
df.at[('a', 'A'), ('c', 'C')]
The slowest run took 6.11 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8 μs per loop
This looks to be a huge speed increase. Even at the caching stage, 6.11 * 8 (about 49 μs) is a lot faster than 180 μs.
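For readers working outside IPython, the comparison can be approximated with the standard-library timeit module. This is only a sketch: the small frame, the labels 50 and "C", and the loop count n are illustrative assumptions, not the question's 676 x 676 MultiIndex frame, and absolute timings depend on the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical data, not the question's frame).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, 4)), columns=list("ABCD"))

# Both accessors should return the same scalar for one row label
# and one column label.
assert df.loc[50, "C"] == df.at[50, "C"]

n = 2000  # number of repetitions per measurement (arbitrary choice)
loc_time = timeit.timeit(lambda: df.loc[50, "C"], number=n)
at_time = timeit.timeit(lambda: df.at[50, "C"], number=n)

print(f".loc: {loc_time / n * 1e6:.2f} us per call")
print(f".at:  {at_time / n * 1e6:.2f} us per call")
```

The lambda wrappers add a small constant overhead to both measurements, so the ratio is more meaningful than either number on its own.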
Question
What are the limitations of .at? I'm motivated to use it. The documentation says it's similar to .loc, but it doesn't behave similarly. Example:
# small df
sdf = gdf([lc[:2]], [uc[:2]], seed)
print(sdf.loc[:, :])
          A         B
a  0.444939  0.407554
b  0.460148  0.465239
whereas sdf.at[:, :] results in TypeError: unhashable type
So obviously not the same even if the intent is to be similar.
That said, who can provide guidance on what can and cannot be done with the .at method?
Accepted answer by unutbu
Update: df.get_value is deprecated as of version 0.21.0. Using df.at or df.iat is the recommended method going forward.
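As a rough illustration of the two recommended accessors: .at addresses a cell by row and column labels, while .iat addresses it by integer position. The small frame below is an assumption made for illustration.

```python
import pandas as pd

# Illustrative frame with a non-default integer index.
df = pd.DataFrame(
    [[0, 2, 3], [0, 4, 1], [10, 20, 30]],
    index=[4, 5, 6],
    columns=["A", "B", "C"],
)

# .at looks up by row label and column label.
by_label = df.at[5, "B"]

# .iat looks up by integer position: row 1, column 1 is the same cell.
by_position = df.iat[1, 1]

print(by_label, by_position)
```

Because the index labels here (4, 5, 6) differ from the positions (0, 1, 2), mixing up the two accessors would either hit a different cell or raise an error.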
df.at can only access a single value at a time.
df.loc can select multiple rows and/or columns.
Note that there is also df.get_value, which may be even quicker at accessing single values:
In [25]: %timeit df.loc[('a', 'A'), ('c', 'C')]
10000 loops, best of 3: 187 μs per loop
In [26]: %timeit df.at[('a', 'A'), ('c', 'C')]
100000 loops, best of 3: 8.33 μs per loop
In [35]: %timeit df.get_value(('a', 'A'), ('c', 'C'))
100000 loops, best of 3: 3.62 μs per loop
Under the hood, df.at[...] calls df.get_value, but it also does some type checking on the keys.
Answer by Cleb
As you asked about the limitations of .at, here is one thing I recently ran into (using pandas 0.22). Let's use the example from the documentation:
df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]], index=[4, 5, 6], columns=['A', 'B', 'C'])
df2 = df.copy()
    A   B   C
4   0   2   3
5   0   4   1
6  10  20  30
If I now do
df.at[4, 'B'] = 100
the result looks as expected
    A    B   C
4   0  100   3
5   0    4   1
6  10   20  30
However, when I try to do
df.at[4, 'C'] = 10.05
it seems that .at tries to conserve the datatype (here: int):
    A    B   C
4   0  100  10
5   0    4   1
6  10   20  30
That seems to differ from .loc:
df2.loc[4, 'C'] = 10.05
yields the desired result:
    A   B      C
4   0   2  10.05
5   0   4   1.00
6  10  20  30.00
The risky thing in the example above is that it happens silently (the conversion from float to int). When one tries the same with strings, it will throw an error:
df.at[5, 'A'] = 'a_string'
ValueError: invalid literal for int() with base 10: 'a_string'
It will work, however, if one uses a string on which int() actually works, as noted by @n1k31t4 in the comments, e.g.
df.at[5, 'A'] = '123'
     A   B   C
4    0   2   3
5  123   4   1
6   10  20  30
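The silent truncation described in this answer was observed on pandas 0.22, and how .at treats a value that does not match the column dtype may differ in other pandas versions. One defensive option, offered here as an assumption rather than something the answer proposes, is to cast the column to float explicitly before writing a float through .at, so the outcome does not depend on that behavior. The sketch reuses the answer's example data.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 2, 3], [0, 4, 1], [10, 20, 30]],
    index=[4, 5, 6],
    columns=["A", "B", "C"],
)

# Make the target column float explicitly, so the assignment below
# does not rely on how .at handles an int column.
df["C"] = df["C"].astype(float)
df.at[4, "C"] = 10.05

print(df["C"].dtype)
print(df.at[4, "C"])
```

Only column C is converted; columns A and B keep their integer dtype.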
Answer by Vikranth Inti
.at is an optimized data access method compared to .loc.
.loc of a data frame selects all the elements located by indexed_rows and labeled_columns as given in its argument. Instead, .at selects a particular element of a data frame positioned at the given indexed_row and labeled_column.
Also, .at takes one row and one column as input arguments, whereas .loc may take multiple rows and columns. Output using .at is a single element, while output using .loc may be a Series or a DataFrame.
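The difference in output types can be checked with a minimal sketch like the one below. The frame and the variable names single, row, and block are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 2, 3], [0, 4, 1], [10, 20, 30]],
    index=[4, 5, 6],
    columns=["A", "B", "C"],
)

single = df.at[4, "B"]               # one row label, one column label: a scalar
row = df.loc[4]                      # one row label: a Series
block = df.loc[[4, 5], ["A", "B"]]   # lists of labels: a DataFrame

print(type(single).__name__, type(row).__name__, type(block).__name__)
```

When exactly one cell is needed, .at makes that intent explicit; .loc is the tool when the selection may span several rows or columns.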