Python pandas .at versus .loc

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/37216485/

Date: 2020-08-19 19:03:50. Source: igfitidea

pandas .at versus .loc

Tags: python, pandas, dataframe

Asked by piRSquared

I've been exploring how to optimize my code and ran across the pandas .at method. Per the documentation:

Fast label-based scalar accessor

Similarly to loc, at provides label based scalar lookups. You can also set using these indexers.
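
To make the quoted description concrete, here is a minimal sketch of getting and setting a single cell with .at, next to the equivalent .loc call. The frame and its labels ("r1", "r2", "x", "y") are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical toy frame, for illustration only.
df = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]}, index=["r1", "r2"])

# Read one cell by row label and column label.
via_at = df.at["r1", "y"]
via_loc = df.loc["r1", "y"]

# Write one cell the same way.
df.at["r2", "x"] = 20.0

print(via_at, via_loc)
print(df)
```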

So I ran some samples:

Setup

import pandas as pd
import numpy as np
from string import ascii_letters, ascii_lowercase, ascii_uppercase

lt = list(ascii_letters)
lc = list(ascii_lowercase)
uc = list(ascii_uppercase)

def gdf(rows, cols, seed=None):
    """rows and cols are what you'd pass
    to pd.MultiIndex.from_product()"""
    gmi = pd.MultiIndex.from_product
    df = pd.DataFrame(index=gmi(rows), columns=gmi(cols))
    np.random.seed(seed)
    # fill every cell with a uniform random number
    df.iloc[:, :] = np.random.rand(*df.shape)
    return df

seed = [3, 1415]
df = gdf([lc, uc], [lc, uc], seed)

# show the top-left 5x5 corner of the frame
print(df.head().T.head().T)

df looks like:

            a                                        
            A         B         C         D         E
a A  0.444939  0.407554  0.460148  0.465239  0.462691
  B  0.032746  0.485650  0.503892  0.351520  0.061569
  C  0.777350  0.047677  0.250667  0.602878  0.570528
  D  0.927783  0.653868  0.381103  0.959544  0.033253
  E  0.191985  0.304597  0.195106  0.370921  0.631576

Let's use .at and .loc and make sure I get the same thing:

print("using .loc", df.loc[('a', 'A'), ('c', 'C')])
print("using .at ", df.at[('a', 'A'), ('c', 'C')])

using .loc 0.37374090276
using .at  0.37374090276

Test speed using .loc

%%timeit
df.loc[('a', 'A'), ('c', 'C')]

10000 loops, best of 3: 180 μs per loop

Test speed using .at

%%timeit
df.at[('a', 'A'), ('c', 'C')]

The slowest run took 6.11 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8 μs per loop

This looks like a huge speed increase. Even the slowest run, at roughly 6.11 * 8 ≈ 49 μs, is a lot faster than 180 μs.
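
For anyone who wants to repeat this kind of comparison outside IPython, here is a rough sketch using the standard-library timeit module. The frame, its size, and the loop count are arbitrary assumptions, and the numbers will depend on the machine and the pandas version, so no output is shown.

```python
import timeit

import numpy as np
import pandas as pd

# Small stand-in frame; shape and labels are illustrative assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, 4)), columns=list("abcd"))

n = 2000  # number of calls per measurement

# Time the same scalar lookup through both accessors.
t_loc = timeit.timeit(lambda: df.loc[50, "c"], number=n)
t_at = timeit.timeit(lambda: df.at[50, "c"], number=n)

print(f".loc: {t_loc / n * 1e6:.2f} us per call")
print(f".at:  {t_at / n * 1e6:.2f} us per call")
```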

Question

What are the limitations of .at? I'm motivated to use it. The documentation says it's similar to .loc, but it doesn't behave similarly. Example:

# small df
sdf = gdf([lc[:2]], [uc[:2]], seed)

print(sdf.loc[:, :])

          A         B
a  0.444939  0.407554
b  0.460148  0.465239

whereas print(sdf.at[:, :]) results in TypeError: unhashable type

So the two are obviously not the same, even if the intent is for them to be similar.
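
A small sketch of that difference, using a made-up 2x2 frame rather than the sdf above. try_at is a hypothetical helper written only for this example. The exception raised for a slice may differ between pandas versions, so the sketch catches it generically instead of assuming TypeError.

```python
import pandas as pd

# Illustrative 2x2 frame (values are arbitrary).
sdf = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], index=["a", "b"], columns=["A", "B"])


def try_at(frame, row_key, col_key):
    """Return (value, None) on success or (None, exception) on failure."""
    try:
        return frame.at[row_key, col_key], None
    except Exception as exc:  # exact exception type varies across pandas versions
        return None, exc


# Scalar labels work with .at.
value, err = try_at(sdf, "a", "B")

# Slices do not: .at expects exactly one row label and one column label.
_, slice_err = try_at(sdf, slice(None), slice(None))

# .loc accepts the same slices and returns the whole frame.
whole = sdf.loc[:, :]

print(value, err)
print(type(slice_err).__name__)
print(whole)
```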

That said, who can provide guidance on what can and cannot be done with the .at method?

Accepted answer by unutbu

Update: df.get_value is deprecated as of version 0.21.0. Using df.at or df.iat is the recommended method going forward.

df.at can only access a single value at a time.

df.loc can select multiple rows and/or columns.

Note that there is also df.get_value, which may be even quicker at accessing single values:

In [25]: %timeit df.loc[('a', 'A'), ('c', 'C')]
10000 loops, best of 3: 187 μs per loop

In [26]: %timeit df.at[('a', 'A'), ('c', 'C')]
100000 loops, best of 3: 8.33 μs per loop

In [35]: %timeit df.get_value(('a', 'A'), ('c', 'C'))
100000 loops, best of 3: 3.62 μs per loop


Under the hood, df.at[...] calls df.get_value, but it also does some type checking on the keys.
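
Since the update at the top of this answer recommends df.at and df.iat over df.get_value, here is a minimal sketch of the two side by side. The frame is an arbitrary example: .at looks a cell up by labels, while .iat looks it up by integer positions.

```python
import pandas as pd

# Arbitrary example frame with non-default row labels.
df = pd.DataFrame(
    [[0, 2, 3], [0, 4, 1], [10, 20, 30]],
    index=[4, 5, 6],
    columns=["A", "B", "C"],
)

by_label = df.at[5, "B"]     # row label 5, column label "B"
by_position = df.iat[1, 1]   # second row, second column (0-based positions)

print(by_label, by_position)
```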

Answer by Cleb

As you asked about the limitations of .at, here is one thing I recently ran into (using pandas 0.22). Let's use the example from the documentation:

df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]], index=[4, 5, 6], columns=['A', 'B', 'C'])
df2 = df.copy()

    A   B   C
4   0   2   3
5   0   4   1
6  10  20  30

If I now do

df.at[4, 'B'] = 100

the result looks as expected

    A    B   C
4   0  100   3
5   0    4   1
6  10   20  30

However, when I try to do

df.at[4, 'C'] = 10.05

it seems that .at tries to conserve the datatype (here: int):

    A    B   C
4   0  100  10
5   0    4   1
6  10   20  30

That seems to differ from .loc:

df2.loc[4, 'C'] = 10.05

yields the desired result:

    A   B      C
4   0   2  10.05
5   0   4   1.00
6  10  20  30.00

The risky thing in the example above is that the conversion from float to int happens silently. When one tries the same with a string, it throws an error:

df.at[5, 'A'] = 'a_string'

ValueError: invalid literal for int() with base 10: 'a_string'

It will work, however, if one uses a string on which int() actually works, as noted by @n1k31t4 in the comments, e.g.

df.at[5, 'A'] = '123'

     A   B   C
4    0   2   3
5  123   4   1
6   10  20  30
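
The behavior above was observed with pandas 0.22 and may well differ in other versions. One way to avoid depending on it at all (a sketch, not something taken from the original answers) is to cast the column to float explicitly before writing a float into it with .at:

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 2, 3], [0, 4, 1], [10, 20, 30]],
    index=[4, 5, 6],
    columns=["A", "B", "C"],
)

# Make the target column float first, so the assignment needs no implicit cast.
df["C"] = df["C"].astype(float)
df.at[4, "C"] = 10.05

print(df)
print(df.dtypes)
```

With the dtype decided up front, the result no longer depends on how a given pandas version handles a float written into an int column.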

Answer by Vikranth Inti

.at is an optimized data access method compared to .loc.

.loc selects all the elements of a data frame located by the indexed_rows and labeled_columns given in its argument. Instead, .at selects the particular element of a data frame positioned at the given indexed_row and labeled_column.

Also, .at takes one row and one column as input arguments, whereas .loc may take multiple rows and columns. The output of .at is a single element, whereas the output of .loc may be a Series or a DataFrame.
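
A short sketch of those return types, using an arbitrary example frame. The exact scalar type printed for the single cell depends on the column dtype, so it is not shown here.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 2, 3], [0, 4, 1], [10, 20, 30]],
    index=[4, 5, 6],
    columns=["A", "B", "C"],
)

cell = df.at[4, "B"]                 # one row label, one column label: a single element
row = df.loc[4]                      # one row label: a Series
block = df.loc[[4, 5], ["A", "B"]]   # lists of labels: a DataFrame

print(type(cell).__name__, type(row).__name__, type(block).__name__)
```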