Python pandas loc vs. iloc vs. ix vs. at vs. iat?

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/28757389/

Date: 2020-08-19 03:41:54  Source: igfitidea

Tags: python, pandas, performance, indexing, lookup

Question by scribbles

Recently began branching out from my safe place (R) into Python and am a bit confused by the cell localization/selection in Pandas. I've read the documentation but I'm struggling to understand the practical implications of the various localization/selection options.

  • Is there a reason why I should ever use .loc or .iloc over the most general option .ix?
  • I understand that .loc, .iloc, .at, and .iat may provide some guaranteed correctness that .ix can't offer, but I've also read that .ix tends to be the fastest solution across the board.
  • Please explain the real-world, best-practices reasoning behind utilizing anything other than .ix.

Accepted answer by lautremont

loc: only works on the index (labels)
iloc: works on position
ix: you can get data from the DataFrame without it being in the index, since it also accepts positions
at: gets scalar values. It's a very fast loc
iat: gets scalar values. It's a very fast iloc

http://pyciencia.blogspot.com/2015/05/obtener-y-filtrar-datos-de-un-dataframe.html

Note: As of pandas 0.20.0, the .ix indexer is deprecated in favour of the more strict .iloc and .loc indexers; it was removed entirely in pandas 1.0.
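A minimal sketch of the distinction above, on a small illustrative frame (an assumption, not taken from the answer) whose integer labels 100/200/300 differ from the positions 0/1/2. The same cell is reached by label with .loc/.at and by position with .iloc/.iat, while a position handed to .loc is treated as a label that does not exist:

```python
import pandas as pd

# Illustrative frame: row labels 100/200/300, row positions 0/1/2
df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': [54, 67, 89]}, index=[100, 200, 300])

by_label = df.loc[200, 'B']      # label-based: row labelled 200, column 'B'
by_position = df.iloc[1, 1]      # position-based: second row, second column
fast_label = df.at[200, 'B']     # scalar-only counterpart of .loc
fast_position = df.iat[1, 1]     # scalar-only counterpart of .iloc
print(by_label, by_position, fast_label, fast_position)

# .loc never falls back to positions: there is no row labelled 1
try:
    df.loc[1, 'B']
    label_one_found = True
except KeyError:
    label_one_found = False
```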

Answer by Lydia

import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': [54, 67, 89]}, index=[100, 200, 300])
df

     A   B
100  a  54
200  b  67
300  c  89

df.loc[100]

A     a
B    54
Name: 100, dtype: object

df.iloc[0]

A     a
B    54
Name: 100, dtype: object

df2 = df.set_index([df.index, 'A'])
df2

        B
    A    
100 a  54
200 b  67
300 c  89

df2.ix[100, 'a']  # .ix only exists in pandas < 1.0

B    54
Name: (100, a), dtype: int64
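Since .ix is gone from current pandas, here is a hedged sketch of the same lookup with the strict indexers, rebuilding df and df2 as above: xs returns the whole row for the full (100, 'a') key, and .loc takes that key as a tuple followed by a column label:

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': [54, 67, 89]}, index=[100, 200, 300])
# Two-level row index: (original integer label, value of column A)
df2 = df.set_index([df.index, 'A'])

# Whole row for the complete key, returned as a Series named (100, 'a')
row = df2.xs((100, 'a'))
# Single value: the row key as a tuple, then the column label
value = df2.loc[(100, 'a'), 'B']

print(row)
print(value)
```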

Answer by piRSquared

Updated for pandas 0.20 given that ix is deprecated. This demonstrates not only how to use loc, iloc, at, iat and set_value, but also how to accomplish mixed positional/label-based indexing.



loc - label based
Allows you to pass 1-D arrays as indexers. Arrays can be either slices (subsets) of the index or column, or they can be boolean arrays which are equal in length to the index or columns.

Special Note: when a scalar indexer is passed, loc can assign a new index or column value that didn't exist before.

# label based, but we can use position values
# to get the labels from the index object
df.loc[df.index[2], 'ColName'] = 3


df.loc[df.index[1:3], 'ColName'] = 3
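To see the Special Note's enlargement behaviour in isolation, here is a small sketch on a made-up frame; the labels 'x', 'y', 'z' and 'NewCol' are hypothetical. A scalar .loc assignment to a label that does not exist yet adds a new row or column:

```python
import pandas as pd

# Made-up frame; 'ColName' mirrors the placeholder column used above
df = pd.DataFrame({'ColName': [1, 2]}, index=['x', 'y'])

# Scalar labels that do not exist yet: .loc enlarges the frame
df.loc['z', 'ColName'] = 3     # adds a new row labelled 'z'
df.loc['x', 'NewCol'] = 10     # adds a new column; the other rows get NaN

print(df)
```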


iloc - position based
Similar to loc except with positions rather than index values. However, you cannot assign new columns or indices.

# position based, but we can get the position
# from the columns object via the `get_loc` method
df.iloc[2, df.columns.get_loc('ColName')] = 3


df.iloc[2, 4] = 3


df.iloc[:3, 2:4] = 3
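A sketch of that restriction in practice, on a made-up three-row frame: positional assignment inside the bounds works, while a position past the end raises IndexError instead of adding a row:

```python
import pandas as pd

# Made-up three-row frame
df = pd.DataFrame({'ColName': [1, 2, 3], 'Other': [4, 5, 6]})

# Assigning inside the existing bounds works
df.iloc[2, df.columns.get_loc('ColName')] = 30

# Row position 5 does not exist; .iloc refuses to create it
try:
    df.iloc[5, 0] = 1
    enlarged = True
except IndexError as err:
    enlarged = False
    print(err)
```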


at - label based
Works very similar to loc for scalar indexers. Cannot operate on array indexers. Can! assign new indices and columns.

Advantage over loc: it is faster.
Disadvantage: you can't use arrays as indexers.

# label based, but we can use position values
# to get the labels from the index object
df.at[df.index[2], 'ColName'] = 3


df.at['C', 'ColName'] = 3
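A short sketch of .at on a made-up frame (the labels 'A', 'B' and 'C' are hypothetical), reading and writing one cell and, like scalar .loc, creating a missing row label on assignment:

```python
import pandas as pd

# Made-up frame with string row labels
df = pd.DataFrame({'ColName': [1, 2]}, index=['A', 'B'])

# Read and overwrite a single cell by row and column label
before = df.at['B', 'ColName']
df.at['B', 'ColName'] = 20

# Row label 'C' does not exist yet: .at adds it
df.at['C', 'ColName'] = 3
print(df)
```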


iat - position based
Works similarly to iloc. Cannot work with array indexers. Cannot! assign new indices and columns.

Advantage over iloc: it is faster.
Disadvantage: you can't use arrays as indexers.

# position based, but we can get the position
# from the columns object via the `get_loc` method
df.iat[2, df.columns.get_loc('ColName')] = 3
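A sketch on a made-up frame of the pattern above (look the column position up once with get_loc, then use pure positions), also showing that .iat raises IndexError rather than adding a row:

```python
import pandas as pd

# Made-up frame; 'ColName' stands in for any column label
df = pd.DataFrame({'Other': [0.5, 1.5, 2.5], 'ColName': [1.0, 2.0, 3.0]})

# Translate the label to a position once, then use positions only
col_pos = df.columns.get_loc('ColName')
df.iat[2, col_pos] = 30.0

# Unlike .at, .iat cannot grow the frame: row position 10 does not exist
try:
    df.iat[10, col_pos] = 1.0
    enlarged = True
except IndexError:
    enlarged = False
```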


set_value - label based
Works very similar to loc for scalar indexers. Cannot operate on array indexers. Can! assign new indices and columns. Note that set_value itself was deprecated in pandas 0.21 in favour of .at/.iat and removed in pandas 1.0.

Advantage: super fast, because there is very little overhead!
Disadvantage: there is very little overhead because pandas is not doing a bunch of safety checks. Use at your own risk. Also, this is not intended for public use.

# label based, but we can use position values
# to get the labels from the index object
df.set_value(df.index[2], 'ColName', 3)


set_value with takeable=True - position based
Works similarly to iloc. Cannot work with array indexers. Cannot! assign new indices and columns.

Advantage: super fast, because there is very little overhead!
Disadvantage: there is very little overhead because pandas is not doing a bunch of safety checks. Use at your own risk. Also, this is not intended for public use.

# position based, but we can get the position
# from the columns object via the `get_loc` method
df.set_value(2, df.columns.get_loc('ColName'), 3, takeable=True)
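Since set_value no longer exists in current pandas, here is a sketch of the equivalent calls with .at and .iat (the accessors its deprecation message pointed to), on a made-up three-row frame:

```python
import pandas as pd

# Made-up frame with string row labels
df = pd.DataFrame({'ColName': [10, 20, 30]}, index=['a', 'b', 'c'])

# Label-based replacement for df.set_value(df.index[2], 'ColName', 3)
df.at[df.index[2], 'ColName'] = 3

# Position-based replacement for df.set_value(1, col_pos, 5, takeable=True)
df.iat[1, df.columns.get_loc('ColName')] = 5

print(df)
```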

Answer by Ted Petrou

There are two primary ways that pandas makes selections from a DataFrame.

  • By Label
  • By Integer Location

The documentation uses the term position to refer to integer location. I do not like this terminology as I feel it is confusing. Integer location is more descriptive and is exactly what .iloc stands for. The key word here is INTEGER: you must use integers when selecting by integer location.

Before showing the summary let's all make sure that ...

.ix is deprecated and ambiguous and should never be used

There are three primary indexers for pandas. We have the indexing operator itself (the brackets []), .loc, and .iloc. Let's summarize them:

  • [] - Primarily selects subsets of columns, but can select rows as well. Cannot simultaneously select rows and columns.
  • .loc - selects subsets of rows and columns by label only
  • .iloc - selects subsets of rows and columns by integer location only

I almost never use .at or .iat, as they add no additional functionality and offer just a small performance increase. I would discourage their use unless you have a very time-sensitive application. Regardless, here is their summary:

  • .at selects a single scalar value in the DataFrame by label only
  • .iat selects a single scalar value in the DataFrame by integer location only
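To check the size of that speed difference on your own machine, here is a rough timing sketch with the standard timeit module. The frame shape, column names and repetition count are arbitrary assumptions, and the numbers depend heavily on the pandas version and hardware, so none are quoted here:

```python
import timeit

import pandas as pd

# Hypothetical 1000-row, 10-column frame of integers
df = pd.DataFrame({f'c{i}': range(1000) for i in range(10)})
label = df.index[500]    # with the default RangeIndex this label is 500
n = 2000                 # lookups timed per indexer

timings = {
    'loc': timeit.timeit(lambda: df.loc[label, 'c3'], number=n),
    'at': timeit.timeit(lambda: df.at[label, 'c3'], number=n),
    'iloc': timeit.timeit(lambda: df.iloc[500, 3], number=n),
    'iat': timeit.timeit(lambda: df.iat[500, 3], number=n),
}
for name, seconds in timings.items():
    print(f'{name:>4}: {seconds / n * 1e6:.2f} microseconds per lookup')
```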

In addition to selection by label and integer location, boolean selection, also known as boolean indexing, exists.



Examples explaining .loc, .iloc, boolean selection, .at and .iat are shown below.

We will first focus on the differences between .loc and .iloc. Before we talk about the differences, it is important to understand that DataFrames have labels that help identify each column and each row. Let's take a look at a sample DataFrame:

import pandas as pd

df = pd.DataFrame({'age':[30, 2, 12, 4, 32, 33, 69],
                   'color':['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food':['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height':[165, 70, 120, 80, 180, 172, 150],
                   'score':[4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state':['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
                   },
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

Both the rows and the columns have labels. The labels age, color, food, height, score and state are used for the columns. The other labels, Jane, Nick, Aaron, Penelope, Dean, Christina and Cornelia, are used as labels for the rows. Collectively, these row labels are known as the index.



The primary ways to select particular rows in a DataFrame are with the .loc and .iloc indexers. Each of these indexers can also be used to simultaneously select columns, but it is easier to just focus on rows for now. Also, each of the indexers uses a set of brackets that immediately follows its name to make its selections.

.loc selects data only by labels

We will first talk about the .loc indexer, which only selects data by the index or column labels. In our sample DataFrame, we have provided meaningful names as values for the index. Many DataFrames will not have any meaningful names and will instead default to just the integers from 0 to n-1, where n is the length (number of rows) of the DataFrame.
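Because such a default integer index makes labels and positions look identical, it is easy to mix them up. A small sketch on a made-up one-column frame shows that they only coincide until the rows are reordered:

```python
import pandas as pd

# Made-up frame with the default RangeIndex: labels 0..3 equal positions 0..3
df = pd.DataFrame({'x': [10, 20, 30, 40]})
same_at_first = df.loc[0, 'x'] == df.iloc[0, 0]   # both read the value 10

# Sorting keeps each row's label but changes its position
df_sorted = df.sort_values('x', ascending=False)
by_label = df_sorted.loc[0, 'x']     # row labelled 0 -> 10
by_position = df_sorted.iloc[0, 0]   # first row after sorting -> 40
```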

There are many different inputs you can use for .loc; three of them are:

  • A string
  • A list of strings
  • Slice notation using strings as the start and stop values

Selecting a single row with .loc with a string

To select a single row of data, place the index label inside of the brackets following .loc.

df.loc['Penelope']

This returns the row of data as a Series:

age           4
color     white
food      Apple
height       80
score       3.3
state        AL
Name: Penelope, dtype: object

Selecting multiple rows with .loc with a list of strings

df.loc[['Cornelia', 'Jane', 'Dean']]

This returns a DataFrame with the rows Cornelia, Jane and Dean, in the order specified in the list.

Selecting multiple rows with .loc with slice notation

Slice notation is defined by start, stop and step values. When slicing by label, pandas includes the stop value in the return. The following slices from Aaron to Dean, inclusive (Aaron, Penelope and Dean). Its step size is not explicitly defined, so it defaults to 1.

df.loc['Aaron':'Dean']

More complex slices, such as ones with a step, can be taken in the same manner as with Python lists.
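A sketch contrasting the inclusive label slice with the exclusive integer slice, plus a stepped label slice, on a trimmed copy of the sample frame (same row labels, only the age column, an assumption made for brevity):

```python
import pandas as pd

# Trimmed copy of the sample frame: same row labels, only the 'age' column
df = pd.DataFrame({'age': [30, 2, 12, 4, 32, 33, 69]},
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

# Label slices include the stop label; integer slices exclude the stop position
by_label = df.loc['Aaron':'Dean']     # Aaron, Penelope, Dean
by_position = df.iloc[2:4]            # Aaron, Penelope (position 4 excluded)

# A step works as with Python lists: every second row from Nick to Christina
stepped = df.loc['Nick':'Christina':2]
```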

.iloc selects data only by integer location

Let's now turn to .iloc. Every row and column of data in a DataFrame has an integer location that defines it. This is in addition to the label that is visually displayed in the output. The integer location is simply the number of rows/columns from the top/left beginning at 0.

There are many different inputs you can use for .iloc; three of them are:

  • An integer
  • A list of integers
  • Slice notation using integers as the start and stop values

Selecting a single row with .iloc with an integer

df.iloc[4]

This returns the 5th row (integer location 4) as a Series:

age           32
color       gray
food      Cheese
height       180
score        1.8
state         AK
Name: Dean, dtype: object

Selecting multiple rows with .iloc with a list of integers

df.iloc[[2, -2]]

This returns a DataFrame of the third and second-to-last rows (Aaron and Christina).

Selecting multiple rows with .iloc with slice notation

df.iloc[:5:3]  # every third row among the first five: Jane and Penelope



Simultaneous selection of rows and columns with .loc and .iloc

One excellent ability of both .loc and .iloc is that they can select rows and columns simultaneously. In the examples above, all the columns were returned from each selection. We can choose columns with the same types of inputs as we do for rows. We simply need to separate the row and column selections with a comma.

For example, we can select rows Jane and Dean with just the columns height, score and state like this:

df.loc[['Jane', 'Dean'], 'height':]

This uses a list of labels for the rows and slice notation for the columns.

We can naturally do similar operations with .iloc using only integers.

df.iloc[[1,4], 2]
Nick      Lamb
Dean    Cheese
Name: food, dtype: object


Simultaneous selection with labels and integer location

.ix was used to make selections with labels and integer locations simultaneously, which was useful but at times confusing and ambiguous, and thankfully it has been deprecated. If you need to make a selection with a mix of labels and integer locations, you will have to convert your selections to either all labels or all integer locations.

For instance, if we want to select rows Nick and Cornelia along with columns 2 and 4, we could use .loc by converting the integers to labels with the following:

col_names = df.columns[[2, 4]]
df.loc[['Nick', 'Cornelia'], col_names] 

Or alternatively, convert the index labels to integers with the get_loc index method.

labels = ['Nick', 'Cornelia']
index_ints = [df.index.get_loc(label) for label in labels]
df.iloc[index_ints, [2, 4]]
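A sketch confirming that both routes return the same frame, rebuilding the sample DataFrame. It also swaps the get_loc loop for Index.get_indexer, which converts all labels in one call; this assumes unique row labels, and note that get_indexer returns -1 for a missing label instead of raising, so that case needs checking:

```python
import pandas as pd

df = pd.DataFrame({'age': [30, 2, 12, 4, 32, 33, 69],
                   'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height': [165, 70, 120, 80, 180, 172, 150],
                   'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])
labels = ['Nick', 'Cornelia']

# Route 1: turn the column positions into labels and stay with .loc
via_loc = df.loc[labels, df.columns[[2, 4]]]

# Route 2: turn all row labels into positions at once and stay with .iloc
row_pos = df.index.get_indexer(labels)
via_iloc = df.iloc[row_pos, [2, 4]]
```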

Boolean Selection

The .loc indexer can also do boolean selection. For instance, if we are interested in finding all the rows where age is above 30 and returning just the food and score columns, we can do the following:

df.loc[df['age'] > 30, ['food', 'score']] 

You can replicate this with .iloc, but you cannot pass it a boolean Series. You must convert the boolean Series into a NumPy array like this:

df.iloc[(df['age'] > 30).values, [2, 4]] 
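A sketch checking that the two boolean selections agree on the rebuilt sample frame, and that handing .iloc the Series itself fails. Series.to_numpy() is used here as an alternative to .values; both produce a plain boolean array:

```python
import pandas as pd

df = pd.DataFrame({'age': [30, 2, 12, 4, 32, 33, 69],
                   'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height': [165, 70, 120, 80, 180, 172, 150],
                   'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])
mask = df['age'] > 30

via_loc = df.loc[mask, ['food', 'score']]
# .iloc needs a plain boolean array, not a Series carrying an index
via_iloc = df.iloc[mask.to_numpy(), [2, 4]]

# Passing the boolean Series directly is rejected
try:
    df.iloc[mask, [2, 4]]
    series_mask_ok = True
except ValueError:
    series_mask_ok = False
```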


Selecting all rows

It is possible to use .loc/.iloc for just column selection. You can select all the rows by using a colon like this:

df.loc[:, 'color':'score':2]  # every other column from color to score: color and height



The indexing operator, [], can select rows and columns too, but not simultaneously.

Most people are familiar with the primary purpose of the DataFrame indexing operator, which is to select columns. A string selects a single column as a Series and a list of strings selects multiple columns as a DataFrame.

df['food']

Jane          Steak
Nick           Lamb
Aaron         Mango
Penelope      Apple
Dean         Cheese
Christina     Melon
Cornelia      Beans
Name: food, dtype: object

Using a list selects multiple columns

df[['food', 'score']]

What people are less familiar with is that, when slice notation is used, selection happens by row labels or by integer location. This is very confusing and something that I almost never use, but it does work.

df['Penelope':'Christina']  # slice rows by label: Penelope, Dean and Christina

df[2:6:2]  # slice rows by integer location: Aaron and Dean

The explicitness of .loc/.iloc for selecting rows is highly preferred. The indexing operator alone is unable to select rows and columns simultaneously.

df[3:5, 'color']
TypeError: unhashable type: 'slice'
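For reference, here is a sketch of how that combined row/column selection can be written instead, on the first five rows and two columns of the sample frame (trimmed for brevity): positions go through .iloc, labels through .loc, or the column is picked with [] first and its rows sliced afterwards:

```python
import pandas as pd

# First five rows and two columns of the sample frame
df = pd.DataFrame({'color': ['blue', 'green', 'red', 'white', 'gray'],
                   'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese']},
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean'])

# Rows at positions 3 and 4 of the 'color' column, three equivalent ways
by_position = df.iloc[3:5, df.columns.get_loc('color')]
by_label = df.loc[df.index[3:5], 'color']
chained = df['color'].iloc[3:5]
```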


Selection by .at and .iat

Selection with .at is nearly identical to .loc, but it only selects a single 'cell' in your DataFrame. We usually refer to this cell as a scalar value. To use .at, pass it both a row and column label separated by a comma.

df.at['Christina', 'color']
'black'

Selection with .iat is nearly identical to .iloc, but it only selects a single scalar value. You must pass it an integer for both the row and column locations.

df.iat[2, 5]
'FL'

Answer by Fabio Pomi

Let's start with this small df:

import pandas as pd
import time as tm
import numpy as np
n=10
a=np.arange(0,n**2)
df=pd.DataFrame(a.reshape(n,n))

We then have:

df
Out[25]: 
        0   1   2   3   4   5   6   7   8   9
    0   0   1   2   3   4   5   6   7   8   9
    1  10  11  12  13  14  15  16  17  18  19
    2  20  21  22  23  24  25  26  27  28  29
    3  30  31  32  33  34  35  36  37  38  39
    4  40  41  42  43  44  45  46  47  48  49
    5  50  51  52  53  54  55  56  57  58  59
    6  60  61  62  63  64  65  66  67  68  69
    7  70  71  72  73  74  75  76  77  78  79
    8  80  81  82  83  84  85  86  87  88  89
    9  90  91  92  93  94  95  96  97  98  99

With this we have:

df.iloc[3,3]
Out[33]: 33

df.iat[3,3]
Out[34]: 33

df.iloc[:3,:3]
Out[35]: 
    0   1   2
0   0   1   2
1  10  11  12
2  20  21  22



df.iat[:3,:3]
Traceback (most recent call last):
   ... omitted ...
ValueError: At based indexing on an integer index can only have integer indexers

Thus we cannot use .iat to select a subset; for that we must use .iloc.

But let's use both to select from a larger df and compare their speed:

# -*- coding: utf-8 -*-
"""
Created on Wed Feb  7 09:58:39 2018

@author: Fabio Pomi
"""

import pandas as pd
import time as tm
import numpy as np
n=1000
a=np.arange(0,n**2)
df=pd.DataFrame(a.reshape(n,n))
t1=tm.time()
for j in df.index:
    for i in df.columns:
        a=df.iloc[j,i]
t2=tm.time()
for j in df.index:
    for i in df.columns:
        a=df.iat[j,i]
t3=tm.time()
loc=t2-t1
at=t3-t2
prc = loc/at *100
print('\nloc:%f at:%f prc:%f' %(loc,at,prc))

loc:10.485600 at:7.395423 prc:141.784987

So with .iloc we can manage subsets and with .iat only a single scalar, but .iat is faster than .iloc (in the script, the variables loc and at hold the .iloc and .iat timings).

:-)