Python pandas loc vs. iloc vs. ix vs. at vs. iat?
Notice: this content comes from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/28757389/
pandas loc vs. iloc vs. ix vs. at vs. iat?
Asked by scribbles
I recently began branching out from my safe place (R) into Python and am a bit confused by cell localization/selection in pandas. I've read the documentation, but I'm struggling to understand the practical implications of the various localization/selection options.
- Is there a reason why I should ever use .loc or .iloc over the most general option, .ix?
- I understand that .loc, .iloc, .at, and .iat may provide some guaranteed correctness that .ix can't offer, but I've also read that .ix tends to be the fastest solution across the board.
- Please explain the real-world, best-practices reasoning behind using anything other than .ix.
Accepted answer by lautremont
loc: only works on the index (labels)
iloc: works on position
ix: you can get data from the DataFrame without it being in the index
at: gets scalar values. It's a very fast loc
iat: gets scalar values. It's a very fast iloc
http://pyciencia.blogspot.com/2015/05/obtener-y-filtrar-datos-de-un-dataframe.html
Note: As of pandas 0.20.0, the .ix indexer is deprecated in favour of the more strict .iloc and .loc indexers.
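The summary above can be sketched on a tiny frame. This is only an illustration with made-up data (the column names and the integer labels 10, 20, 30 are assumptions, chosen so that labels and positions differ):

```python
import pandas as pd

# Hypothetical sample data: integer labels that are NOT the same as positions.
df = pd.DataFrame({"name": ["x", "y", "z"], "value": [1.5, 2.5, 3.5]},
                  index=[10, 20, 30])

row_by_label = df.loc[20]            # label 20 -> the second row
row_by_position = df.iloc[1]         # position 1 -> the same second row
cell_by_label = df.at[20, "value"]   # single scalar, label based
cell_by_position = df.iat[1, 1]      # single scalar, position based

print(row_by_label.equals(row_by_position))
print(cell_by_label, cell_by_position)
```

Because the labels here are integers, `df.loc[1]` would raise a `KeyError` (there is no label 1), which is the kind of ambiguity `.ix` used to paper over.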
Answer by Lydia
df = pd.DataFrame({'A':['a', 'b', 'c'], 'B':[54, 67, 89]}, index=[100, 200, 300])
df
A B
100 a 54
200 b 67
300 c 89
In [19]:
df.loc[100]
Out[19]:
A a
B 54
Name: 100, dtype: object
In [20]:
df.iloc[0]
Out[20]:
A a
B 54
Name: 100, dtype: object
In [24]:
df2 = df.set_index([df.index,'A'])
df2
Out[24]:
B
A
100 a 54
200 b 67
300 c 89
In [25]:
df2.ix[100, 'a']
Out[25]:
B 54
Name: (100, a), dtype: int64
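Since `.ix` is no longer available in recent pandas releases, the last lookup can be written with `.loc` and a tuple key instead. A minimal sketch, rebuilding the same `df` and `df2` as above:

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': [54, 67, 89]},
                  index=[100, 200, 300])
# Two-level index: the original integer labels plus column 'A'.
df2 = df.set_index([df.index, 'A'])

row = df2.loc[(100, 'a')]         # the whole row, as a Series
value = df2.loc[(100, 'a'), 'B']  # a single cell from that row

print(row)
print(value)
```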
Answer by piRSquared
Updated for pandas 0.20, given that ix is deprecated. This demonstrates not only how to use loc, iloc, at, iat, and set_value, but also how to accomplish mixed positional/label based indexing.
loc - label based
Allows you to pass 1-D arrays as indexers. Arrays can be either slices (subsets) of the index or columns, or they can be boolean arrays which are equal in length to the index or columns.
Special Note: when a scalar indexer is passed, loc can assign a new index or column value that didn't exist before.
# label based, but we can use position values
# to get the labels from the index object
df.loc[df.index[2], 'ColName'] = 3
df.loc[df.index[1:3], 'ColName'] = 3
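To make the special note concrete, here is a small sketch of `loc` creating a row and a column that did not exist before. The frame and the labels `'D'` and `'Other'` are hypothetical:

```python
import pandas as pd

# Hypothetical frame with three labelled rows and one column.
df = pd.DataFrame({"ColName": [0, 0, 0]}, index=["A", "B", "C"])

df.loc[df.index[2], "ColName"] = 3   # existing cell, label looked up by position
df.loc["D", "ColName"] = 7           # 'D' is a new row label: the frame grows
df.loc[:, "Other"] = 1               # 'Other' is a new column label

print(df)
```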
iloc - position based
Similar to loc, except with positions rather than index values. However, you cannot assign new columns or indices.
# position based, but we can get the position
# from the columns object via the `get_loc` method
df.iloc[2, df.columns.get_loc('ColName')] = 3
df.iloc[2, 4] = 3
df.iloc[:3, 2:4] = 3
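The restriction on enlargement can be checked directly. In this sketch (hypothetical frame again), writing to an existing position works, while writing one row past the end raises an `IndexError`:

```python
import pandas as pd

df = pd.DataFrame({"ColName": [0, 0, 0]}, index=["A", "B", "C"])

# Existing position: fine.
df.iloc[2, df.columns.get_loc("ColName")] = 3

# Position 3 does not exist in a three-row frame, and iloc will not create it.
try:
    df.iloc[3, 0] = 9
    enlarged = True
except IndexError:
    enlarged = False

print(enlarged, len(df))
```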
at - label based
Works very similarly to loc for scalar indexers. Cannot operate on array indexers. Can assign new indices and columns.
Advantage over loc is that this is faster.
Disadvantage is that you can't use arrays for indexers.
# label based, but we can use position values
# to get the labels from the index object
df.at[df.index[2], 'ColName'] = 3
df.at['C', 'ColName'] = 3
iat - position based
Works similarly to iloc. Cannot work with array indexers. Cannot assign new indices and columns.
Advantage over iloc is that this is faster.
Disadvantage is that you can't use arrays for indexers.
# position based, but we can get the position
# from the columns object via the `get_loc` method
df.iat[2, df.columns.get_loc('ColName')] = 3
set_value - label based
Works very similarly to loc for scalar indexers. Cannot operate on array indexers. Can assign new indices and columns.
Advantage: Super fast, because there is very little overhead!
Disadvantage: There is very little overhead because pandas is not doing a bunch of safety checks. Use at your own risk. Also, this is not intended for public use.
# label based, but we can use position values
# to get the labels from the index object
df.set_value(df.index[2], 'ColName', 3)
set_value with takable=True - position based
Works similarly to iloc. Cannot work with array indexers. Cannot assign new indices and columns.
Advantage: Super fast, because there is very little overhead!
Disadvantage: There is very little overhead because pandas is not doing a bunch of safety checks. Use at your own risk. Also, this is not intended for public use.
# position based, but we can get the position
# from the columns object via the `get_loc` method
df.set_value(2, df.columns.get_loc('ColName'), 3, takable=True)
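On current pandas the two `set_value` calls above will not run: `DataFrame.set_value` was deprecated in 0.21 and removed in 1.0. A rough equivalent, assuming the same hypothetical `ColName` column, uses `.at` for the label-based write and `.iat` for the position-based one:

```python
import pandas as pd

df = pd.DataFrame({"ColName": [0, 0, 0], "Other": [1, 1, 1]},
                  index=["A", "B", "C"])

# Label based, replacing df.set_value(df.index[2], 'ColName', 3)
df.at[df.index[2], "ColName"] = 3

# Position based, replacing df.set_value(1, <col position>, 5, takable=True)
df.iat[1, df.columns.get_loc("ColName")] = 5

print(df)
```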
Answer by Ted Petrou
There are two primary ways that pandas makes selections from a DataFrame.
- By Label
- By Integer Location
The documentation uses the term position for referring to integer location. I do not like this terminology as I feel it is confusing. Integer location is more descriptive and is exactly what .iloc stands for. The key word here is INTEGER - you must use integers when selecting by integer location.
Before showing the summary let's all make sure that ...
.ix is deprecated and ambiguous and should never be used
There are three primary indexers for pandas. We have the indexing operator itself (the brackets []), .loc, and .iloc. Let's summarize them:
- [] - Primarily selects subsets of columns, but can select rows as well. Cannot simultaneously select rows and columns.
- .loc - selects subsets of rows and columns by label only
- .iloc - selects subsets of rows and columns by integer location only
I almost never use .at or .iat as they add no additional functionality and offer just a small performance increase. I would discourage their use unless you have a very time-sensitive application. Regardless, here is their summary:
- .at selects a single scalar value in the DataFrame by label only
- .iat selects a single scalar value in the DataFrame by integer location only
In addition to selection by label and integer location, boolean selection, also known as boolean indexing, exists.
Examples explaining .loc, .iloc, boolean selection, and .at and .iat are shown below.
We will first focus on the differences between .loc and .iloc. Before we talk about the differences, it is important to understand that DataFrames have labels that help identify each column and each row. Let's take a look at a sample DataFrame:
df = pd.DataFrame({'age':[30, 2, 12, 4, 32, 33, 69],
'color':['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
'food':['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
'height':[165, 70, 120, 80, 180, 172, 150],
'score':[4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
'state':['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
},
index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])
In the displayed DataFrame, all the words in bold are the labels. The labels age, color, food, height, score and state are used for the columns. The other labels, Jane, Nick, Aaron, Penelope, Dean, Christina and Cornelia, are used as labels for the rows. Collectively, these row labels are known as the index.
The primary ways to select particular rows in a DataFrame are with the .loc and .iloc indexers. Each of these indexers can also be used to simultaneously select columns, but it is easier to just focus on rows for now. Also, each of the indexers uses a set of brackets that immediately follow its name to make its selections.
.loc selects data only by labels
We will first talk about the .loc indexer, which only selects data by the index or column labels. In our sample DataFrame, we have provided meaningful names as values for the index. Many DataFrames will not have any meaningful names and will instead default to just the integers from 0 to n-1, where n is the length (number of rows) of the DataFrame.
There are many different inputs you can use for .loc. Three of them are:
- A string
- A list of strings
- Slice notation using strings as the start and stop values
Selecting a single row with .loc with a string
To select a single row of data, place the index label inside of the brackets following .loc.
df.loc['Penelope']
This returns the row of data as a Series
age 4
color white
food Apple
height 80
score 3.3
state AL
Name: Penelope, dtype: object
Selecting multiple rows with .loc with a list of strings
df.loc[['Cornelia', 'Jane', 'Dean']]
This returns a DataFrame with the rows in the order specified in the list (Cornelia, Jane, Dean).
Selecting multiple rows with .loc with slice notation
Slice notation is defined by start, stop and step values. When slicing by label, pandas includes the stop value in the result. The following slices from Aaron to Dean, inclusive. Its step size is not explicitly defined, so it defaults to 1.
df.loc['Aaron':'Dean']
Complex slices can be taken in the same manner as Python lists.
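The inclusive stop is the part that most often surprises people coming from plain Python slicing. A short sketch, using only the `age` column of the sample DataFrame to keep it small:

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 2, 12, 4, 32, 33, 69]},
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

by_label = df.loc["Aaron":"Dean"]            # stop label 'Dean' IS included
by_position = df.iloc[2:4]                   # stop position 4 is NOT included
every_other = df.loc["Aaron":"Cornelia":2]   # a step works with labels too

print(list(by_label.index))
print(list(by_position.index))
print(list(every_other.index))
```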
.iloc selects data only by integer location
Let's now turn to .iloc. Every row and column of data in a DataFrame has an integer location that defines it. This is in addition to the label that is visually displayed in the output. The integer location is simply the number of rows/columns from the top/left, beginning at 0.
There are many different inputs you can use for .iloc. Three of them are:
- An integer
- A list of integers
- Slice notation using integers as the start and stop values
Selecting a single row with .iloc with an integer
df.iloc[4]
This returns the 5th row (integer location 4) as a Series
age 32
color gray
food Cheese
height 180
score 1.8
state AK
Name: Dean, dtype: object
Selecting multiple rows with .iloc with a list of integers
df.iloc[[2, -2]]
This returns a DataFrame of the third and second to last rows (Aaron and Christina).
Selecting multiple rows with .iloc with slice notation
df.iloc[:5:3]
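The outputs of the last two `.iloc` selections are not shown above, so here is a small sketch that prints which rows they pick. Only the `age` column of the sample DataFrame is rebuilt, since the row selection does not depend on the other columns:

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 2, 12, 4, 32, 33, 69]},
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

picked = df.iloc[[2, -2]]   # third row and second-to-last row
stepped = df.iloc[:5:3]     # positions 0 and 3 (stop 5 is exclusive)

print(list(picked.index))
print(list(stepped.index))
```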
Simultaneous selection of rows and columns with .loc and .iloc
One excellent ability of both .loc and .iloc is that they can select rows and columns simultaneously. In the examples above, all the columns were returned from each selection. We can choose columns with the same types of inputs as we do for rows. We simply need to separate the row and column selection with a comma.
For example, we can select rows Jane and Dean with just the columns height, score and state like this:
df.loc[['Jane', 'Dean'], 'height':]
This uses a list of labels for the rows and slice notation for the columns.
We can naturally do similar operations with .iloc using only integers.
df.iloc[[1,4], 2]
Nick Lamb
Dean Cheese
Name: food, dtype: object
Simultaneous selection with labels and integer location
.ix was used to make selections simultaneously with labels and integer location, which was useful but confusing and ambiguous at times, and thankfully it has been deprecated. In the event that you need to make a selection with a mix of labels and integer locations, you will have to make both of your selections either labels or integer locations.
For instance, if we want to select rows Nick and Cornelia along with columns 2 and 4, we could use .loc by converting the integers to labels with the following:
col_names = df.columns[[2, 4]]
df.loc[['Nick', 'Cornelia'], col_names]
Or alternatively, convert the index labels to integers with the get_loc index method.
labels = ['Nick', 'Cornelia']
index_ints = [df.index.get_loc(label) for label in labels]
df.iloc[index_ints, [2, 4]]
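As a variation on the list comprehension, `Index.get_indexer` converts a whole list of labels to integer locations in one call. The sketch below rebuilds the sample DataFrame and checks that the label-based and integer-based routes give the same result:

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 2, 12, 4, 32, 33, 69],
     "color": ["blue", "green", "red", "white", "gray", "black", "red"],
     "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
     "height": [165, 70, 120, 80, 180, 172, 150],
     "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"]},
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

labels = ["Nick", "Cornelia"]

# Route 1: everything as labels.
via_loc = df.loc[labels, df.columns[[2, 4]]]

# Route 2: everything as integer locations.
index_ints = df.index.get_indexer(labels)
via_iloc = df.iloc[index_ints, [2, 4]]

print(via_loc.equals(via_iloc))
```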
Boolean Selection
The .loc indexer can also do boolean selection. For instance, if we are interested in finding all the rows where age is above 30 and return just the food and score columns, we can do the following:
df.loc[df['age'] > 30, ['food', 'score']]
You can replicate this with .iloc, but you cannot pass it a boolean Series. You must convert the boolean Series into a numpy array like this:
df.iloc[(df['age'] > 30).values, [2, 4]]
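A quick way to convince yourself that the two boolean selections are equivalent is to compare them. This sketch rebuilds the first five columns of the sample DataFrame and uses `to_numpy()`, which does the same job as `.values` here:

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 2, 12, 4, 32, 33, 69],
     "color": ["blue", "green", "red", "white", "gray", "black", "red"],
     "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
     "height": [165, 70, 120, 80, 180, 172, 150],
     "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2]},
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

mask = df["age"] > 30                           # boolean Series aligned on the index

via_loc = df.loc[mask, ["food", "score"]]       # .loc accepts the Series directly
via_iloc = df.iloc[mask.to_numpy(), [2, 4]]     # .iloc needs a plain boolean array

print(list(via_loc.index))
print(via_loc.equals(via_iloc))
```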
Selecting all rows
It is possible to use .loc/.iloc for just column selection. You can select all the rows by using a colon like this:
df.loc[:, 'color':'score':2]
The indexing operator, [], can select rows and columns too, but not simultaneously.
Most people are familiar with the primary purpose of the DataFrame indexing operator, which is to select columns. A string selects a single column as a Series and a list of strings selects multiple columns as a DataFrame.
df['food']
Jane Steak
Nick Lamb
Aaron Mango
Penelope Apple
Dean Cheese
Christina Melon
Cornelia Beans
Name: food, dtype: object
Using a list selects multiple columns
df[['food', 'score']]
What people are less familiar with is that, when slice notation is used, selection happens by row labels or by integer location. This is very confusing and something that I almost never use, but it does work.
df['Penelope':'Christina'] # slice rows by label
df[2:6:2] # slice rows by integer location
The explicitness of .loc/.iloc for selecting rows is highly preferred. The indexing operator alone is unable to select rows and columns simultaneously.
df[3:5, 'color']
TypeError: unhashable type: 'slice'
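What that failing expression was presumably trying to do (rows at positions 3 and 4, column color) can be written with either explicit indexer. A minimal sketch on a two-column cut of the sample DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 2, 12, 4, 32, 33, 69],
     "color": ["blue", "green", "red", "white", "gray", "black", "red"]},
    index=["Jane", "Nick", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

# Integer locations for both axes.
by_position = df.iloc[3:5, df.columns.get_loc("color")]

# Labels for both axes (remember the label slice includes 'Dean').
by_label = df.loc["Penelope":"Dean", "color"]

print(by_position)
```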
Selection by .at and .iat
Selection with .at is nearly identical to .loc, but it only selects a single 'cell' in your DataFrame. We usually refer to this cell as a scalar value. To use .at, pass it both a row and column label separated by a comma.
df.at['Christina', 'color']
'black'
Selection with .iat is nearly identical to .iloc, but it only selects a single scalar value. You must pass it an integer for both the row and column locations.
df.iat[2, 5]
'FL'
Answer by Fabio Pomi
Let's start with this small df:
import pandas as pd
import time as tm
import numpy as np
n=10
a=np.arange(0,n**2)
df=pd.DataFrame(a.reshape(n,n))
This gives us:
df
Out[25]:
0 1 2 3 4 5 6 7 8 9
0 0 1 2 3 4 5 6 7 8 9
1 10 11 12 13 14 15 16 17 18 19
2 20 21 22 23 24 25 26 27 28 29
3 30 31 32 33 34 35 36 37 38 39
4 40 41 42 43 44 45 46 47 48 49
5 50 51 52 53 54 55 56 57 58 59
6 60 61 62 63 64 65 66 67 68 69
7 70 71 72 73 74 75 76 77 78 79
8 80 81 82 83 84 85 86 87 88 89
9 90 91 92 93 94 95 96 97 98 99
With this we have:
df.iloc[3,3]
Out[33]: 33
df.iat[3,3]
Out[34]: 33
df.iloc[:3,:3]
Out[35]:
    0   1   2
0   0   1   2
1  10  11  12
2  20  21  22
df.iat[:3,:3]
Traceback (most recent call last):
... omissis ...
ValueError: At based indexing on an integer index can only have integer indexers
Thus we cannot use .iat to select a subset; for that we must use .iloc.
But let's try both to select single values from a larger df and check the speed ...
# -*- coding: utf-8 -*-
"""
Created on Wed Feb 7 09:58:39 2018
@author: Fabio Pomi
"""
import pandas as pd
import time as tm
import numpy as np

n = 1000
a = np.arange(0, n**2)
df = pd.DataFrame(a.reshape(n, n))

# Read every cell once with .iloc
t1 = tm.time()
for j in df.index:
    for i in df.columns:
        a = df.iloc[j, i]

# Read every cell once with .iat
t2 = tm.time()
for j in df.index:
    for i in df.columns:
        a = df.iat[j, i]
t3 = tm.time()

loc = t2 - t1   # seconds spent in the .iloc loop
at = t3 - t2    # seconds spent in the .iat loop
prc = loc / at * 100
print('\nloc:%f at:%f prc:%f' % (loc, at, prc))
loc:10.485600 at:7.395423 prc:141.784987
So with .iloc we can manage subsets and with .iat only a single scalar, but .iat is faster than .iloc (in this run the .iloc loop took about 1.4 times as long).
:-)
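For a slightly more repeatable measurement, the same comparison can be sketched with the standard-library `timeit` module. The helper `read_all` and the smaller size `n = 50` are choices made for this illustration, not part of the original answer, and the absolute timings will differ by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

n = 50  # smaller than the original 1000 so the sketch runs quickly
df = pd.DataFrame(np.arange(n**2).reshape(n, n))


def read_all(indexer):
    """Read every cell once through the given indexer and sum the values."""
    total = 0
    for r in range(n):
        for c in range(n):
            total += indexer[r, c]
    return total


# Both indexers are position based, so they must read the same values.
sum_iloc = read_all(df.iloc)
sum_iat = read_all(df.iat)

t_iloc = timeit.timeit(lambda: read_all(df.iloc), number=3)
t_iat = timeit.timeit(lambda: read_all(df.iat), number=3)
print(f"iloc: {t_iloc:.3f}s  iat: {t_iat:.3f}s")
```

Summing the values is only there so that both loops do identical work and the results can be compared for equality.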