Python: How are iloc, ix and loc different?

Question

Asked by AZhao

Can someone explain how these three methods of slicing are different?
I've seen the docs, and I've seen these answers, but I still find myself unable to explain how the three are different. To me, they seem interchangeable in large part, because they are at the lower levels of slicing.

For example, say we want to get the first five rows of a DataFrame. How is it that all three of these work?

df.loc[:5]
df.ix[:5]
df.iloc[:5]
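
With a freshly created DataFrame the row labels are the default integers 0, 1, 2, ..., so labels and positions happen to coincide, which is why all three calls appear to work. A minimal sketch to check what loc and iloc actually return, assuming pandas 1.0 or later (where ix no longer exists, so it is left out); the frame contents are made up:

```python
import numpy as np
import pandas as pd

# Made-up data: 10 rows with the default RangeIndex (labels 0..9 == positions 0..9).
df = pd.DataFrame(np.arange(20).reshape(10, 2), columns=['a', 'b'])

by_label = df.loc[:5]      # label slice: the stop label 5 is INCLUDED
by_position = df.iloc[:5]  # positional slice: the stop position 5 is EXCLUDED

print(by_label.index.tolist())     # [0, 1, 2, 3, 4, 5]
print(by_position.index.tolist())  # [0, 1, 2, 3, 4]
```

So even on the default index the calls are not identical: the label slice returns one row more than the positional one.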

Can someone present three cases where the distinction in use is clearer?

Answer 1

Accepted answer by Alex Riley

Note: in pandas version 0.20.0 and above, ix is deprecated and the use of loc and iloc is encouraged instead (ix was removed entirely in pandas 1.0.0). I have left the parts of this answer that describe ix intact as a reference for users of earlier versions of pandas. Examples have been added below showing alternatives to ix.

First, here's a recap of the three methods:

loc gets rows (or columns) with particular labels from the index.
iloc gets rows (or columns) at particular positions in the index (so it only takes integers).
ix usually tries to behave like loc but falls back to behaving like iloc if a label is not present in the index.
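
A quick illustration of the loc/iloc distinction with scalar keys. This is a sketch that assumes a made-up Series whose integer labels deliberately differ from the positions:

```python
import pandas as pd

# Made-up Series: labels 10..40 are NOT the positions 0..3.
s = pd.Series(['w', 'x', 'y', 'z'], index=[10, 20, 30, 40])

by_label = s.loc[20]     # label 20 -> 'x'
by_position = s.iloc[2]  # third element -> 'y'
last = s.iloc[-1]        # negative positions count from the end -> 'z'

try:
    s.loc[2]             # 2 is only a position here, not a label
    missing_label_raised = False
except KeyError:
    missing_label_raised = True
```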

It's important to note some subtleties that can make ix slightly tricky to use:

if the index is of integer type, ix will only use label-based indexing and not fall back to position-based indexing. If the label is not in the index, an error is raised.
if the index does not contain only integers, then given an integer, ix will immediately use position-based indexing rather than label-based indexing. If however ix is given another type (e.g. a string), it can use label-based indexing.
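
The two rules above can be captured in a small helper. This is a hypothetical sketch (`ix_like` is not a pandas function), assuming pandas 1.0 or later where ix itself is gone; it is a simplification, not the real ix implementation, which also handled lists, boolean masks, floats and more edge cases:

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_integer, is_integer_dtype


def ix_like(obj, key):
    """Hypothetical helper approximating the old ix rules on the row axis.

    Only scalar keys and slices are handled.
    """
    if is_integer_dtype(obj.index):
        # Rule 1: integer index -> label-based only, never falls back to positions.
        return obj.loc[key]
    if isinstance(key, slice):
        bounds = [b for b in (key.start, key.stop) if b is not None]
        positional = bool(bounds) and all(is_integer(b) for b in bounds)
    else:
        positional = is_integer(key)
    # Rule 2: non-integer index + integer key -> position-based.
    return obj.iloc[key] if positional else obj.loc[key]


s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])
s2 = pd.Series(np.nan, index=['a', 'b', 'c', 'd', 'e', 1, 2, 3, 4, 5])

try:
    ix_like(s, slice(None, 6))  # integer index, label 6 missing -> KeyError
    raised_for_6 = False
except KeyError:
    raised_for_6 = True

print(len(ix_like(s, slice(None, 3))))     # 8, like s.loc[:3]
print(len(ix_like(s2, slice(None, 6))))    # 6, like s2.iloc[:6]
print(len(ix_like(s2, slice(None, 'c'))))  # 3, like s2.loc[:'c']
```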

To illustrate the differences between the three methods, consider the following Series:

>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5])
>>> s
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    NaN

We'll look at slicing with the integer value 3.

In this case, s.iloc[:3] returns the first 3 rows (since it treats 3 as a position) and s.loc[:3] returns the first 8 rows (since it treats 3 as a label):

>>> s.iloc[:3] # slice the first three rows
49   NaN
48   NaN
47   NaN

>>> s.loc[:3] # slice up to and including label 3
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

>>> s.ix[:3] # the integer is in the index so s.ix[:3] works like loc
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

Notice s.ix[:3] returns the same Series as s.loc[:3] since it looks for the label first rather than working on the position (and the index for s is of integer type).

What if we try with an integer label that isn't in the index (say 6)?

Here s.iloc[:6] returns the first 6 rows of the Series as expected. However, s.loc[:6] raises a KeyError since 6 is not in the index.

>>> s.iloc[:6]
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN

>>> s.loc[:6]
KeyError: 6

>>> s.ix[:6]
KeyError: 6

As per the subtleties noted above, s.ix[:6] now raises a KeyError because it tries to work like loc but can't find a 6 in the index. Because our index is of integer type, ix doesn't fall back to behaving like iloc.
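
The same comparison can be checked with loc and iloc alone (a sketch, assuming pandas 1.0 or later). It also suggests that the KeyError depends on the index being unsorted as well as on 6 being absent: in the pandas versions I have used, a sorted (monotonic) index lets loc locate a missing slice bound by binary search instead of raising.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

first_six = s.iloc[:6]  # positions 0-5 always exist, whatever the labels are

try:
    s.loc[:6]           # 6 is not a label, and this index is unsorted
    loc_raised = False
except KeyError:
    loc_raised = True

# On the sorted index [1, 2, 3, 4, 5, 45, ...] the missing bound 6 falls after 5.
sorted_slice = s.sort_index().loc[:6]
print(sorted_slice.index.tolist())  # [1, 2, 3, 4, 5]
```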

If, however, our index were of mixed type, ix given an integer would behave like iloc immediately instead of raising a KeyError:

>>> s2 = pd.Series(np.nan, index=['a','b','c','d','e', 1, 2, 3, 4, 5])
>>> s2.index.is_mixed() # index is mix of different types (is_mixed() is deprecated in later pandas; check s2.index.inferred_type instead)
True
>>> s2.ix[:6] # now behaves like iloc given integer
a   NaN
b   NaN
c   NaN
d   NaN
e   NaN
1   NaN

Keep in mind that ix can still accept non-integers and behave like loc:

>>> s2.ix[:'c'] # behaves like loc given non-integer
a   NaN
b   NaN
c   NaN
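
Since ix no longer exists in pandas 1.0 and later, here is a sketch of how the two s2 examples translate, assuming a recent pandas. `inferred_type` is used in place of the deprecated `is_mixed()` to confirm the index mixes strings and integers:

```python
import numpy as np
import pandas as pd

s2 = pd.Series(np.nan, index=['a', 'b', 'c', 'd', 'e', 1, 2, 3, 4, 5])

kind = s2.index.inferred_type  # 'mixed-integer' for strings mixed with ints

first_six = s2.iloc[:6]        # replaces s2.ix[:6] (integer -> position)
up_to_c = s2.loc[:'c']         # replaces s2.ix[:'c'] (string -> label)

print(first_six.index.tolist())  # ['a', 'b', 'c', 'd', 'e', 1]
print(up_to_c.index.tolist())    # ['a', 'b', 'c']
```

Writing iloc or loc explicitly removes the guesswork ix did based on the key's type.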

As general advice, if you're only indexing using labels, or only indexing using integer positions, stick with loc or iloc to avoid unexpected results; try not to use ix.

Combining position-based and label-based indexing

Sometimes given a DataFrame, you will want to mix label and positional indexing methods for the rows and columns.

For example, consider the following DataFrame. How best to slice the rows up to and including 'c' and take the first four columns?

>>> df = pd.DataFrame(np.nan, 
                      index=list('abcde'),
                      columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a NaN NaN NaN NaN NaN
b NaN NaN NaN NaN NaN
c NaN NaN NaN NaN NaN
d NaN NaN NaN NaN NaN
e NaN NaN NaN NaN NaN

In earlier versions of pandas (before 0.20.0) ix lets you do this quite neatly: we can slice the rows by label and the columns by position (note that for the columns, ix will default to position-based slicing since 4 is not a column name):

>>> df.ix[:'c', :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN

In later versions of pandas, we can achieve this result using iloc and the help of another method:

>>> df.iloc[:df.index.get_loc('c') + 1, :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN

get_loc() is an index method meaning "get the position of the label in this index". Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row 'c' as well.
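
The conversion can also go the other way: turn the column positions into labels and use loc throughout. A sketch comparing both routes on the same frame, assuming pandas 1.0 or later:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=list('abcde'), columns=['x', 'y', 'z', 8, 9])

# Route 1: positions everywhere (the row label 'c' converted with get_loc).
via_iloc = df.iloc[:df.index.get_loc('c') + 1, :4]

# Route 2: labels everywhere (the first four column labels taken from df.columns).
via_loc = df.loc[:'c', df.columns[:4]]

print(via_iloc.equals(via_loc))  # True
```

The loc route needs no "+ 1" because label slices already include their endpoint.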

There are further examples in the pandas documentation.

Answer 2

Answer by JoeCondron

iloc works based on integer positioning. So no matter what your row labels are, you can always, e.g., get the first row by doing

df.iloc[0]

or the last five rows by doing

df.iloc[-5:]

You can also use it on the columns. This retrieves the 3rd column:

df.iloc[:, 2]    # the : in the first position indicates all rows

You can combine them to get intersections of rows and columns:

df.iloc[:3, :3] # The upper-left 3 X 3 entries (assuming df has 3+ rows and columns)
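
To see that iloc ignores the labels entirely, here is a sketch on a made-up 6 x 5 frame whose row and column labels are letters (an assumption for illustration; any labels would do):

```python
import numpy as np
import pandas as pd

# Made-up frame: values 0..29, letter labels unrelated to positions.
df = pd.DataFrame(np.arange(30).reshape(6, 5),
                  index=list('uvwxyz'), columns=list('ABCDE'))

first_row = df.iloc[0]        # Series for the row labelled 'u'
last_five = df.iloc[-5:]      # rows 'v' through 'z'
third_col = df.iloc[:, 2]     # column 'C'
upper_left = df.iloc[:3, :3]  # rows u-w, columns A-C

print(first_row.tolist())  # [0, 1, 2, 3, 4]
print(third_col.tolist())  # [2, 7, 12, 17, 22, 27]
print(upper_left.shape)    # (3, 3)
```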

On the other hand, .loc uses named indices (labels). Let's set up a data frame with strings as row and column labels:

df = pd.DataFrame(index=['a', 'b', 'c'], columns=['time', 'date', 'name'])

Then we can get the first row by

df.loc['a']     # equivalent to df.iloc[0]

and the last two rows of the 'date' column by

df.loc['b':, 'date']   # equivalent to df.iloc[1:, 1]

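and so on. Now, it's probably worth pointing out that the default row and column indices for a DataFrame are integers from 0, and in this case iloc and loc work in almost the same way. This is why your three examples look equivalent. They differ in one respect: a label slice includes its endpoint, so df.loc[:5] (and df.ix[:5], which behaves like loc on an integer index) returns six rows, labels 0 to 5, whereas df.iloc[:5] returns five. If you had a non-numeric index such as strings or datetimes, df.loc[:5] would raise an error.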
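
A sketch checking the "equivalent to" comments above and the error on a string index, using the same empty frame (every cell is NaN), assuming pandas 1.0 or later:

```python
import pandas as pd

df = pd.DataFrame(index=['a', 'b', 'c'], columns=['time', 'date', 'name'])

# The equivalences claimed in the comments above (equals() treats NaN == NaN).
same_first_row = df.loc['a'].equals(df.iloc[0])
same_date_rows = df.loc['b':, 'date'].equals(df.iloc[1:, 1])

# With a string index there is no label 5 to slice up to.
try:
    df.loc[:5]
    loc_int_slice_error = None
except TypeError as exc:
    loc_int_slice_error = exc

print(same_first_row, same_date_rows)  # True True
print(len(df.iloc[:2]))                # 2: iloc still works positionally
```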

Also, you can do column retrieval just by using the data frame's __getitem__:

df['time']    # equivalent to df.loc[:, 'time']

Now suppose you want to mix position and named indexing, that is, indexing using positions on rows and names on columns (to clarify, I mean selecting from our data frame, rather than creating a data frame with strings in the row index and integers in the column index). This is where .ix comes in (in pandas versions before 1.0):

df.ix[:2, 'time']    # the first two rows of the 'time' column
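
Since ix is gone from pandas 1.0 onwards, one side has to be converted so that both axes use the same kind of key. A sketch of two ways to do that, assuming a recent pandas:

```python
import pandas as pd

df = pd.DataFrame(index=['a', 'b', 'c'], columns=['time', 'date', 'name'])

# Convert the column name to a position, then use iloc on both axes...
via_iloc = df.iloc[:2, df.columns.get_loc('time')]
# ...or convert the row positions to labels, then use loc on both axes.
via_loc = df.loc[df.index[:2], 'time']

print(via_iloc.index.tolist())  # ['a', 'b']
```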

I think it's also worth mentioning that you can pass boolean vectors to the loc method as well. For example:

 b = [True, False, True]
 df.loc[b]

This will return the 1st and 3rd rows of df. This is equivalent to df[b] for selection, but it can also be used for assigning via boolean vectors:

df.loc[b, 'name'] = 'Mary', 'John'
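
A self-contained sketch of the select-then-assign pattern above, assuming a recent pandas. A list of values works the same way as the tuple: one value per True entry, in row order:

```python
import pandas as pd

df = pd.DataFrame(index=['a', 'b', 'c'], columns=['time', 'date', 'name'])
b = [True, False, True]  # one flag per row

selected = df.loc[b]                  # rows 'a' and 'c' (same rows as df[b])
df.loc[b, 'name'] = ['Mary', 'John']  # assign only where the flag is True

print(selected.index.tolist())  # ['a', 'c']
print(df['name'].tolist())      # ['Mary', nan, 'John']
```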

Answer 3

Answer by Ted Petrou

In my opinion, the accepted answer is confusing, since it uses a DataFrame with only missing values. I also do not like the term position-based for .iloc and instead prefer integer location, as it is much more descriptive and exactly what .iloc stands for. The key word is INTEGER: .iloc needs INTEGERS.

See my extremely detailed blog series on subset selection for more.

.ix is deprecated and ambiguous and should never be used

Because .ix is deprecated, we will only focus on the differences between .loc and .iloc.

Before we talk about the differences, it is important to understand that DataFrames have labels that help identify each column and each row. Let's take a look at a sample DataFrame:

import pandas as pd

df = pd.DataFrame({'age':[30, 2, 12, 4, 32, 33, 69],
                   'color':['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food':['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height':[165, 70, 120, 80, 180, 172, 150],
                   'score':[4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state':['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
                   },
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

The column labels are age, color, food, height, score and state. The index (row) labels are Jane, Nick, Aaron, Penelope, Dean, Christina and Cornelia.

The primary ways to select particular rows in a DataFrame are with the .loc and .iloc indexers. Each of these indexers can also be used to simultaneously select columns, but it is easier to just focus on rows for now. Also, each of the indexers uses a set of brackets that immediately follows its name to make its selections.

.loc selects data only by labels

We will first talk about the .loc indexer, which only selects data by the index or column labels. In our sample DataFrame, we have provided meaningful names as values for the index. Many DataFrames will not have any meaningful names and will instead default to just the integers from 0 to n-1, where n is the length of the DataFrame.

There are three different inputs you can use for .loc:

A string
A list of strings
Slice notation using strings as the start and stop values

Selecting a single row with .loc with a string

To select a single row of data, place the index label inside of the brackets following .loc.

df.loc['Penelope']

This returns the row of data as a Series:

age           4
color     white
food      Apple
height       80
score       3.3
state        AL
Name: Penelope, dtype: object

Selecting multiple rows with .loc with a list of strings

df.loc[['Cornelia', 'Jane', 'Dean']]

This returns a DataFrame with the rows in the order specified in the list.

Selecting multiple rows with .loc with slice notation

Slice notation is defined by start, stop and step values. When slicing by label, pandas includes the stop value in the return. The following slice selects Aaron through Dean, inclusive. Its step size is not given explicitly, so it defaults to 1.

df.loc['Aaron':'Dean']

Complex slices can be taken in the same manner as Python lists.
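
A sketch that prints which rows each of the .loc selections above returns, plus one slice with an explicit step, assuming a recent pandas and the sample DataFrame from this answer:

```python
import pandas as pd

df = pd.DataFrame({'age': [30, 2, 12, 4, 32, 33, 69],
                   'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height': [165, 70, 120, 80, 180, 172, 150],
                   'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

one_row = df.loc['Penelope']                   # a Series named 'Penelope'
picked = df.loc[['Cornelia', 'Jane', 'Dean']]  # rows come back in list order
inclusive = df.loc['Aaron':'Dean']             # the stop label 'Dean' is included
every_other = df.loc['Nick'::2]                # start at 'Nick', take every 2nd row

print(picked.index.tolist())       # ['Cornelia', 'Jane', 'Dean']
print(inclusive.index.tolist())    # ['Aaron', 'Penelope', 'Dean']
print(every_other.index.tolist())  # ['Nick', 'Penelope', 'Christina']
```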

.iloc selects data only by integer location

Let's now turn to .iloc. Every row and column of data in a DataFrame has an integer location that defines it. This is in addition to the label that is visually displayed in the output. The integer location is simply the number of rows/columns from the top/left beginning at 0.

There are three different inputs you can use for .iloc:

An integer
A list of integers
Slice notation using integers as the start and stop values

Selecting a single row with .iloc with an integer

df.iloc[4]

This returns the 5th row (integer location 4) as a Series:

age           32
color       gray
food      Cheese
height       180
score        1.8
state         AK
Name: Dean, dtype: object

Selecting multiple rows with .iloc with a list of integers

df.iloc[[2, -2]]

This returns a DataFrame of the third and second-to-last rows.

Selecting multiple rows with .iloc with slice notation

df.iloc[:5:3]
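
A sketch showing which rows the three .iloc examples return; the last one starts at position 0, stops before position 5 and takes every third row, so it yields positions 0 and 3. It assumes a recent pandas and the sample DataFrame from this answer:

```python
import pandas as pd

df = pd.DataFrame({'age': [30, 2, 12, 4, 32, 33, 69],
                   'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height': [165, 70, 120, 80, 180, 172, 150],
                   'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

row = df.iloc[4]         # 5th row -> Series named 'Dean'
pair = df.iloc[[2, -2]]  # third row and second-to-last row
stepped = df.iloc[:5:3]  # positions 0 and 3

print(row.name)                # Dean
print(pair.index.tolist())     # ['Aaron', 'Christina']
print(stepped.index.tolist())  # ['Jane', 'Penelope']
```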

Simultaneous selection of rows and columns with .loc and .iloc

One excellent feature of both .loc and .iloc is their ability to select rows and columns simultaneously. In the examples above, all the columns were returned from each selection. We can choose columns with the same types of inputs as we do for rows. We simply need to separate the row and column selection with a comma.

For example, we can select rows Jane and Dean with just the columns height, score and state like this:

df.loc[['Jane', 'Dean'], 'height':]

This uses a list of labels for the rows and slice notation for the columns.

We can naturally do similar operations with .iloc using only integers.

df.iloc[[1,4], 2]
Nick      Lamb
Dean    Cheese
Name: food, dtype: object
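
A sketch confirming that the label-based and integer-based versions of the row-and-column selection pick the same cells (height, score and state are columns 3 to 5), assuming a recent pandas and the sample DataFrame from this answer:

```python
import pandas as pd

df = pd.DataFrame({'age': [30, 2, 12, 4, 32, 33, 69],
                   'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height': [165, 70, 120, 80, 180, 172, 150],
                   'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

by_label = df.loc[['Jane', 'Dean'], 'height':]  # height through the last column
by_position = df.iloc[[0, 4], 3:]               # same cells by integer location
food = df.iloc[[1, 4], 2]                       # a single column gives a Series

print(by_label.columns.tolist())     # ['height', 'score', 'state']
print(by_label.equals(by_position))  # True
print(food.tolist())                 # ['Lamb', 'Cheese']
```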

Simultaneous selection with labels and integer location

.ix was used to make selections simultaneously with labels and integer locations, which was useful but confusing and ambiguous at times, and thankfully it has been deprecated. In the event that you need to make a selection with a mix of labels and integer locations, you will have to convert both selections to labels or both to integer locations.

For instance, if we want to select rows Nick and Cornelia along with columns 2 and 4, we could use .loc by converting the integers to labels with the following:

col_names = df.columns[[2, 4]]
df.loc[['Nick', 'Cornelia'], col_names]

Or alternatively, convert the index labels to integers with the get_loc index method.

labels = ['Nick', 'Cornelia']
index_ints = [df.index.get_loc(label) for label in labels]
df.iloc[index_ints, [2, 4]]
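
Both conversions should give the same frame. A sketch checking that, which also swaps the list comprehension for Index.get_indexer (a pandas method that converts a whole list of labels at once), assuming a recent pandas and the sample DataFrame from this answer:

```python
import pandas as pd

df = pd.DataFrame({'age': [30, 2, 12, 4, 32, 33, 69],
                   'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height': [165, 70, 120, 80, 180, 172, 150],
                   'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

rows = ['Nick', 'Cornelia']

# Route 1: column positions -> labels, then .loc.
via_loc = df.loc[rows, df.columns[[2, 4]]]

# Route 2: row labels -> positions, then .iloc.
# get_indexer returns -1 for a missing label; check for that first,
# because iloc would silently read -1 as "the last row".
positions = df.index.get_indexer(rows)
via_iloc = df.iloc[positions, [2, 4]]

print(positions.tolist())        # [1, 6]
print(via_loc.equals(via_iloc))  # True
```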

Boolean Selection

The .loc indexer can also do boolean selection. For instance, if we are interested in finding all the rows where age is above 30 and returning just the food and score columns, we can do the following:

df.loc[df['age'] > 30, ['food', 'score']]

You can replicate this with .iloc, but you cannot pass it a boolean Series. You must convert the boolean Series into a NumPy array like this:

df.iloc[(df['age'] > 30).values, [2, 4]]
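
A sketch checking both versions and the refusal of a boolean Series by iloc, assuming a recent pandas and the sample DataFrame from this answer. Series.to_numpy() is used here as an equivalent of .values:

```python
import pandas as pd

df = pd.DataFrame({'age': [30, 2, 12, 4, 32, 33, 69],
                   'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height': [165, 70, 120, 80, 180, 172, 150],
                   'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

mask = df['age'] > 30  # boolean Series aligned on the row labels

by_loc = df.loc[mask, ['food', 'score']]
by_iloc = df.iloc[mask.to_numpy(), [2, 4]]  # plain NumPy bool array

try:
    df.iloc[mask, [2, 4]]  # a boolean *Series* is refused by iloc
    iloc_accepts_series = True
except (ValueError, NotImplementedError):
    iloc_accepts_series = False

print(by_loc.index.tolist())   # ['Dean', 'Christina', 'Cornelia']
print(by_loc.equals(by_iloc))  # True
```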

Selecting all rows

It is possible to use .loc/.iloc for just column selection. You can select all the rows by using a colon like this:

df.loc[:, 'color':'score':2]
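
The column slice above runs from color to score inclusive and takes every second column. A sketch showing which columns that yields and the matching integer slice, assuming a recent pandas and the sample DataFrame from this answer:

```python
import pandas as pd

df = pd.DataFrame({'age': [30, 2, 12, 4, 32, 33, 69],
                   'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height': [165, 70, 120, 80, 180, 172, 150],
                   'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

every_other = df.loc[:, 'color':'score':2]  # columns at positions 1 and 3
same = df.iloc[:, 1:5:2]                    # stop 5 is exclusive, so score (4) is the end

print(every_other.columns.tolist())  # ['color', 'height']
print(every_other.equals(same))      # True
```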

The indexing operator, `[]`, can select rows and columns too but not simultaneously.

Most people are familiar with the primary purpose of the DataFrame indexing operator, which is to select columns. A string selects a single column as a Series and a list of strings selects multiple columns as a DataFrame.

df['food']

Jane          Steak
Nick           Lamb
Aaron         Mango
Penelope      Apple
Dean         Cheese
Christina     Melon
Cornelia      Beans
Name: food, dtype: object

Using a list selects multiple columns

df[['food', 'score']]

What people are less familiar with is that, when slice notation is used, selection happens by row labels or by integer location. This is very confusing and something that I almost never use, but it does work.

df['Penelope':'Christina'] # slice rows by label

df[2:6:2] # slice rows by integer location
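
A sketch of what the indexing operator returns for each kind of key, assuming a recent pandas and the sample DataFrame from this answer. Note how a slice switches `[]` from columns to rows:

```python
import pandas as pd

df = pd.DataFrame({'age': [30, 2, 12, 4, 32, 33, 69],
                   'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height': [165, 70, 120, 80, 180, 172, 150],
                   'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

one_col = df['food']                   # string -> one column (Series)
two_cols = df[['food', 'score']]       # list -> columns (DataFrame)
by_label = df['Penelope':'Christina']  # label slice -> ROWS, stop included
by_position = df[2:6:2]                # integer slice -> ROWS by position

print(by_label.index.tolist())     # ['Penelope', 'Dean', 'Christina']
print(by_position.index.tolist())  # ['Aaron', 'Dean']
```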

The explicitness of .loc/.iloc for selecting rows is highly preferred. The indexing operator alone is unable to select rows and columns simultaneously.

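df[3:5, 'color']
TypeError: unhashable type: 'slice'

(That is the error from older pandas versions; newer versions may raise a different exception, such as pandas.errors.InvalidIndexError, but the selection fails either way.)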
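
The selection the failing line attempts (rows at positions 3 and 4, column color) can be written with .iloc or .loc by converting one axis. A sketch, assuming a recent pandas and the sample DataFrame from this answer; the exception type from the bracket version is only printed, because it differs between versions:

```python
import pandas as pd

df = pd.DataFrame({'age': [30, 2, 12, 4, 32, 33, 69],
                   'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height': [165, 70, 120, 80, 180, 172, 150],
                   'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

try:
    df[3:5, 'color']
    bracket_worked = True
except Exception as exc:  # exact type depends on the pandas/Python version
    bracket_worked = False
    print(type(exc).__name__)

via_iloc = df.iloc[3:5, df.columns.get_loc('color')]  # name -> position
via_loc = df.loc[df.index[3:5], 'color']              # positions -> labels

print(via_iloc.tolist())         # ['white', 'gray']
print(via_iloc.equals(via_loc))  # True
```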
