Python: how are iloc, ix and loc different?
Source: http://stackoverflow.com/questions/31593201/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow (not to this page) and keep it under the same license.
How are iloc, ix and loc different?
Asked by AZhao
Can someone explain how these three methods of slicing are different?
I've seen the docs, and I've seen these answers, but I still find myself unable to explain how the three are different. To me, they seem interchangeable in large part, because they are at the lower levels of slicing.
For example, say we want to get the first five rows of a DataFrame. How is it that all three of these work?
df.loc[:5]
df.ix[:5]
df.iloc[:5]
Can someone present three cases where the distinction in use is clearer?
Accepted answer by Alex Riley
Note: in pandas version 0.20.0 and above, ix is deprecated and the use of loc and iloc is encouraged instead. I have left the parts of this answer that describe ix intact as a reference for users of earlier versions of pandas. Examples have been added below showing alternatives to ix.
First, here's a recap of the three methods:
- loc gets rows (or columns) with particular labels from the index.
- iloc gets rows (or columns) at particular positions in the index (so it only takes integers).
- ix usually tries to behave like loc but falls back to behaving like iloc if a label is not present in the index.
It's important to note some subtleties that can make ix slightly tricky to use:
- if the index is of integer type, ix will only use label-based indexing and not fall back to position-based indexing. If the label is not in the index, an error is raised.
- if the index does not contain only integers, then given an integer, ix will immediately use position-based indexing rather than label-based indexing. If however ix is given another type (e.g. a string), it can use label-based indexing.
To illustrate the differences between the three methods, consider the following Series:
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5])
>>> s
49 NaN
48 NaN
47 NaN
46 NaN
45 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
We'll look at slicing with the integer value 3.
In this case, s.iloc[:3] returns the first 3 rows (since it treats 3 as a position) and s.loc[:3] returns the first 8 rows (since it treats 3 as a label):
>>> s.iloc[:3] # slice the first three rows
49 NaN
48 NaN
47 NaN
>>> s.loc[:3] # slice up to and including label 3
49 NaN
48 NaN
47 NaN
46 NaN
45 NaN
1 NaN
2 NaN
3 NaN
>>> s.ix[:3] # the integer is in the index so s.ix[:3] works like loc
49 NaN
48 NaN
47 NaN
46 NaN
45 NaN
1 NaN
2 NaN
3 NaN
Notice s.ix[:3] returns the same Series as s.loc[:3] since it looks for the label first rather than working on the position (and the index for s is of integer type).
What if we try with an integer label that isn't in the index (say 6)?
Here s.iloc[:6] returns the first 6 rows of the Series as expected. However, s.loc[:6] raises a KeyError since 6 is not in the index.
>>> s.iloc[:6]
49 NaN
48 NaN
47 NaN
46 NaN
45 NaN
1 NaN
>>> s.loc[:6]
KeyError: 6
>>> s.ix[:6]
KeyError: 6
As per the subtleties noted above, s.ix[:6] now raises a KeyError because it tries to work like loc but can't find a 6 in the index. Because our index is of integer type, ix doesn't fall back to behaving like iloc.
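Since ix was removed in pandas 1.0, the comparison above can be reproduced with loc and iloc alone. The following is a minimal sketch under that assumption; the variable names first_three, up_to_label_3 and missing_label_error are introduced here only for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

# iloc treats 3 as a position: rows at positions 0, 1 and 2
first_three = s.iloc[:3]

# loc treats 3 as a label: everything up to and including label 3
up_to_label_3 = s.loc[:3]

# 6 is not a label in this (unsorted) index, so label-based slicing fails
try:
    s.loc[:6]
    missing_label_error = False
except KeyError:
    missing_label_error = True

print(len(first_three), len(up_to_label_3), missing_label_error)
```

Note that the KeyError depends on the index being unsorted; on a sorted integer index, loc can slice up to a label that is absent.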
If, however, our index was of mixed type, given an integer ix would behave like iloc immediately instead of raising a KeyError:
>>> s2 = pd.Series(np.nan, index=['a','b','c','d','e', 1, 2, 3, 4, 5])
>>> s2.index.is_mixed() # index is mix of different types
True
>>> s2.ix[:6] # now behaves like iloc given integer
a NaN
b NaN
c NaN
d NaN
e NaN
1 NaN
Keep in mind that ix can still accept non-integers and behave like loc:
>>> s2.ix[:'c'] # behaves like loc given non-integer
a NaN
b NaN
c NaN
As general advice, if you're only indexing using labels, or only indexing using integer positions, stick with loc or iloc to avoid unexpected results - try not to use ix.
Combining position-based and label-based indexing
Sometimes given a DataFrame, you will want to mix label and positional indexing methods for the rows and columns.
For example, consider the following DataFrame. How best to slice the rows up to and including 'c' and take the first four columns?
>>> df = pd.DataFrame(np.nan,
index=list('abcde'),
columns=['x','y','z', 8, 9])
>>> df
x y z 8 9
a NaN NaN NaN NaN NaN
b NaN NaN NaN NaN NaN
c NaN NaN NaN NaN NaN
d NaN NaN NaN NaN NaN
e NaN NaN NaN NaN NaN
In earlier versions of pandas (before 0.20.0) ix lets you do this quite neatly - we can slice the rows by label and the columns by position (note that for the columns, ix will default to position-based slicing since 4 is not a column name):
>>> df.ix[:'c', :4]
x y z 8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN
In later versions of pandas, we can achieve this result using iloc and the help of another method:
>>> df.iloc[:df.index.get_loc('c') + 1, :4]
x y z 8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN
get_loc() is an index method meaning "get the position of the label in this index". Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row 'c' as well.
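The same selection can also be written with loc by turning the column positions into labels instead. The sketch below shows both routes side by side; the names stop, by_position and by_label are illustrative only, and the loc variant is an alternative not given in the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=list('abcde'), columns=['x', 'y', 'z', 8, 9])

# Option 1: convert the row label to a position and use iloc throughout
stop = df.index.get_loc('c') + 1   # +1 because iloc excludes the endpoint
by_position = df.iloc[:stop, :4]

# Option 2: convert the column positions to labels and use loc throughout
by_label = df.loc[:'c', df.columns[:4]]

print(by_position.equals(by_label))
```

Which route reads better is a matter of taste; the point is that each indexer is given only one kind of input.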
There are further examples in pandas' documentation.
Answer by JoeCondron
iloc works based on integer positioning. So no matter what your row labels are, you can always, e.g., get the first row by doing
df.iloc[0]
or the last five rows by doing
df.iloc[-5:]
You can also use it on the columns. This retrieves the 3rd column:
df.iloc[:, 2] # the : in the first position indicates all rows
You can combine them to get intersections of rows and columns:
df.iloc[:3, :3] # The upper-left 3 X 3 entries (assuming df has 3+ rows and columns)
On the other hand, .loc uses named indices. Let's set up a data frame with strings as row and column labels:
df = pd.DataFrame(index=['a', 'b', 'c'], columns=['time', 'date', 'name'])
Then we can get the first row by
df.loc['a'] # equivalent to df.iloc[0]
and the last two rows of the 'date' column by
df.loc['b':, 'date'] # equivalent to df.iloc[1:, 1]
and so on. Now, it's probably worth pointing out that the default row and column indices for a DataFrame are integers from 0, and in this case iloc and loc work in much the same way (one difference: a loc slice includes its end label, so df.loc[:5] returns the rows labelled 0 through 5). This is why all three of your examples work. If you had a non-numeric index such as strings or datetimes, df.loc[:5] would raise an error.
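To make the point about default indices concrete, here is a small sketch. The frames df_default and df_named and the flag raised are hypothetical names chosen for this illustration; the exact exception type may vary between pandas versions, so the example catches both TypeError and KeyError.

```python
import pandas as pd

# Default index: the row labels are the integers 0..9
df_default = pd.DataFrame({'value': range(10)})
print(len(df_default.iloc[:5]))  # positions 0-4
print(len(df_default.loc[:5]))   # labels 0-5, endpoint included

# String index: an integer is not a valid label to slice with
df_named = pd.DataFrame({'value': range(3)}, index=['a', 'b', 'c'])
try:
    df_named.loc[:5]
    raised = False
except (TypeError, KeyError) as exc:
    raised = True
    print(type(exc).__name__)
```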
Also, you can do column retrieval just by using the data frame's __getitem__:
df['time'] # equivalent to df.loc[:, 'time']
Now suppose you want to mix position and named indexing, that is, indexing using names on rows and positions on columns (to clarify, I mean select from our data frame, rather than creating a data frame with strings in the row index and integers in the column index). This is where .ix comes in:
df.ix[:2, 'time'] # the first two rows of the 'time' column
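On pandas versions where .ix is no longer available, the same mixed selection can be sketched by converting one side so that a single indexer handles both axes. This is an illustration rather than part of the original answer; via_iloc and via_loc are names made up here.

```python
import pandas as pd

df = pd.DataFrame(index=['a', 'b', 'c'], columns=['time', 'date', 'name'])

# Positions on both axes: look up the position of the 'time' column
via_iloc = df.iloc[:2, df.columns.get_loc('time')]

# Labels on both axes: look up the labels of the first two rows
via_loc = df.loc[df.index[:2], 'time']

print(list(via_iloc.index), list(via_loc.index))
```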
I think it's also worth mentioning that you can pass boolean vectors to the loc method as well. For example:
b = [True, False, True]
df.loc[b]
This will return the 1st and 3rd rows of df. This is equivalent to df[b] for selection, but it can also be used for assigning via boolean vectors:
df.loc[b, 'name'] = 'Mary', 'John'
Answer by Ted Petrou
In my opinion, the accepted answer is confusing, since it uses a DataFrame with only missing values. I also do not like the term position-based for .iloc and instead prefer integer location, as it is much more descriptive and exactly what .iloc stands for. The key word is INTEGER - .iloc needs INTEGERS.
See my extremely detailed blog series on subset selection for more.
.ix is deprecated and ambiguous and should never be used
Because .ix is deprecated we will only focus on the differences between .loc and .iloc.
Before we talk about the differences, it is important to understand that DataFrames have labels that help identify each column and each index entry. Let's take a look at a sample DataFrame:
df = pd.DataFrame({'age':[30, 2, 12, 4, 32, 33, 69],
'color':['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
'food':['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
'height':[165, 70, 120, 80, 180, 172, 150],
'score':[4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
'state':['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
},
index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])
Every row name and column name in this DataFrame is a label. The labels age, color, food, height, score and state are used for the columns. The other labels, Jane, Nick, Aaron, Penelope, Dean, Christina and Cornelia, are used for the index.
The primary ways to select particular rows in a DataFrame are with the .loc and .iloc indexers. Each of these indexers can also be used to simultaneously select columns but it is easier to just focus on rows for now. Also, each of the indexers uses a set of brackets that immediately follows its name to make its selections.
.loc selects data only by labels
We will first talk about the .loc indexer, which only selects data by the index or column labels. In our sample DataFrame, we have provided meaningful names as values for the index. Many DataFrames will not have any meaningful names and will instead default to just the integers from 0 to n-1, where n is the length of the DataFrame.
There are three different inputs you can use for .loc:
- A string
- A list of strings
- Slice notation using strings as the start and stop values
Selecting a single row with .loc with a string
To select a single row of data, place the index label inside of the brackets following .loc.
df.loc['Penelope']
This returns the row of data as a Series:
age 4
color white
food Apple
height 80
score 3.3
state AL
Name: Penelope, dtype: object
Selecting multiple rows with .loc with a list of strings
df.loc[['Cornelia', 'Jane', 'Dean']]
This returns a DataFrame with the rows in the order specified in the list:
Selecting multiple rows with .loc with slice notation
Slice notation is defined by start, stop and step values. When slicing by label, pandas includes the stop value in the return. The following slices from Aaron to Dean, inclusive. Its step size is not explicitly defined, so it defaults to 1.
df.loc['Aaron':'Dean']
Complex slices can be taken in the same manner as Python lists.
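The three kinds of .loc input described above can be checked with a short, self-contained sketch that rebuilds the sample DataFrame. The result names picked, sliced and stepped are illustrative only, and the stepped slice is an added example of a "complex slice" rather than one from the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {'age': [30, 2, 12, 4, 32, 33, 69],
     'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
     'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
     'height': [165, 70, 120, 80, 180, 172, 150],
     'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
    index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

# A list of labels: rows come back in the order requested
picked = df.loc[['Cornelia', 'Jane', 'Dean']]

# A label slice: the stop label is included
sliced = df.loc['Aaron':'Dean']

# A label slice with a step: every second row between the two labels
stepped = df.loc['Aaron':'Christina':2]

print(list(picked.index))
print(list(sliced.index))
print(list(stepped.index))
```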
.iloc selects data only by integer location
Let's now turn to .iloc. Every row and column of data in a DataFrame has an integer location that defines it. This is in addition to the label that is visually displayed in the output. The integer location is simply the number of rows/columns from the top/left, beginning at 0.
There are three different inputs you can use for .iloc:
- An integer
- A list of integers
- Slice notation using integers as the start and stop values
Selecting a single row with .iloc with an integer
df.iloc[4]
This returns the 5th row (integer location 4) as a Series:
age 32
color gray
food Cheese
height 180
score 1.8
state AK
Name: Dean, dtype: object
Selecting multiple rows with .iloc with a list of integers
df.iloc[[2, -2]]
This returns a DataFrame of the third and second-to-last rows:
Selecting multiple rows with .iloc with slice notation
df.iloc[:5:3]
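As a rough check of which rows these .iloc selections pick, the sketch below rebuilds the sample DataFrame (trimmed to two columns for brevity, which is an assumption of this example) and prints the selected index labels. The result names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {'age': [30, 2, 12, 4, 32, 33, 69],
     'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans']},
    index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

# A single integer: one row, returned as a Series named after its label
single = df.iloc[4]

# A list of integers: negative values count from the end
third_and_second_last = df.iloc[[2, -2]]

# A slice with a step: positions 0 and 3 (the stop position 5 is excluded)
strided = df.iloc[:5:3]

print(single.name)
print(list(third_and_second_last.index))
print(list(strided.index))
```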
Simultaneous selection of rows and columns with .loc and .iloc
One excellent ability of both .loc and .iloc is that they can select rows and columns simultaneously. In the examples above, all the columns were returned from each selection. We can choose columns with the same types of inputs as we do for rows. We simply need to separate the row and column selection with a comma.
For example, we can select rows Jane and Dean with just the columns height, score and state like this:
df.loc[['Jane', 'Dean'], 'height':]
This uses a list of labels for the rows and slice notation for the columns.
We can naturally do similar operations with .iloc using only integers.
df.iloc[[1,4], 2]
Nick Lamb
Dean Cheese
Name: food, dtype: object
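Both simultaneous selections can be reproduced with the sample DataFrame as follows. This is only a sketch to make the row and column results visible; subset and foods are names introduced for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {'age': [30, 2, 12, 4, 32, 33, 69],
     'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
     'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
     'height': [165, 70, 120, 80, 180, 172, 150],
     'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
    index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

# Labels on both axes: a list of rows, and an open-ended column slice
subset = df.loc[['Jane', 'Dean'], 'height':]

# Integer locations on both axes: rows 1 and 4, column 2
foods = df.iloc[[1, 4], 2]

print(list(subset.columns))
print(foods.to_dict())
```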
Simultaneous selection with labels and integer location
.ix was used to make selections simultaneously with labels and integer location, which was useful but confusing and ambiguous at times, and thankfully it has been deprecated. In the event that you need to make a selection with a mix of labels and integer locations, you will have to convert both selections to either labels or integer locations.
For instance, if we want to select rows Nick and Cornelia along with columns 2 and 4, we could use .loc by converting the integers to labels with the following:
col_names = df.columns[[2, 4]]
df.loc[['Nick', 'Cornelia'], col_names]
Or alternatively, convert the index labels to integers with the get_loc index method.
labels = ['Nick', 'Cornelia']
index_ints = [df.index.get_loc(label) for label in labels]
df.iloc[index_ints, [2, 4]]
Boolean Selection
The .loc indexer can also do boolean selection. For instance, if we are interested in finding all the rows where age is above 30 and returning just the food and score columns, we can do the following:
df.loc[df['age'] > 30, ['food', 'score']]
You can replicate this with .iloc but you cannot pass it a boolean Series. You must convert the boolean Series into a numpy array like this:
df.iloc[(df['age'] > 30).values, [2, 4]]
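The two boolean selections should give the same result, which the sketch below checks on the sample DataFrame. It uses to_numpy() in place of .values (both return a NumPy array); the names mask, by_label and by_position are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'age': [30, 2, 12, 4, 32, 33, 69],
     'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
     'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
     'height': [165, 70, 120, 80, 180, 172, 150],
     'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
    index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

mask = df['age'] > 30  # boolean Series aligned on the index

# loc accepts the boolean Series directly, plus column labels
by_label = df.loc[mask, ['food', 'score']]

# iloc needs a plain boolean array, plus column positions
by_position = df.iloc[mask.to_numpy(), [2, 4]]

print(list(by_label.index))
print(by_label.equals(by_position))
```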
Selecting all rows
It is possible to use .loc/.iloc for just column selection. You can select all the rows by using a colon like this:
df.loc[:, 'color':'score':2]
The indexing operator, [], can select rows and columns too but not simultaneously.
Most people are familiar with the primary purpose of the DataFrame indexing operator, which is to select columns. A string selects a single column as a Series and a list of strings selects multiple columns as a DataFrame.
df['food']
Jane Steak
Nick Lamb
Aaron Mango
Penelope Apple
Dean Cheese
Christina Melon
Cornelia Beans
Name: food, dtype: object
Using a list selects multiple columns:
df[['food', 'score']]
What people are less familiar with is that, when slice notation is used, selection happens by row labels or by integer location. This is very confusing and something that I almost never use, but it does work.
df['Penelope':'Christina'] # slice rows by label
df[2:6:2] # slice rows by integer location
The explicitness of .loc/.iloc for selecting rows is highly preferred. The indexing operator alone is unable to select rows and columns simultaneously.
df[3:5, 'color']
TypeError: unhashable type: 'slice'