Original question: http://stackoverflow.com/questions/44890713/
Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Selection with .loc in python
Asked by bugsyb
I saw this code in someone's iPython notebook, and I'm very confused as to how this code works. As far as I understood, pd.loc[] is used as a location based indexer where the format is:
df.loc[index,column_name]
However, in this case, the first index seems to be a series of boolean values. Could someone please explain to me how this selection works? I tried to read through the documentation but I couldn't figure out an explanation. Thanks!
iris_data.loc[iris_data['class'] == 'versicolor', 'class'] = 'Iris-versicolor'
Answered by piRSquared
pd.DataFrame.loc can take one or two indexers. For the rest of the post, I'll represent the first indexer as i and the second indexer as j.
If only one indexer is provided, it applies to the index of the dataframe and the missing indexer is assumed to represent all columns. So the following two examples are equivalent.
df.loc[i]
df.loc[i, :]
where : is used to represent all columns.
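The equivalence of the one-indexer and two-indexer forms can be checked directly. This is a minimal sketch, assuming a small illustrative dataframe and a list of row labels for i; neither comes from the question's data.

```python
import pandas as pd

# Small illustrative dataframe (an assumption, chosen only for this check)
df = pd.DataFrame([[1, 2], [3, 4]], index=['A', 'B'], columns=['X', 'Y'])

i = ['A']  # a list of row labels used as the first indexer

one_indexer = df.loc[i]      # second indexer omitted
two_indexers = df.loc[i, :]  # second indexer spelled out as "all columns"

# Both forms select the same rows and every column
print(one_indexer.equals(two_indexers))
```

Both calls return a dataframe containing row A with the columns X and Y.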
If both indexers are present, i references index values and j references column values.
Now we can focus on what types of values i and j can take. Let's use the following dataframe df as our example:
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], index=['A', 'B'], columns=['X', 'Y'])
loc has been written such that i and j can be
scalars that should be values in the respective index objects
df.loc['A', 'Y']
2
arrays whose elements are also members of the respective index object (notice that the order of the array I pass to loc is respected)
df.loc[['B', 'A'], 'X']
B    3
A    1
Name: X, dtype: int64
Notice the dimensionality of the return object when passing arrays. When i is an array, as it was above, loc returns an object whose index contains those values. In this case, because j was a scalar, loc returned a pd.Series object. We could have made it return a dataframe by passing an array for both i and j, even if that array holds only a single value.
df.loc[['B', 'A'], ['X']]
   X
B  3
A  1
boolean arrays whose elements are True or False and whose length matches the length of the respective index. In this case, loc simply grabs the rows (or columns) in which the boolean array is True.
df.loc[[True, False], ['X']]
   X
A  1
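The three kinds of indexer described above can be tried side by side. The sketch below reuses the answer's example dataframe. The variable names are illustrative only, and the boolean column selection on the last line is an extra case assumed here to show the "(or columns)" remark.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=['A', 'B'], columns=['X', 'Y'])

# Scalars for both indexers: a single cell value
scalar_both = df.loc['A', 'Y']

# Array for i, scalar for j: a Series, with rows in the order requested
series_out = df.loc[['B', 'A'], 'X']

# Arrays for both i and j: a DataFrame, even with a one-element column list
frame_out = df.loc[['B', 'A'], ['X']]

# Boolean array for i: keeps only the rows marked True
bool_rows = df.loc[[True, False], ['X']]

# Boolean array for j: keeps only the columns marked True
bool_cols = df.loc[:, [False, True]]

print(type(series_out).__name__, type(frame_out).__name__)
```

The return type follows the indexers: a scalar in either position drops that dimension, while a list keeps it.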
Beyond selecting with these indexers, loc also lets you make assignments. Now we can break down the line of code you provided.
iris_data.loc[iris_data['class'] == 'versicolor', 'class'] = 'Iris-versicolor'
iris_data['class'] == 'versicolor' returns a boolean array.
'class' is a scalar that represents a value in the columns object.
iris_data.loc[iris_data['class'] == 'versicolor', 'class'] returns a pd.Series object consisting of the 'class' column for all rows where 'class' is 'versicolor'.
When used with an assignment operator:
iris_data.loc[iris_data['class'] == 'versicolor', 'class'] = 'Iris-versicolor'
We assign 'Iris-versicolor' to all elements in column 'class' where 'class' was 'versicolor'.
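The breakdown above can be reproduced on a tiny stand-in for iris_data. The four rows and the sepal_length_cm column below are hypothetical, since the real dataset is not shown in the question. Only the 'class' column and the final assignment mirror the original line.

```python
import pandas as pd

# Hypothetical stand-in for iris_data; the values are made up for illustration
iris_data = pd.DataFrame({
    'sepal_length_cm': [5.1, 7.0, 6.3, 6.4],
    'class': ['Iris-setosa', 'versicolor', 'Iris-virginica', 'versicolor'],
})

# Step 1: the comparison yields one boolean per row
mask = iris_data['class'] == 'versicolor'

# Step 2: selecting with the mask and a column label gives a Series
selected = iris_data.loc[mask, 'class']

# Step 3: the same selection on the left of "=" overwrites just those cells
iris_data.loc[mask, 'class'] = 'Iris-versicolor'

print(iris_data['class'].tolist())
```

Rows whose class was not 'versicolor' are left unchanged, and the other column is untouched.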
Answered by LangeHaare
This is using dataframes from the pandas package. The "index" part can be either a single index, a list of indices, or a list of booleans. This can be read about in the documentation: https://pandas.pydata.org/pandas-docs/stable/indexing.html
So the index part specifies a subset of the rows to pull out, and the (optional) column_name specifies the column you want to work with from that subset of the dataframe. So if you want to update the 'class' column but only in rows where the class is currently set as 'versicolor', you might do something like what you list in the question:
iris_data.loc[iris_data['class'] == 'versicolor', 'class'] = 'Iris-versicolor'
Answered by Aashish Kumar
It's a pandas data-frame, and it's using the label-based selection tool df.loc. It takes two inputs, one for the rows and one for the columns. The row input selects all rows where the value stored in the column class is versicolor, the column input selects the column with the label class, and the assignment sets those cells to Iris-versicolor.
So basically it's replacing every cell of column class that has the value versicolor with Iris-versicolor.
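One way to check the "replacing" reading is to compare the loc assignment with pandas' Series.replace on the same data. This is only a sketch: the labels list is made up, and Series.replace is a different technique brought in here for comparison, not something the question's code uses.

```python
import pandas as pd

# Made-up labels, used for both approaches
labels = ['Iris-setosa', 'versicolor', 'Iris-virginica', 'versicolor']
via_loc = pd.DataFrame({'class': labels})
via_replace = pd.DataFrame({'class': labels})

# Approach from the question: boolean row selection plus assignment
via_loc.loc[via_loc['class'] == 'versicolor', 'class'] = 'Iris-versicolor'

# Comparison approach: value-for-value replacement within the column
via_replace['class'] = via_replace['class'].replace('versicolor', 'Iris-versicolor')

print(via_loc['class'].tolist() == via_replace['class'].tolist())
```

For a plain value swap the two approaches give the same column. The loc form is more general, since the row condition does not have to involve the column being assigned.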
Answered by Def_Os
It's pandas label-based selection, as explained here: https://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-label
The boolean array is basically a selection method using a mask.
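To make the mask idea concrete, the sketch below names the boolean array and reuses it. The dataframe and the conditions are illustrative assumptions. Inverting with ~ and combining with & are standard pandas operations on boolean Series, added here as an extension of what the answer says.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=['A', 'B'], columns=['X', 'Y'])

# A mask is one boolean per row, aligned to the dataframe's index
mask = df['X'] > 1
print(mask.tolist())

kept = df.loc[mask]      # rows where the mask is True
dropped = df.loc[~mask]  # rows where the mask is False

# Masks can be combined element-wise; the parentheses are required
both = df.loc[(df['X'] > 0) & (df['Y'] < 4)]
```

Here mask is False for row A and True for row B, so kept holds row B and dropped holds row A.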