Python: Pandas Series - Why use loc?

Question

Asked by Runner Bean

Why do we use 'loc' for pandas DataFrames? It seems that both versions of the following code, with and without loc, run at a similar speed:


%timeit df_user1 = df.loc[df.user_id=='5561']

100 loops, best of 3: 11.9 ms per loop

or


%timeit df_user1_noloc = df[df.user_id=='5561']

100 loops, best of 3: 12 ms per loop

So why use loc?


Edit: This has been flagged as a duplicate question. Although "pandas iloc vs ix vs loc explanation?" does mention that

*you can do column retrieval just by using the data frame's getitem:*

df['time']    # equivalent to df.loc[:, 'time']

it does not say why we use loc. It explains many features of loc, but my specific question is "why not just omit loc altogether?", for which I have accepted a very detailed answer below.

Also, in that other post the answer (which I do not think is really an answer) is buried deep in the discussion. Anyone searching for what I was looking for would find it hard to locate, and would be much better served by the answer to my question.


Answer 1

Answered by unutbu

Explicit is better than implicit.
df[boolean_mask] selects rows where boolean_mask is True, but there is a corner case when you might not want it to: when df has boolean-valued column labels:
```
In [229]: df = pd.DataFrame({True:[1,2,3],False:[3,4,5]}); df
Out[229]: 
   False  True 
0      3      1
1      4      2
2      5      3
```
You might want to use df[[True]] to select the True column. Instead it raises a ValueError:
```
In [230]: df[[True]]
ValueError: Item wrong length 1 instead of 3.
```
Versus using loc:
```
In [231]: df.loc[[True]]
Out[231]: 
   False  True 
0      3      1
```
In contrast, the following does not raise ValueError even though the structure of df2 is almost the same as that of df above:
```
In [258]: df2 = pd.DataFrame({'A':[1,2,3],'B':[3,4,5]}); df2
Out[258]: 
   A  B
0  1  3
1  2  4
2  3  5

In [259]: df2[['B']]
Out[259]: 
   B
0  3
1  4
2  5
```
Thus, df[boolean_mask] does not always behave the same as df.loc[boolean_mask]. Even though this is arguably an unlikely use case, I would recommend always using df.loc[boolean_mask] instead of df[boolean_mask] because the meaning of df.loc's syntax is explicit. With df.loc[indexer] you know automatically that df.loc is selecting rows. In contrast, it is not clear if df[indexer] will select rows or columns (or raise ValueError) without knowing details about indexer and df.
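The way df[indexer] depends on the type of indexer can be checked with a small sketch. It reuses the df and df2 frames from the examples above; the names cols, error, mask, rows_plain and rows_loc are only illustrative, and the behaviour shown is what I would expect from a reasonably recent pandas, so treat it as a sketch rather than a guarantee for every version.

```python
import pandas as pd

df = pd.DataFrame({True: [1, 2, 3], False: [3, 4, 5]})
df2 = pd.DataFrame({'A': [1, 2, 3], 'B': [3, 4, 5]})

# A list of strings is read as a list of column labels.
cols = df2[['B']]

# A list of booleans is read as a row mask, even when True and False
# are also column labels, so a one-element list fails on its length.
try:
    df[[True]]
    error = None
except ValueError as exc:
    error = exc

# A mask with one entry per row filters rows with either spelling.
mask = [True, False, True]
rows_plain = df[mask]
rows_loc = df.loc[mask]

print(type(error).__name__)
print(rows_plain.equals(rows_loc))
```

With df.loc the first position is always the row indexer, so the reader does not have to work out which branch applies.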
df.loc[row_indexer, column_index] can select rows and columns. df[indexer] can only select rows or columns depending on the type of values in indexer and the type of column values df has (again, are they boolean?).
```
In [237]: df2.loc[[True,False,True], 'B']
Out[237]: 
0    3
2    5
Name: B, dtype: int64
```
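As a rough comparison, here is the same row-and-column selection written once with df.loc and once as two chained df[...] steps. The mask and the variable names one_step, two_step and sub are made up for illustration.

```python
import pandas as pd

df2 = pd.DataFrame({'A': [1, 2, 3], 'B': [3, 4, 5]})
mask = df2['A'] != 2                 # boolean Series: True, False, True

# One call: rows chosen by the mask, column chosen by label.
one_step = df2.loc[mask, 'B']

# Plain [] needs two separate steps: filter rows, then pick the column.
two_step = df2[mask]['B']

# .loc also accepts a list of column labels and returns a DataFrame.
sub = df2.loc[mask, ['A', 'B']]

print(one_step.tolist())
print(one_step.equals(two_step))
```

Both spellings return the same values here; the loc form simply states the row part and the column part in a single indexer.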
When a slice is passed to df.loc, the end-points are included in the range. When a slice is passed to df[...], the slice is interpreted as a half-open interval:
```
In [239]: df2.loc[1:2]
Out[239]: 
   A  B
1  2  4
2  3  5

In [271]: df2[1:2]
Out[271]: 
   A  B
1  2  4
```
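The slice difference is easier to see when the integer row labels do not coincide with the row positions. The frame df3 below is a hypothetical variation on df2 with labels 10, 20, 30, added only to illustrate the point.

```python
import pandas as pd

df2 = pd.DataFrame({'A': [1, 2, 3], 'B': [3, 4, 5]})

by_label = df2.loc[1:2]       # labels 1 through 2, both end-points included
by_position = df2[1:2]        # position 1 only, the stop is excluded
same_as_iloc = df2.iloc[1:2]  # iloc uses the same half-open positions

# Hypothetical frame whose labels differ from its positions.
df3 = pd.DataFrame({'A': [1, 2, 3]}, index=[10, 20, 30])
label_slice = df3.loc[10:20]  # rows labelled 10 and 20
pos_slice = df3[1:2]          # the row at position 1, labelled 20

print(len(by_label), len(by_position))
print(label_slice.index.tolist(), pos_slice.index.tolist())
```

Here df.loc slices by label and keeps both ends, while an integer slice passed to df[...] is taken as positions and drops the stop.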

