What is the difference between using loc and using just square brackets to filter for columns in Pandas/Python?

Question

Asked by Sean McCarthy

I've noticed three methods of selecting a column in a Pandas DataFrame:

First method of selecting a column using loc:

df_new = df.loc[:, 'col1']

Second method - seems simpler and faster:

df_new = df['col1']

Third method - most convenient:

df_new = df.col1

Is there a difference between these three methods? I don't think so, in which case I'd rather use the third method.

I'm mostly curious as to why there appear to be three methods for doing the same thing.

Answer 1

Answered by ayhan

In the following situations, they behave the same:

Selecting a single column (df['A'] is the same as df.loc[:, 'A'] -> selects column A)
Selecting a list of columns (df[['A', 'B', 'C']] is the same as df.loc[:, ['A', 'B', 'C']] -> selects columns A, B and C)
Slicing by rows (df[1:3] is the same as df.iloc[1:3] -> selects rows 1 and 2. Note, however, that if you slice rows with loc instead of iloc, you'll get rows 1, 2 and 3, assuming you have a RangeIndex.)
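
The three equivalences above can be checked on a small frame. This is a minimal sketch; the frame and its values are made up for illustration and assume a default RangeIndex.

```python
import pandas as pd

# Hypothetical example frame with a default RangeIndex (0, 1, 2, 3).
df = pd.DataFrame({"A": [10, 11, 12, 13], "B": [20, 21, 22, 23], "C": [30, 31, 32, 33]})

# Single column: both forms return the same Series.
same_column = df["A"].equals(df.loc[:, "A"])

# List of columns: both forms return the same DataFrame.
same_columns = df[["A", "B"]].equals(df.loc[:, ["A", "B"]])

# Row slices: [] and iloc are positional and exclude the end point,
# while loc is label-based and includes the end label.
rows_brackets = df[1:3].index.tolist()
rows_iloc = df.iloc[1:3].index.tolist()
rows_loc = df.loc[1:3].index.tolist()
```

Here rows_brackets and rows_iloc both hold rows 1 and 2, while rows_loc also holds row 3.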

However, [] does not work in the following situations:

You can select a single row with df.loc[row_label]
You can select a list of rows with df.loc[[row_label1, row_label2]]
You can slice columns with df.loc[:, 'A':'C']

These three cannot be done with []. More importantly, if your selection involves both rows and columns, then assignment becomes problematic.
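
As a rough illustration of the three loc-only selections listed above, here is a sketch on a frame with string row labels. The labels "x", "y", "z" and the column values are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical frame with string row labels.
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9], "D": [10, 11, 12]},
    index=["x", "y", "z"],
)

single_row = df.loc["y"]        # Series holding row "y"
row_list = df.loc[["x", "z"]]   # DataFrame with rows "x" and "z"
col_slice = df.loc[:, "A":"C"]  # columns A through C, end label included
```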

df[1:3]['A'] = 5

This selects rows 1 and 2, then selects column 'A' of the returned object and assigns the value 5 to it. The problem is that the returned object might be a copy, so this may not change the actual DataFrame. This raises SettingWithCopyWarning. The correct way to do this assignment is:

df.loc[1:2, 'A'] = 5  # loc slices include the end label, so 1:2 covers rows 1 and 2 on a RangeIndex

With .loc, you are guaranteed to modify the original DataFrame. It also allows you to slice columns (df.loc[:, 'C':'F']), select a single row (df.loc[5]), and select a list of rows (df.loc[[1, 2, 5]]).
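
A small sketch of the .loc assignment, on a made-up frame with a default RangeIndex. The chained form is left out on purpose because, as noted above, whether it changes the original is not reliable.

```python
import pandas as pd

# Hypothetical frame with a default RangeIndex (0 to 4).
df = pd.DataFrame({"A": [0, 0, 0, 0, 0], "B": [1, 2, 3, 4, 5]})

# One indexing call selects rows and column together, so the assignment
# is applied to df itself. The label slice 1:2 covers rows 1 and 2,
# the same rows that the positional slice df[1:3] selects.
df.loc[1:2, "A"] = 5

result = df["A"].tolist()
```

After the assignment, only rows 1 and 2 of column A hold 5 and column B is untouched.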

Also note that these two were not included in the API at the same time. .loc was added much later as a more powerful and explicit indexer. See unutbu's answer for more detail.

Note: Getting columns with [] vs . is a completely different topic. . is only there for convenience. It only allows accessing columns whose names are valid Python identifiers (i.e. they cannot contain spaces, they cannot start with a digit...). It cannot be used when the names conflict with Series/DataFrame methods. It also cannot be used for non-existing columns (i.e. the assignment df.a = 1 won't create a column if there is no column a). Other than that, . and [] are the same.
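
The limits of attribute access described in the note can be sketched as follows. The column names "price", "unit cost", "count" and the attribute name "total" are hypothetical, chosen only to trigger each case.

```python
import warnings
import pandas as pd

df = pd.DataFrame({"price": [1.5, 2.5], "unit cost": [1.0, 2.0], "count": [3, 4]})

# A name that is a valid identifier works with both forms.
same = df.price.equals(df["price"])

# "unit cost" contains a space, so only brackets can reach it.
unit_cost = df["unit cost"].tolist()

# "count" collides with the DataFrame.count method: the attribute is the
# method, while the bracket lookup returns the column.
attr_is_method = callable(df.count)
count_values = df["count"].tolist()

# Assigning to a new attribute does not create a column. pandas can warn
# about this kind of assignment, so warnings are silenced for the demo.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df.total = 1
has_total_column = "total" in df.columns
```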

Answer 2

Answered by Freeman

loc is especially useful when the index is not numeric (e.g. a DatetimeIndex) because you can get rows with particular labels from the index:

df.loc['2010-05-04 07:00:00']
df.loc['2010-01-01 00:00:00':'2010-12-31 23:59:59', 'Price']
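
For a runnable version of this kind of label lookup, here is a minimal sketch on a made-up daily price series. The dates and values are assumptions; only the column name 'Price' comes from the answer.

```python
import pandas as pd

# Hypothetical daily price data indexed by a DatetimeIndex.
index = pd.date_range("2010-01-01", periods=10, freq="D")
df = pd.DataFrame({"Price": [100.0 + i for i in range(10)]}, index=index)

row = df.loc["2010-01-03"]                           # one row, as a Series
prices = df.loc["2010-01-03":"2010-01-05", "Price"]  # both end labels included
```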

However, [] is intended to get columns with particular names:

df['Price']

With [] you can also filter rows, but it is more verbose:

df[df['Date'] < datetime.datetime(2010,1,1,7,0,0)]['Price']
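
The same filter can be written with .loc in a single indexing step. The sketch below compares both forms on made-up data; the 'Date' and 'Price' column names follow the answer, and the rows are hypothetical.

```python
import datetime
import pandas as pd

# Hypothetical data: two rows before the cutoff, one after.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2009-12-31 23:00", "2010-01-01 06:00", "2010-01-01 08:00"]),
        "Price": [10.0, 11.0, 12.0],
    }
)

cutoff = datetime.datetime(2010, 1, 1, 7, 0, 0)
mask = df["Date"] < cutoff

chained = df[mask]["Price"]     # two indexing steps, as in the answer
single = df.loc[mask, "Price"]  # the same rows and column in one step
```

For reading values the two give the same result. The single-step form matters mainly when assigning, for the reason given in Answer 1.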

Answer 3

Answered by Matthew Son

There seems to be a difference between df.loc[] and df[] when you create multiple columns in a DataFrame.

You can refer to this question: Is there a nice way to generate multiple columns using .loc?

Here, you can't generate multiple columns using df.loc[:,['name1','name2']], but you can by just using double brackets: df[['name1','name2']]. (I wonder why they behave differently.)
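
A minimal sketch of the double-bracket form, assuming a reasonably recent pandas release. The existing columns 'a' and 'b' and the assigned values are made up; only the names 'name1' and 'name2' come from the text. The sketch exercises only the double-bracket assignment, not the .loc variant.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Hypothetical values for the two new columns: one row per DataFrame row,
# one column per new name.
values = np.array([[10, 40], [20, 50], [30, 60]])

# 'name1' and 'name2' do not exist yet; the double-bracket assignment adds them.
df[["name1", "name2"]] = values
```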
