Python Pandas: Why double brackets are needed to select columns after boolean indexing

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me), citing the original source: StackOverflow, http://stackoverflow.com/questions/33417991/

Time: 2020-08-19 13:20:12  Source: igfitidea

Pandas: Why are double brackets needed to select column after boolean indexing

Tags: python, pandas, indexing, boolean

Asked by FortuneFaded

For a df table like below,

   A B C D
0  0 1 1 1
1  2 3 5 7
3  3 1 2 8

why are the double brackets needed for selecting specific columns after boolean indexing?

Specifically, the [['A','C']] part of

df[df['A'] < 3][['A','C']]

Accepted answer by Joseph

For pandas objects (Series, DataFrame), the indexing operator [] only accepts

  1. a column name or a list of column names, to select column(s)
  2. a slice or Boolean array, to select row(s)

That is, it refers to only one dimension of the DataFrame at a time.
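The kinds of key that [] accepts can be sketched as follows. This is only an illustration: the DataFrame is rebuilt from the table in the question, and the variable names are made up for the example.

```python
import pandas as pd

# The example table from the question (index labels 0, 1, 3 as shown there).
df = pd.DataFrame(
    {"A": [0, 2, 3], "B": [1, 3, 1], "C": [1, 5, 2], "D": [1, 7, 8]},
    index=[0, 1, 3],
)

one_column = df["A"]           # a column name selects one column
two_columns = df[["A", "C"]]   # a list of column names selects several columns
first_rows = df[0:2]           # a slice selects rows
small_a = df[df["A"] < 3]      # a Boolean Series selects the rows where it is True

print(list(two_columns.columns))
print(list(small_a.index))
```

Each call addresses either the columns or the rows, never both at once, which is why the chained form in the question needs two separate [] steps.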

For df[[colname(s)]], the inner brackets create a list and the outer brackets are the indexing operator, i.e. you must use double brackets if you select two or more columns. With a single column name, one pair of brackets returns a Series, while double brackets return a DataFrame.
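A small check of the Series versus DataFrame difference, assuming the same table as in the question (the variable names here are only for illustration):

```python
import pandas as pd

# The example table from the question.
df = pd.DataFrame(
    {"A": [0, 2, 3], "B": [1, 3, 1], "C": [1, 5, 2], "D": [1, 7, 8]},
    index=[0, 1, 3],
)

as_series = df["A"]    # single brackets with one column name
as_frame = df[["A"]]   # double brackets: a one-element list of column names

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (3, 1)
```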

Also, df.loc[df['A'] < 3,['A','C']] is better than the chained selection, because a single indexing call avoids the question of whether a copy or a view of the DataFrame is returned. (This answer originally also suggested df.ix[df['A'] < 3,['A','C']]; .ix was deprecated in pandas 0.20 and removed in pandas 1.0, so use .loc.)
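As a rough sketch of the difference, assuming the table from the question, the chained selection and a single .loc call return the same rows and columns, but only the single call is a reliable target for assignment:

```python
import pandas as pd

# The example table from the question.
df = pd.DataFrame(
    {"A": [0, 2, 3], "B": [1, 3, 1], "C": [1, 5, 2], "D": [1, 7, 8]},
    index=[0, 1, 3],
)

# Two [] calls: rows first, then columns on the intermediate result.
chained = df[df["A"] < 3][["A", "C"]]

# One .loc call: rows and columns selected together.
single = df.loc[df["A"] < 3, ["A", "C"]]

print(chained.equals(single))  # True

# Assigning through one .loc call modifies df itself.
df.loc[df["A"] < 3, "C"] = 0
print(df["C"].tolist())  # [0, 0, 2]
```

Assigning through the chained form instead would write to an intermediate object, which is the situation pandas warns about.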

Please refer to the pandas documentation for details.

Answer by EdChum

Because with single brackets you would be asking for one column named ('A','C'), and you have no such column, so it raises a KeyError. You have to pass an iterable, such as a list, to sub-select from the df.

So

df[df['A'] < 3]['A','C']

raises

KeyError: ('A', 'C')

Which is different to

In [261]:
df[df['A'] < 3][['A','C']]

Out[261]:
   A  C
0  0  1
1  2  5

This is no different to trying:

df['A','C']

which is why you need double square brackets:

df[['A','C']]
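To see why the single-bracket form fails, note that Python turns a comma-separated subscript into a tuple before pandas ever sees it. A minimal sketch, assuming the table from the question (the names key_error and selected are just for illustration):

```python
import pandas as pd

# The example table from the question.
df = pd.DataFrame(
    {"A": [0, 2, 3], "B": [1, 3, 1], "C": [1, 5, 2], "D": [1, 7, 8]},
    index=[0, 1, 3],
)

key_error = None
try:
    # Equivalent to df[("A", "C")]: a lookup of ONE column labelled by a tuple.
    df["A", "C"]
except KeyError as exc:
    key_error = exc

print("raised KeyError:", key_error is not None)

# A list is treated as a collection of column labels instead.
selected = df[["A", "C"]]
print(list(selected.columns))
```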

Note that the preferred way is to use .loc (this answer originally used .ix, which was deprecated in pandas 0.20 and removed in pandas 1.0):

In [264]:
df.loc[df['A'] < 3,['A','C']]

Out[264]:
   A  C
0  0  1
1  2  5

This way you make one indexing call on df itself, rather than indexing an intermediate result that is potentially a copy.

Answer by Pavel Savara

Because the inner brackets are just Python's literal syntax for a list.

The outer brackets are the indexing operation of the pandas DataFrame object.

In this use case the inner ['A', 'B'] defines the list of columns to pass as a single argument to the indexing operation, which is denoted by the outer brackets.
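One way to make the "single argument" point concrete is to build the list first and pass it in by name. This is only a sketch on the table from the question; the variable names are hypothetical.

```python
import pandas as pd

# The example table from the question.
df = pd.DataFrame(
    {"A": [0, 2, 3], "B": [1, 3, 1], "C": [1, 5, 2], "D": [1, 7, 8]},
    index=[0, 1, 3],
)

columns = ["A", "C"]                   # an ordinary Python list
via_variable = df[columns]             # one pair of brackets, one list argument
via_literal = df[["A", "C"]]           # same call with the list written inline
via_method = df.__getitem__(columns)   # what the [] operator calls underneath

print(via_variable.equals(via_literal))  # True
print(via_variable.equals(via_method))   # True
```

Written this way there is only one pair of brackets at the call site, which shows that the "double brackets" are not special syntax.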

Answer by Alex Ustymenko

Adding to the previous responses, you could also use the df.iloc accessor if you need to select by index position. It also makes the code more reproducible, which is nice.
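A possible positional version of the same selection, assuming the table from the question. Column positions 0 and 2 correspond to 'A' and 'C' here, and the Boolean mask is converted to a NumPy array first because .iloc works with positions rather than labels.

```python
import pandas as pd

# The example table from the question.
df = pd.DataFrame(
    {"A": [0, 2, 3], "B": [1, 3, 1], "C": [1, 5, 2], "D": [1, 7, 8]},
    index=[0, 1, 3],
)

# Rows by explicit position, columns by position.
first_two = df.iloc[0:2, [0, 2]]

# Rows by Boolean mask (as a plain array), columns by position.
mask = (df["A"] < 3).to_numpy()
by_mask = df.iloc[mask, [0, 2]]

print(list(by_mask.columns))      # ['A', 'C']
print(first_two.equals(by_mask))  # True
```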