The difference between double-bracket `[[...]]` and single-bracket `[..]` indexing in Pandas

Question

Asked by Mike Fellner

I'm confused about the syntax regarding the following line of code:

x_values = dataframe[['Brains']]

The dataframe object consists of 2 columns (Brains and Bodies):

Brains Bodies
42     34
32     23

When I print x_values I get something like this:

Brains
0  42
1  32

I'm aware of the pandas documentation as far as attributes and methods of the dataframe object are concerned, but the double bracket syntax is confusing me.

Answer 1

Answered by MaxU

Consider this source DataFrame:

In [79]: df
Out[79]:
   Brains  Bodies
0      42      34
1      32      23

Selecting a single column returns a pandas Series:

In [80]: df['Brains']
Out[80]:
0    42
1    32
Name: Brains, dtype: int64

In [81]: type(df['Brains'])
Out[81]: pandas.core.series.Series

Selecting a subset of the DataFrame with a list of columns returns a DataFrame:

In [82]: df[['Brains']]
Out[82]:
   Brains
0      42
1      32

In [83]: type(df[['Brains']])
Out[83]: pandas.core.frame.DataFrame

Conclusion: the second approach lets us select multiple columns from the DataFrame. The first selects only a single column.

Demo:

In [84]: df = pd.DataFrame(np.random.rand(5,6), columns=list('abcdef'))

In [85]: df
Out[85]:
          a         b         c         d         e         f
0  0.065196  0.257422  0.273534  0.831993  0.487693  0.660252
1  0.641677  0.462979  0.207757  0.597599  0.117029  0.429324
2  0.345314  0.053551  0.634602  0.143417  0.946373  0.770590
3  0.860276  0.223166  0.001615  0.212880  0.907163  0.437295
4  0.670969  0.218909  0.382810  0.275696  0.012626  0.347549

In [86]: df[['e','a','c']]
Out[86]:
          e         a         c
0  0.487693  0.065196  0.273534
1  0.117029  0.641677  0.207757
2  0.946373  0.345314  0.634602
3  0.907163  0.860276  0.001615
4  0.012626  0.670969  0.382810

And if we specify only one column in the list, we get a DataFrame with one column:

In [87]: df[['e']]
Out[87]:
          e
0  0.487693
1  0.117029
2  0.946373
3  0.907163
4  0.012626
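
As a small supplement that is not part of the original answer, the sketch below rebuilds the Brains/Bodies frame from the question and compares the shapes of the two results. The variable names are illustrative assumptions, and the conversions shown (`Series.to_frame` and single-label selection) are just one way to move between the two forms.

```python
import pandas as pd

# Rebuild the two-column frame from the question
df = pd.DataFrame({"Brains": [42, 32], "Bodies": [34, 23]})

as_series = df["Brains"]     # single label -> Series
as_frame = df[["Brains"]]    # list of labels -> DataFrame

print(as_series.shape)  # (2,)
print(as_frame.shape)   # (2, 1)

# Converting between the two forms
frame_from_series = as_series.to_frame()  # Series -> one-column DataFrame
series_from_frame = as_frame["Brains"]    # one-column DataFrame -> Series

print(frame_from_series.equals(as_frame))   # True
print(series_from_frame.equals(as_series))  # True
```

The shape difference is usually what matters in practice: code that expects a two-dimensional table needs the double-bracket form even when only one column is selected.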

Answer 2

Answered by SethMMorton

There is no special syntax in Python for [[ and ]]. Rather, a list is being created, and then that list is passed as an argument to the DataFrame indexing function.

As per @MaxU's answer, if you pass a single string to a DataFrame, a series that represents that one column is returned. If you pass a list of strings, a DataFrame that contains the given columns is returned.

So, when you do the following:

# Print "Brains" column as Series
print(df['Brains'])
# Return a DataFrame with only one column called "Brains"
print(df[['Brains']])

it is equivalent to the following:

# Print "Brains" column as Series
column_to_get = 'Brains'
print(df[column_to_get])
# Return a DataFrame with only one column called "Brains"
subset_of_columns_to_get = ['Brains']
print(df[subset_of_columns_to_get])

In both cases, the DataFrame is being indexed with the [] operator.

Python uses [] both for indexing and for constructing list literals, and ultimately I believe this is the source of your confusion. The outer [ and ] in df[['Brains']] perform the indexing, and the inner pair creates a list.

>>> some_list = ['Brains']
>>> some_list_of_lists = [['Brains']]
>>> ['Brains'] == [['Brains']][0]
True
>>> 'Brains' == [['Brains']][0][0] == [['Brains'][0]][0]
True

What I am illustrating above is that at no point does Python ever see [[ and interpret it specially. In the last convoluted example ([['Brains'][0]][0]) there is no special ][ operator or ]][ operator. What happens is:

1. A single-element list is created (['Brains'])
2. The first element of that list is indexed (['Brains'][0] => 'Brains')
3. That is placed into another list ([['Brains'][0]] => ['Brains'])
4. And then the first element of that list is indexed ([['Brains'][0]][0] => 'Brains')
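
To make the same point from the receiving side, here is a minimal sketch using a hypothetical `ShowKey` class (not a pandas object, and not from the original answer) whose `__getitem__` simply reports the key it was handed. It suggests that an object being indexed only ever sees one argument, either a string or a list.

```python
class ShowKey:
    """Hypothetical stand-in for a DataFrame; it only reports the key it receives."""

    def __getitem__(self, key):
        # Return the type name of the key together with the key itself
        return type(key).__name__, key


obj = ShowKey()

print(obj['Brains'])    # ('str', 'Brains')
print(obj[['Brains']])  # ('list', ['Brains'])

# Building the list first and then indexing gives the same result
cols = ['Brains']
print(obj[cols] == obj[['Brains']])  # True
```

A DataFrame's indexing works on the same principle: it inspects the single key it receives and decides what to return based on whether that key is a label or a list of labels.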

Answer 3

Answered by jpp

Other solutions demonstrate the difference between a series and a dataframe. For the mathematically minded, you may wish to consider the dimensions of your input and output. Here's a summary:

Object                                Series          DataFrame
Dimensions (obj.ndim)                      1                  2
Syntax arg dim                             0                  1
Syntax                             df['col']        df[['col']]
Max indexing dim                           1                  2
Label indexing              df['col'].loc[x]   df.loc[x, 'col']
Label indexing (scalar)      df['col'].at[x]    df.at[x, 'col']
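Integer indexing           df['col'].iloc[x]      df.iloc[x, j]
Integer indexing (scalar)   df['col'].iat[x]       df.iat[x, j]

Here x is a row label for .loc/.at and an integer row position for .iloc/.iat, and j is the integer position of the column 'col'.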

When you specify a scalar or list argument to pd.DataFrame.__getitem__, for which [] is syntactic sugar, the dimension of your argument is one less than the dimension of your result. So a scalar (0-dimensional) gives a 1-dimensional series. A list (1-dimensional) gives a 2-dimensional dataframe. This makes sense since the additional dimension is the dataframe index, i.e. rows. This is the case even if your dataframe happens to have no rows.
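
The dimension rule can be checked directly. The following is a minimal sketch rather than part of the original answer; the frame, its row labels, and the name `empty` are assumptions chosen for illustration. It also exercises the scalar accessors, looking up the column position with `df.columns.get_loc`.

```python
import pandas as pd

df = pd.DataFrame({"col": [10, 20, 30]}, index=["x", "y", "z"])

# Scalar key (0-dimensional) -> Series (1-dimensional)
print(df["col"].ndim)    # 1
# List key (1-dimensional) -> DataFrame (2-dimensional)
print(df[["col"]].ndim)  # 2

# The rule also holds for a frame with no rows
empty = pd.DataFrame({"col": []})
print(empty["col"].ndim, empty[["col"]].ndim)  # 1 2

# Scalar access through the Series versus directly on the DataFrame
j = df.columns.get_loc("col")                 # integer position of the column
print(df["col"].at["y"], df.at["y", "col"])   # 20 20
print(df["col"].iat[1], df.iat[1, j])         # 20 20
```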
