The difference between double-bracket `[[...]]` and single-bracket `[...]` indexing in pandas
Original question: http://stackoverflow.com/questions/45201104/
This content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by Mike Fellner
I'm confused about the syntax regarding the following line of code:
x_values = dataframe[['Brains']]
The dataframe object consists of 2 columns (Brains and Bodies)
Brains Bodies
42 34
32 23
When I print x_values I get something like this:
Brains
0 42
1 32
I'm aware of the pandas documentation as far as attributes and methods of the dataframe object are concerned, but the double bracket syntax is confusing me.
Answered by MaxU
Consider this:
Source DF:
In [79]: df
Out[79]:
Brains Bodies
0 42 34
1 32 23
Selecting one column results in a pandas Series:
In [80]: df['Brains']
Out[80]:
0 42
1 32
Name: Brains, dtype: int64
In [81]: type(df['Brains'])
Out[81]: pandas.core.series.Series
Selecting a subset of the DataFrame results in a DataFrame:
In [82]: df[['Brains']]
Out[82]:
Brains
0 42
1 32
In [83]: type(df[['Brains']])
Out[83]: pandas.core.frame.DataFrame
Conclusion: the second approach (a list inside the brackets) lets us select multiple columns from the DataFrame and always returns a DataFrame. The first one (a single label) selects just one column and returns it as a Series.
Demo:
In [84]: df = pd.DataFrame(np.random.rand(5,6), columns=list('abcdef'))
In [85]: df
Out[85]:
a b c d e f
0 0.065196 0.257422 0.273534 0.831993 0.487693 0.660252
1 0.641677 0.462979 0.207757 0.597599 0.117029 0.429324
2 0.345314 0.053551 0.634602 0.143417 0.946373 0.770590
3 0.860276 0.223166 0.001615 0.212880 0.907163 0.437295
4 0.670969 0.218909 0.382810 0.275696 0.012626 0.347549
In [86]: df[['e','a','c']]
Out[86]:
e a c
0 0.487693 0.065196 0.273534
1 0.117029 0.641677 0.207757
2 0.946373 0.345314 0.634602
3 0.907163 0.860276 0.001615
4 0.012626 0.670969 0.382810
And if we specify only one column in the list, we get a DataFrame with one column:
In [87]: df[['e']]
Out[87]:
e
0 0.487693
1 0.117029
2 0.946373
3 0.907163
4 0.012626
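As a self-contained check of the session above, here is a minimal sketch that rebuilds the two-column frame from the question (the values are taken from the question; the variable names `single` and `double` are just illustrative) and compares the two selections:

```python
import pandas as pd

# Rebuild the two-column frame from the question.
df = pd.DataFrame({"Brains": [42, 32], "Bodies": [34, 23]})

single = df["Brains"]      # a string key selects one column as a Series
double = df[["Brains"]]    # a list key selects a sub-DataFrame

print(type(single).__name__, single.shape)  # Series (2,)
print(type(double).__name__, double.shape)  # DataFrame (2, 1)

# A list key also controls the column order of the result.
print(list(df[["Bodies", "Brains"]].columns))  # ['Bodies', 'Brains']
```

The shapes make the difference visible: the Series has only a row axis, while the one-column DataFrame still has both a row axis and a column axis.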
Answered by SethMMorton
There is no special syntax in Python for `[[` and `]]`. Rather, a list is being created, and then that list is being passed as an argument to the DataFrame indexing function.
As per @MaxU's answer, if you index a DataFrame with a single string, a Series representing that one column is returned. If you index it with a list of strings, a DataFrame containing the given columns is returned.
So, when you do the following
# Print "Brains" column as Series
print(df['Brains'])
# Return a DataFrame with only one column called "Brains"
print(df[['Brains']])
It is equivalent to the following
# Print "Brains" column as Series
column_to_get = 'Brains'
print(df[column_to_get])
# Return a DataFrame with only one column called "Brains"
subset_of_columns_to_get = ['Brains']
print(df[subset_of_columns_to_get])
In both cases, the DataFrame is being indexed with the `[]` operator.
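To make the dispatch concrete, here is a minimal sketch using a toy class. `TinyFrame` is hypothetical and is not how pandas is implemented; it only illustrates that `obj[key]` calls `obj.__getitem__(key)` and that the type of `key` can decide what comes back:

```python
class TinyFrame:
    """Toy stand-in for a table; NOT how pandas is implemented."""

    def __init__(self, columns):
        self.columns = dict(columns)  # column name -> list of values

    def __getitem__(self, key):
        # obj[key] always calls obj.__getitem__(key); the key's type decides the result.
        if isinstance(key, list):
            # A list key returns a smaller table with just those columns.
            return TinyFrame({name: self.columns[name] for name in key})
        # Any other key returns the single column itself.
        return self.columns[key]


frame = TinyFrame({"Brains": [42, 32], "Bodies": [34, 23]})

print(frame["Brains"])                   # [42, 32]
print(type(frame[["Brains"]]).__name__)  # TinyFrame
print(frame[["Brains"]].columns)         # {'Brains': [42, 32]}
```

The same method handles both calls; only the argument differs (a string in one case, a list in the other).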
Python uses `[]` both for indexing and for constructing list literals, and ultimately I believe this is the source of your confusion. The outer `[` and `]` in `df[['Brains']]` perform the indexing, and the inner pair creates a list.
>>> some_list = ['Brains']
>>> some_list_of_lists = [['Brains']]
>>> ['Brains'] == [['Brains']][0]
True
>>> 'Brains' == [['Brains']][0][0] == [['Brains'][0]][0]
True
What I am illustrating above is that at no point does Python ever see `[[` and interpret it specially. In the last convoluted example (`[['Brains'][0]][0]`) there is no special `][` operator or `]][` operator. What happens is:
- A single-element list is created (`['Brains']`)
- The first element of that list is indexed (`['Brains'][0]` => `'Brains'`)
- That is placed into another list (`[['Brains'][0]]` => `['Brains']`)
- And then the first element of that list is indexed (`[['Brains'][0]][0]` => `'Brains'`)
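One way to confirm that the parser sees an ordinary subscript whose key is a list literal is to inspect the syntax tree with the standard `ast` module. This is only a sketch: the expression is parsed, never evaluated, so `df` does not need to exist, and the `getattr` line is there because Python 3.8 and earlier wrap the key in an extra `ast.Index` node.

```python
import ast

# Parse the expression without evaluating it; "df" need not exist.
node = ast.parse("df[['Brains']]", mode="eval").body

# Python 3.8 and earlier wrap the key in ast.Index; unwrap it if present.
key = getattr(node.slice, "value", node.slice)

print(type(node).__name__)    # Subscript
print(type(key).__name__)     # List
print(ast.literal_eval(key))  # ['Brains']
```

There is a single `Subscript` node (the outer brackets) and a single `List` node (the inner brackets), with no node type corresponding to `[[`.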
Answered by jpp
Other answers demonstrate the difference between a Series and a DataFrame. If you are mathematically minded, you may wish to consider the dimensions of your input and output. Here's a summary:
Object                      Series              DataFrame
Dimensions (obj.ndim)       1                   2
Syntax arg dim              0                   1
Syntax                      df['col']           df[['col']]
Max indexing dim            1                   2
Label indexing              df['col'].loc[x]    df.loc[x, 'col']
Label indexing (scalar)     df['col'].at[x]     df.at[x, 'col']
Integer indexing            df['col'].iloc[i]   df.iloc[i, j]
Integer indexing (scalar)   df['col'].iat[i]    df.iat[i, j]
Here x is a row label, i is an integer row position, and j is the integer position of column 'col' (iloc and iat accept only integer positions, not column labels).
When you specify a scalar or list argument to `pd.DataFrame.__getitem__`, for which `[]` is syntactic sugar, the dimension of your argument is one less than the dimension of your result. So a scalar (0-dimensional) gives a 1-dimensional Series, and a list (1-dimensional) gives a 2-dimensional DataFrame. This makes sense since the additional dimension is the DataFrame index, i.e. the rows. This is the case even if your DataFrame happens to have no rows.
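The dimension rule can be checked directly with `ndim`. The sketch below reuses the frame from the question; the names `empty` and `j` are illustrative, and the empty frame is built by slicing off all rows, which is just one convenient way to get a frame with columns but no rows:

```python
import pandas as pd

df = pd.DataFrame({"Brains": [42, 32], "Bodies": [34, 23]})

print(df["Brains"].ndim)    # 1
print(df[["Brains"]].ndim)  # 2

# The rule holds even when there are no rows.
empty = df.iloc[0:0]
print(empty["Brains"].ndim, empty[["Brains"]].ndim)  # 1 2

# Scalar lookups: labels with .at, integer positions with .iat.
j = df.columns.get_loc("Brains")  # integer position of the column
print(df.at[0, "Brains"], df.iat[0, j])  # 42 42
```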