Python: Index pandas DataFrame by column numbers, when column names are integers

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/27156278/

Time: 2020-08-19 01:29:21  Source: igfitidea

Tags: python, pandas

Asked by Akavall

I am trying to keep just certain columns of a DataFrame, and it works fine when column names are strings:

In [2]: import numpy as np

In [3]: import pandas as pd

In [4]: a = np.arange(35).reshape(5,7)

In [5]: df = pd.DataFrame(a, ['x', 'y', 'u', 'z', 'w'], ['a', 'b', 'c', 'd', 'e', 'f', 'g'])

In [6]: df
Out[6]: 
    a   b   c   d   e   f   g
x   0   1   2   3   4   5   6
y   7   8   9  10  11  12  13
u  14  15  16  17  18  19  20
z  21  22  23  24  25  26  27
w  28  29  30  31  32  33  34

[5 rows x 7 columns]

In [7]: df[[1,3]] #No problem
Out[7]: 
    b   d
x   1   3
y   8  10
u  15  17
z  22  24
w  29  31

However, when the column names are integers, I get a KeyError:

In [8]: df = pd.DataFrame(a, ['x', 'y', 'u', 'z', 'w'], range(10, 17))

In [9]: df
Out[9]: 
   10  11  12  13  14  15  16
x   0   1   2   3   4   5   6
y   7   8   9  10  11  12  13
u  14  15  16  17  18  19  20
z  21  22  23  24  25  26  27
w  28  29  30  31  32  33  34

[5 rows x 7 columns]

In [10]: df[[1,3]]

Results in:

KeyError: '[1 3] not in index'
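
A self-contained sketch of the failing case, assuming a reasonably recent pandas: with integer column names, the list passed to df[[...]] is matched against column labels, so 11 and 13 are found while 1 and 3 are not. The positional fallback shown earlier for string column names reflects the pandas version used in the question and may not apply to newer releases. The variable names by_label and raised are illustrative only.

```python
import numpy as np
import pandas as pd

a = np.arange(35).reshape(5, 7)
df = pd.DataFrame(a, index=['x', 'y', 'u', 'z', 'w'], columns=range(10, 17))

# Integers inside df[[...]] are looked up as column labels.
by_label = df[[11, 13]]

# 1 and 3 are not column labels here, so the lookup raises KeyError.
try:
    df[[1, 3]]
    raised = False
except KeyError:
    raised = True
```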

I can see why pandas does not allow that: to avoid a mix-up between indexing by column names and indexing by column numbers. However, is there a way to tell pandas that I want to index by column numbers? Of course, one solution is to convert the column names to strings, but I am wondering if there is a better solution.

Accepted answer by Jeff

This is exactly the purpose of iloc, which selects by integer position:

In [37]: df
Out[37]: 
   10  11  12  13  14  15  16
x   0   1   2   3   4   5   6
y   7   8   9  10  11  12  13
u  14  15  16  17  18  19  20
z  21  22  23  24  25  26  27
w  28  29  30  31  32  33  34

In [38]: df.iloc[:,[1,3]]
Out[38]: 
   11  13
x   1   3
y   8  10
u  15  17
z  22  24
w  29  31
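
As a rough side-by-side sketch of the same idea: iloc takes positions and loc takes labels, so the two calls below should pick out the same two columns of the question's DataFrame. The names by_position and by_label are made up for this illustration.

```python
import numpy as np
import pandas as pd

a = np.arange(35).reshape(5, 7)
df = pd.DataFrame(a, index=['x', 'y', 'u', 'z', 'w'], columns=range(10, 17))

# iloc is purely positional: all rows, columns at positions 1 and 3.
by_position = df.iloc[:, [1, 3]]

# loc is purely label-based: the same columns selected by name.
by_label = df.loc[:, [11, 13]]
```

Keeping positions in iloc and labels in loc avoids the ambiguity that plain df[[...]] runs into when the labels themselves are integers.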

Answer by JD Long

This is certainly one of those things that feels like a bug but is really a design decision (I think).

A few workaround options:

Rename the columns, using their positions as their names:

 df.columns = np.arange(len(df.columns))

Another way is to get names from df.columns:

print(df[df.columns[[1, 3]]])
   11  13
x   1   3
y   8  10
u  15  17
z  22  24
w  29  31

I suspect this is the most appealing as it just requires adding a wee bit of code and not changing any column names.
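
Both workarounds can be sketched together as follows, assuming the DataFrame from the question. The rename is applied to a copy here so the original column names survive; that copy and the variable names are additions for illustration, not part of the answer.

```python
import numpy as np
import pandas as pd

a = np.arange(35).reshape(5, 7)
df = pd.DataFrame(a, index=['x', 'y', 'u', 'z', 'w'], columns=range(10, 17))

# Second option: translate positions to labels, then index by label.
labels = df.columns[[1, 3]]
selected = df[labels]

# First option: rename on a copy so that positions become the names.
renamed = df.copy()
renamed.columns = np.arange(len(renamed.columns))
selected_renamed = renamed[[1, 3]]
```

The two results hold the same values; they differ only in whether the original column names (11 and 13) are preserved.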
