Python: Indexing a Pandas DataFrame by column number when the column names are integers

Question

Asked by Akavall

I am trying to keep just certain columns of a DataFrame, and it works fine when column names are strings:

In [2]: import numpy as np

In [3]: import pandas as pd

In [4]: a = np.arange(35).reshape(5,7)

In [5]: df = pd.DataFrame(a, ['x', 'y', 'u', 'z', 'w'], ['a', 'b', 'c', 'd', 'e', 'f', 'g'])

In [6]: df
Out[6]: 
    a   b   c   d   e   f   g
x   0   1   2   3   4   5   6
y   7   8   9  10  11  12  13
u  14  15  16  17  18  19  20
z  21  22  23  24  25  26  27
w  28  29  30  31  32  33  34

[5 rows x 7 columns]

In [7]: df[[1,3]] #No problem
Out[7]: 
    b   d
x   1   3
y   8  10
u  15  17
z  22  24
w  29  31

However, when column names are integers, I am getting a key error:

In [8]: df = pd.DataFrame(a, ['x', 'y', 'u', 'z', 'w'], range(10, 17))

In [9]: df
Out[9]: 
   10  11  12  13  14  15  16
x   0   1   2   3   4   5   6
y   7   8   9  10  11  12  13
u  14  15  16  17  18  19  20
z  21  22  23  24  25  26  27
w  28  29  30  31  32  33  34

[5 rows x 7 columns]

In [10]: df[[1,3]]

Results in:

KeyError: '[1 3] not in index'

I can see why pandas does not allow that: it avoids mixing up indexing by column name and indexing by column number. However, is there a way to tell pandas that I want to index by column numbers? Of course, one solution is to convert the column names to strings, but I am wondering if there is a better solution.
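
To make the name-versus-position ambiguity concrete, here is a minimal sketch that rebuilds the integer-named frame from the question. It assumes a reasonably recent pandas release, where a list passed to df[...] is looked up as column names; the variable names by_label and lookup_failed are only illustrative.

```python
import numpy as np
import pandas as pd

a = np.arange(35).reshape(5, 7)
df = pd.DataFrame(a, index=['x', 'y', 'u', 'z', 'w'], columns=range(10, 17))

# 11 and 13 are column *names* here, so this lookup succeeds.
by_label = df[[11, 13]]

# 1 and 3 are positions, not names, so the same syntax raises KeyError.
try:
    df[[1, 3]]
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(by_label)
print(lookup_failed)
```

The same bracket syntax therefore means "columns named 11 and 13" in this frame, and there is no column named 1 or 3 for it to find.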

Answer 1

Accepted answer by Jeff

This is exactly the purpose of iloc, which selects rows and columns by integer position:

In [37]: df
Out[37]: 
   10  11  12  13  14  15  16
x   0   1   2   3   4   5   6
y   7   8   9  10  11  12  13
u  14  15  16  17  18  19  20
z  21  22  23  24  25  26  27
w  28  29  30  31  32  33  34

In [38]: df.iloc[:,[1,3]]
Out[38]: 
   11  13
x   1   3
y   8  10
u  15  17
z  22  24
w  29  31
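
As a self-contained sketch of the same idea outside an IPython session, the following rebuilds the frame and shows a few positional selections next to the label-based equivalent with loc. The variable names (picked, first_three, cell_block, same_by_label) are made up for illustration.

```python
import numpy as np
import pandas as pd

a = np.arange(35).reshape(5, 7)
df = pd.DataFrame(a, index=['x', 'y', 'u', 'z', 'w'], columns=range(10, 17))

# Positions 1 and 3, regardless of what the columns are called.
picked = df.iloc[:, [1, 3]]

# Slices are positional too, with the usual half-open Python semantics.
first_three = df.iloc[:, :3]

# Rows and columns can both be given by position.
cell_block = df.iloc[[0, 2], [1, 3]]

# loc is the label-based counterpart: it needs the names 11 and 13.
same_by_label = df.loc[:, [11, 13]]

print(picked)
print(cell_block)
```

Here iloc never consults the column names, so it behaves the same whether they are strings or integers.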

Answer 2

Answered by JD Long

This is certainly one of those things that feels like a bug but is really a design decision (I think).

A few workaround options:

Rename the columns so that each column's name is its position:

 df.columns = np.arange(len(df.columns))

Another way is to get names from df.columns:

print(df[df.columns[[1, 3]]])
   11  13
x   1   3
y   8  10
u  15  17
z  22  24
w  29  31

I suspect this is the most appealing as it just requires adding a wee bit of code and not changing any column names.
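
For comparison, here is a rough sketch of both workarounds side by side, checked against a purely positional iloc selection. The rename is done on a copy so the original names are preserved; that copy and the variable names (renamed, via_rename, names, via_names, via_iloc) are choices made for this illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

a = np.arange(35).reshape(5, 7)
df = pd.DataFrame(a, index=['x', 'y', 'u', 'z', 'w'], columns=range(10, 17))

# Workaround 1: rename on a copy so the original column names survive.
renamed = df.copy()
renamed.columns = np.arange(len(renamed.columns))
via_rename = renamed[[1, 3]]   # the names are now 1 and 3

# Workaround 2: translate positions into names, then select by name.
names = df.columns[[1, 3]]     # the names 11 and 13
via_names = df[names]

# Reference: purely positional selection.
via_iloc = df.iloc[:, [1, 3]]

print(via_names.equals(via_iloc))
```

Both workarounds end up selecting by name, so they depend on the column names being unique. If names repeat, a name-based lookup can return more columns than the positions asked for, whereas iloc stays strictly positional.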
