Selecting multiple columns by label in Python (pandas)

Question

Asked by Minh Mai

I've been looking through the pandas documentation and the forums for ways to select columns, but every example of column indexing I find is too simplistic.

Suppose I have a 10 x 10 dataframe

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10, 10), index=range(0,10), columns=['A', 'B', 'C', 'D','E','F','G','H','I','J'])

So far, all the documentation gives is a simple indexing example like

subset = df.loc[:,'A':'C']

or

subset = df.loc[:,'C':]

But I get an error when I try to index multiple, non-sequential columns, like this:

subset = df.loc[:,('A':'C', 'E')]

How would I index in pandas if I wanted to select columns A to C, E, and G to I? It appears that this logic will not work:

subset = df.loc[:,('A':'C', 'E', 'G':'I')]

I feel that the solution is pretty simple, but I can't get around this error. Thanks!

Answer 1

Accepted answer by JohnE

Name- or Label-Based (using regular expression syntax)

df.filter(regex='[A-CEG-I]')   # does NOT depend on the column order
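
One caveat worth sketching: the regex is searched anywhere in the column name, so the pattern above also matches longer names that merely contain one of those letters. The example below uses a small hypothetical frame (the column 'AZ' is an assumption for illustration, not part of the original question) to compare the unanchored pattern with an anchored one.

```python
import pandas as pd

# Hypothetical frame with one multi-character column name
df = pd.DataFrame([[1, 2, 3, 4]], columns=['A', 'E', 'Z', 'AZ'])

# Unanchored: keeps any column whose name contains A-C, E, or G-I
loose = df.filter(regex='[A-CEG-I]')

# Anchored: keeps only single-letter names from that set
strict = df.filter(regex='^[A-CEG-I]$')

print(list(loose.columns))   # ['A', 'E', 'AZ']
print(list(strict.columns))  # ['A', 'E']
```

With the single-letter column names from the question, both patterns select the same columns.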

Location-Based (depends on column order)

df[ list(df.loc[:,'A':'C']) + ['E'] + list(df.loc[:,'G':'I']) ]

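Note that, unlike the regex method, this depends on column order: a slice such as 'A':'C' returns every column positioned between A and C in the frame, so it yields A, B, C only when the columns are in alphabetical order. This is not necessarily a problem, however. For example, if your columns go ['A','C','B'], then you could replace 'A':'C' above with 'A':'B'.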
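
As a possible alternative (not part of the original answer), the same order-dependent selection can be sketched with integer positions: `Index.get_loc` turns each label into its position, and `np.r_` concatenates the resulting ranges for `iloc`. This assumes unique column labels, so that `get_loc` returns a single integer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 10), columns=list('ABCDEFGHIJ'))

# Map a label to its integer position (assumes unique column labels)
loc = df.columns.get_loc

# Integer slices exclude the end point, hence the + 1 to keep 'C' and 'I'
positions = np.r_[loc('A'):loc('C') + 1, loc('E'), loc('G'):loc('I') + 1]

subset = df.iloc[:, positions]
print(list(subset.columns))  # ['A', 'B', 'C', 'E', 'G', 'H', 'I']
```

Like the slice-based version, this picks up whatever columns happen to sit between the end points, so it is equally sensitive to column order.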

The Long Way

And for completeness, you always have the option shown by @Magdalena of simply listing each column individually, although it could be much more verbose as the number of columns increases:

df[['A','B','C','E','G','H','I']]   # does NOT depend on the column order

Results for any of the above methods

          A         B         C         E         G         H         I
0 -0.814688 -1.060864 -0.008088  2.697203 -0.763874  1.793213 -0.019520
1  0.549824  0.269340  0.405570 -0.406695 -0.536304 -1.231051  0.058018
2  0.879230 -0.666814  1.305835  0.167621 -1.100355  0.391133  0.317467
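
A quick way to confirm that the three approaches agree is to build the frame from the question and compare the results. This is a minimal sketch; the variable names are illustrative, and the random values will differ from the output shown above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 10), columns=list('ABCDEFGHIJ'))

by_regex = df.filter(regex='[A-CEG-I]')
by_slices = df[list(df.loc[:, 'A':'C']) + ['E'] + list(df.loc[:, 'G':'I'])]
by_list = df[['A', 'B', 'C', 'E', 'G', 'H', 'I']]

# DataFrame.equals compares labels, dtypes, and values
assert by_regex.equals(by_slices)
assert by_slices.equals(by_list)
```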

Answer 2

Answer by Magdalena

Just pick the columns you want directly:

df[['A','E','I','C']]
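
Two properties of list-based selection are easy to check with a small example: the result follows the order of the list rather than the order of the frame, and, in recent pandas versions, a label that is not present raises a KeyError. The frame and the missing label 'Q' below are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=list('ACEGI'))

# Columns come back in the order they are listed
picked = df[['A', 'E', 'I', 'C']]
print(list(picked.columns))  # ['A', 'E', 'I', 'C']

# A label that does not exist raises KeyError
try:
    df[['A', 'Q']]
    missing_raised = False
except KeyError:
    missing_raised = True
print(missing_raised)
```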
