Python: select multiple columns by labels (pandas)
Original source: http://stackoverflow.com/questions/29241836/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Select multiple columns by labels (pandas)
Asked by Minh Mai
I've been looking through the pandas documentation and the forums for ways to select columns, but every example of indexing columns is too simplistic.
Suppose I have a 10 x 10 dataframe
from numpy.random import randn
from pandas import DataFrame
df = DataFrame(randn(10, 10), index=range(0,10), columns=['A', 'B', 'C', 'D','E','F','G','H','I','J'])
So far, all the documentation gives is a simple example of indexing, like
subset = df.loc[:,'A':'C']
or
subset = df.loc[:,'C':]
But I get an error when I try to index multiple, non-sequential columns, like this
subset = df.loc[:,('A':'C', 'E')]
How would I index in pandas if I wanted to select columns A to C, E, and G to I? It appears that this logic does not work
subset = df.loc[:,('A':'C', 'E', 'G':'I')]
I feel that the solution is pretty simple, but I can't get around this error. Thanks!
Accepted answer by JohnE
Name- or Label-Based (using regular expression syntax)
df.filter(regex='[A-CEG-I]') # does NOT depend on the column order
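One caveat worth illustrating: as far as I know, filter applies the pattern with search semantics, so a column matches if its name merely contains one of the listed characters. That is harmless with the single-letter names in the question, but not in general. The sketch below uses a hypothetical frame with an extra column named 'AB' (not part of the original question) to show the difference that anchoring the pattern makes.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: single-letter columns plus one longer name, 'AB'.
df = pd.DataFrame(np.zeros((2, 5)), columns=['A', 'B', 'D', 'E', 'AB'])

# Unanchored: any name containing one of the characters is kept.
loose = df.filter(regex='[A-CEG-I]')
print(list(loose.columns))

# Anchored: only names that are exactly one of the listed characters are kept.
strict = df.filter(regex='^[A-CEG-I]$')
print(list(strict.columns))
```

With multi-character column names, the anchored form is probably the safer default.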
Location-Based (depends on column order)
df[ list(df.loc[:,'A':'C']) + ['E'] + list(df.loc[:,'G':'I']) ]
Note that unlike the label-based method, this only works if your columns are alphabetically sorted. This is not necessarily a problem, however. For example, if your columns go ['A','C','B'], then you could replace 'A':'C' above with 'A':'B'.
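A related way to express the same order-dependent selection is to convert the labels to integer positions and combine the ranges with NumPy's np.r_ helper. This is not part of the original answer, only a minimal sketch that assumes the column labels are unique (so that get_loc returns a plain integer) and uses the same 10-column layout as the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 10), columns=list('ABCDEFGHIJ'))

cols = df.columns
# get_loc returns an integer position for a unique label.
# Label slices include their end point, hence the +1 on each stop.
positions = np.r_[
    cols.get_loc('A'):cols.get_loc('C') + 1,
    cols.get_loc('E'),
    cols.get_loc('G'):cols.get_loc('I') + 1,
]

subset = df.iloc[:, positions]
print(list(subset.columns))
```

Like the loc-based version, this selects whatever columns happen to sit between the end points, so it depends on the column order.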
The Long Way
And for completeness, you always have the option shown by @Magdalena of simply listing each column individually, although it could be much more verbose as the number of columns increases:
df[['A','B','C','E','G','H','I']] # does NOT depend on the column order
Results for any of the above methods
          A         B         C         E         G         H         I
0 -0.814688 -1.060864 -0.008088  2.697203 -0.763874  1.793213 -0.019520
1  0.549824  0.269340  0.405570 -0.406695 -0.536304 -1.231051  0.058018
2  0.879230 -0.666814  1.305835  0.167621 -1.100355  0.391133  0.317467
Answer by Magdalena
Just pick the columns you want directly:
df[['A','E','I','C']]
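One property of the explicit list is that the result follows the order of the list, not the order of the columns in the frame. A minimal sketch, using an integer-filled frame of my own choosing rather than the random data from the question; the 'Z' label is hypothetical and only there to show what I would expect for a missing column in current pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(2, 10), columns=list('ABCDEFGHIJ'))

# Columns come back in the order they are listed.
picked = df[['A', 'E', 'I', 'C']]
print(list(picked.columns))

# A label that does not exist is reported as a KeyError.
try:
    df[['A', 'Z']]
except KeyError as err:
    print('missing label:', err)
```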