Selecting columns from a pandas MultiIndex

Question

Asked by metakermit

I have a DataFrame with MultiIndex columns that looks like this:

# sample data
import numpy as np
import pandas as pd

col = pd.MultiIndex.from_arrays([['one', 'one', 'one', 'two', 'two', 'two'],
                                 ['a', 'b', 'c', 'a', 'b', 'c']])
data = pd.DataFrame(np.random.randn(4, 6), columns=col)
data

What is the proper, simple way of selecting only specific columns (e.g. ['a', 'c'], not a range) from the second level?

Currently I am doing it like this:

import itertools
tuples = list(itertools.product(['one', 'two'], ['a', 'c']))
new_index = pd.MultiIndex.from_tuples(tuples)
print(new_index)
# reindex_axis was removed in pandas 1.0; reindex(columns=...) is the equivalent call
data.reindex(columns=new_index)

It doesn't feel like a good solution, however, because I have to bust out itertools, build another MultiIndex by hand and then reindex (and my actual code is even messier, since the column lists aren't so simple to fetch). I am pretty sure there has to be some ix or xs way of doing this, but everything I tried resulted in errors.

Answer 1

Accepted answer by DSM

It's not great, but maybe:

>>> data
        one                           two                    
          a         b         c         a         b         c
0 -0.927134 -1.204302  0.711426  0.854065 -0.608661  1.140052
1 -0.690745  0.517359 -0.631856  0.178464 -0.312543 -0.418541
2  1.086432  0.194193  0.808235 -0.418109  1.055057  1.886883
3 -0.373822 -0.012812  1.329105  1.774723 -2.229428 -0.617690
>>> data.loc[:,data.columns.get_level_values(1).isin({"a", "c"})]
        one                 two          
          a         c         a         c
0 -0.927134  0.711426  0.854065  1.140052
1 -0.690745 -0.631856  0.178464 -0.418541
2  1.086432  0.808235 -0.418109  1.886883
3 -0.373822  1.329105  1.774723 -0.617690

would work?

Answer 2

Answered by Viktor Kerkez

You can use either loc or ix. I'll show an example with loc:

data.loc[:, [('one', 'a'), ('one', 'c'), ('two', 'a'), ('two', 'c')]]

When you have a MultiIndexed DataFrame, and you want to filter out only some of the columns, you have to pass a list of tuples that match those columns. So the itertools approach was pretty much OK, but you don't have to create a new MultiIndex:

data.loc[:, list(itertools.product(['one', 'two'], ['a', 'c']))]
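
If the first-level labels are not known in advance, the list of tuples can also be collected from the existing columns instead of being built with itertools. The following is a minimal sketch of that idea, assuming the sample frame from the question; the names `wanted`, `keys` and `subset` are only illustrative.

```python
import numpy as np
import pandas as pd

col = pd.MultiIndex.from_arrays([['one', 'one', 'one', 'two', 'two', 'two'],
                                 ['a', 'b', 'c', 'a', 'b', 'c']])
data = pd.DataFrame(np.random.randn(4, 6), columns=col)

# Iterating over MultiIndex columns yields one tuple per column.
# Keep every tuple whose second-level label is wanted, whatever
# its first-level label happens to be.
wanted = {'a', 'c'}
keys = [key for key in data.columns if key[1] in wanted]

subset = data.loc[:, keys]
print(subset.columns.tolist())
```

This keeps the original column order and never mentions 'one' or 'two' explicitly.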

Answer 3

Answered by FooBar

I think there is a much better way (now), which is why I bother pulling this question (which was the top google result) out of the shadows:

data.select(lambda x: x[1] in ['a', 'b'], axis=1)

gives your expected output in a quick and clean one-liner:

        one                 two          
          a         b         a         b
0 -0.341326  0.374504  0.534559  0.429019
1  0.272518  0.116542 -0.085850 -0.330562
2  1.982431 -0.420668 -0.444052  1.049747
3  0.162984 -0.898307  1.762208 -0.101360

It is mostly self-explanatory; the [1] refers to the level.
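
On pandas versions where select is no longer available, the same predicate can be evaluated over the column tuples and the resulting boolean list passed to loc. This is only a sketch of one possible replacement, assuming the sample frame from the question; `mask` and `result` are illustrative names.

```python
import numpy as np
import pandas as pd

col = pd.MultiIndex.from_arrays([['one', 'one', 'one', 'two', 'two', 'two'],
                                 ['a', 'b', 'c', 'a', 'b', 'c']])
data = pd.DataFrame(np.random.randn(4, 6), columns=col)

# Apply the predicate to every column tuple; key[1] is the second-level label.
mask = [key[1] in ['a', 'b'] for key in data.columns]

# A boolean list with one entry per column selects the matching columns.
result = data.loc[:, mask]
print(result.columns.tolist())
```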

Answer 4

Answered by Marc P.

To select all columns named 'a' and 'c' at the second level of your column indexer, you can use slicers:

>>> data.loc[:, (slice(None), ('a', 'c'))]

        one                 two          
          a         c         a         c
0 -0.983172 -2.495022 -0.967064  0.124740
1  0.282661 -0.729463 -0.864767  1.716009
2  0.942445  1.276769 -0.595756 -0.973924
3  2.182908 -0.267660  0.281916 -0.587835

Here you can read more about slicers.

Answer 5

Answered by cs95

`ix` and `select` are deprecated!

Using pd.IndexSlice makes loc a preferable option to ix and select.

`DataFrame.loc` with `pd.IndexSlice`

# Setup
col = pd.MultiIndex.from_arrays([['one', 'one', 'one', 'two', 'two', 'two'],
                                ['a', 'b', 'c', 'a', 'b', 'c']])
data = pd.DataFrame('x', index=range(4), columns=col)
data

  one       two      
    a  b  c   a  b  c
0   x  x  x   x  x  x
1   x  x  x   x  x  x
2   x  x  x   x  x  x
3   x  x  x   x  x  x

data.loc[:, pd.IndexSlice[:, ['a', 'c']]]

  one    two   
    a  c   a  c
0   x  x   x  x
1   x  x   x  x
2   x  x   x  x
3   x  x   x  x

Alternatively, you can pass an axis parameter to loc to make it explicit which axis you're indexing:

data.loc(axis=1)[pd.IndexSlice[:, ['a', 'c']]]

  one    two   
    a  c   a  c
0   x  x   x  x
1   x  x   x  x
2   x  x   x  x
3   x  x   x  x

`MultiIndex.get_level_values`

Calling data.columns.get_level_values to build a mask and filtering with loc is another option:

data.loc[:, data.columns.get_level_values(1).isin(['a', 'c'])]

  one    two   
    a  c   a  c
0   x  x   x  x
1   x  x   x  x
2   x  x   x  x
3   x  x   x  x

This can naturally allow for filtering on any conditional expression on a single level. Here's a random example with lexicographical filtering:

data.loc[:, data.columns.get_level_values(1) > 'b']

  one two
    c   c
0   x   x
1   x   x
2   x   x
3   x   x
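
Because each comparison on get_level_values produces a plain boolean array, masks built from different levels can also be combined with & or |. The snippet below is a small sketch of that, reusing the setup above; the particular condition (first level equal to 'two', second level in ['a', 'c']) is just an example.

```python
import pandas as pd

col = pd.MultiIndex.from_arrays([['one', 'one', 'one', 'two', 'two', 'two'],
                                 ['a', 'b', 'c', 'a', 'b', 'c']])
data = pd.DataFrame('x', index=range(4), columns=col)

level0 = data.columns.get_level_values(0)
level1 = data.columns.get_level_values(1)

# Each condition yields one boolean per column; & keeps columns
# that satisfy both conditions.
mask = (level0 == 'two') & level1.isin(['a', 'c'])

result = data.loc[:, mask]
print(result.columns.tolist())
```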

More information on slicing and filtering MultiIndexes can be found at Select rows in pandas MultiIndex DataFrame.

Answer 6

Answered by Guilherme Salomé

The most straightforward way is with .loc:

>>> data.loc[:, (['one', 'two'], ['a', 'b'])]


   one       two     
     a    b    a    b
0  0.4 -0.6 -0.7  0.9
1  0.1  0.4  0.5 -0.3
2  0.7 -1.6  0.7 -0.8
3 -0.9  2.6  1.9  0.6

Remember that [] and () have special meaning when dealing with a MultiIndex object:

(...) a tuple is interpreted as one multi-level key
(...) a list is used to specify several keys [on the same level]
(...) a tuple of lists refer to several values within a level

When we write (['one', 'two'], ['a', 'b']), the first list inside the tuple specifies all the values we want from the 1st level of the MultiIndex. The second list inside the tuple specifies all the values we want from the 2nd level of the MultiIndex.
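
To make the three cases concrete, here is a short sketch, assuming the sample frame from the question, that passes a tuple, a list of tuples, and a tuple of lists to .loc and looks at what comes back. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

col = pd.MultiIndex.from_arrays([['one', 'one', 'one', 'two', 'two', 'two'],
                                 ['a', 'b', 'c', 'a', 'b', 'c']])
data = pd.DataFrame(np.random.randn(4, 6), columns=col)

# A tuple is one complete multi-level key, so a single column comes back.
single = data.loc[:, ('one', 'a')]

# A list of tuples names several complete keys, one per column.
several = data.loc[:, [('one', 'a'), ('two', 'c')]]

# A tuple of lists gives the wanted labels level by level and
# selects every existing combination of them.
cross = data.loc[:, (['one', 'two'], ['a', 'b'])]

print(type(single).__name__)
print(several.columns.tolist())
print(cross.columns.tolist())
```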

Edit 1: Another possibility is to use slice(None) to specify that we want everything from the first level (this works similarly to slicing with : in lists), and then specify which columns from the second level we want.

>>> data.loc[:, (slice(None), ["a", "b"])]

   one       two     
     a    b    a    b
0  0.4 -0.6 -0.7  0.9
1  0.1  0.4  0.5 -0.3
2  0.7 -1.6  0.7 -0.8
3 -0.9  2.6  1.9  0.6

If the syntax slice(None) does not appeal to you, then another possibility is to use pd.IndexSlice, which helps with slicing frames that have more elaborate indices.

>>> data.loc[:, pd.IndexSlice[:, ["a", "b"]]]

   one       two     
     a    b    a    b
0  0.4 -0.6 -0.7  0.9
1  0.1  0.4  0.5 -0.3
2  0.7 -1.6  0.7 -0.8
3 -0.9  2.6  1.9  0.6

When using pd.IndexSlice, we can use : as usual to slice the frame.

Source: MultiIndex / Advanced Indexing, How to use slice(None)

Answer 7

Answered by Nick P

A slightly easier, to my mind, riff on Marc P.'s answer using slice:

import numpy as np
import pandas as pd
col = pd.MultiIndex.from_arrays([['one', 'one', 'one', 'two', 'two', 'two'], ['a', 'b', 'c', 'a', 'b', 'c']])
data = pd.DataFrame(np.random.randn(4, 6), columns=col)

data.loc[:, pd.IndexSlice[:, ['a', 'c']]]

        one                 two          
          a         c         a         c
0 -1.731008  0.718260 -1.088025 -1.489936
1 -0.681189  1.055909  1.825839  0.149438
2 -1.674623  0.769062  1.857317  0.756074
3  0.408313  1.291998  0.833145 -0.471879

As of pandas 0.21 or so, .select is deprecated in favour of .loc.
