pandas `.loc` and `.iloc` with a MultiIndex'd DataFrame

Question

Asked by Brad Solomon

When indexing a MultiIndex-ed DataFrame, it seems like .iloc assumes you're referencing the "inner level" of the index while .loc looks at the outer level.

For example:

import numpy as np
import pandas as pd

np.random.seed(123)
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
idx = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

# .loc looks at the outer index:

print(df.loc['qux'])
# df.loc['two'] would throw KeyError
              0        1        2        3
second                                    
one    -1.25388 -0.63775  0.90711 -1.42868
two    -0.14007 -0.86175 -0.25562 -2.79859

# while .iloc looks at the inner index:

print(df.iloc[-1])
0   -0.14007
1   -0.86175
2   -0.25562
3   -2.79859
Name: (qux, two), dtype: float64
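A minimal sketch that reproduces these observations with the same seeded data (imports added, otherwise the code above unchanged): a single label given to `.loc` is matched against the outer level `first`, an inner-level label such as `'two'` raises `KeyError`, and `.iloc[-1]` just returns the last row, whose name is the full index tuple. The flag `inner_label_found` is only an illustrative variable.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
idx = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

# A single label given to .loc is looked up in the outer level ('first');
# the matched level is dropped, leaving 'second' as the index.
outer = df.loc['qux']
print(outer.index.name)   # second

# An inner-level label is not a label of the outer level.
try:
    df.loc['two']
    inner_label_found = True
except KeyError:
    inner_label_found = False

# .iloc[-1] is simply the last row; its name is the full index tuple.
last = df.iloc[-1]
print(last.name)          # ('qux', 'two')
```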

Two questions:

Firstly, why is this? Is it a deliberate design decision?

Secondly, can I use .iloc to reference the outer level of the index, to yield the result below? I'm aware I could first find the last member of the outer level with get_level_values and then .loc-index with that, but I'm wondering if it can be done more directly, either with funky .iloc syntax or some existing function designed specifically for the case.

# desired output (as if df.iloc[-1] selected the last outer-level label):
                    0        1        2        3
first second                                    
qux   one    -1.25388 -0.63775  0.90711 -1.42868
      two    -0.14007 -0.86175 -0.25562 -2.79859

Answer 1

Accepted answer by Brad Solomon

Yes, this is a deliberate design decision:

.iloc is a strict positional indexer, it does not regard the structure at all, only the first actual behavior. ... .loc does take into account the level behavior. [emphasis added]
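The quoted point can be checked directly. This sketch uses a hypothetical three-level index (`df3`, with made-up labels, not the question's data) and confirms that position `i` in `.iloc` is always the `i`-th entry of `df.index`, however many levels the index has.

```python
import numpy as np
import pandas as pd

# Hypothetical three-level index, used only for illustration.
idx3 = pd.MultiIndex.from_product(
    [['a', 'b'], ['x', 'y'], ['p', 'q']], names=['l0', 'l1', 'l2']
)
df3 = pd.DataFrame({'val': np.arange(8)}, index=idx3)

# .iloc ignores the level structure: position i is just the i-th stored row.
for i in range(len(df3)):
    assert df3.iloc[i].name == df3.index[i]

print(df3.iloc[2].name)    # ('a', 'y', 'p')
print(df3.iloc[-1].name)   # ('b', 'y', 'q')
```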

So the desired result given in the question is not possible in a flexible manner with .iloc. The closest workaround, used in several similar questions, is

print(df.loc[[df.index.get_level_values(0)[-1]]])
                    0        1        2        3
first second                                    
qux   one    -1.25388 -0.63775  0.90711 -1.42868
      two    -0.14007 -0.86175 -0.25562 -2.79859

Using double brackets will retain the first index level.
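A short sketch of the single- vs double-bracket difference, assuming the question's seeded `df` (rebuilt here so the block runs on its own): a scalar key drops the matched `first` level from the result, while a one-element list keeps it.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
idx = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second']
)
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

# Last outer-level label, read from the per-row level values.
last_outer = df.index.get_level_values(0)[-1]   # 'qux'

single = df.loc[last_outer]     # scalar key: the 'first' level is dropped
double = df.loc[[last_outer]]   # list key: the 'first' level is kept

print(list(single.index.names))   # ['second']
print(list(double.index.names))   # ['first', 'second']
```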

Answer 2

Answer by FabienP

You can use:

df.iloc[[6, 7], :]
Out[1]:
                     0         1         2         3
first second
qux   one    -1.253881 -0.637752  0.907105 -1.428681
      two    -0.140069 -0.861755 -0.255619 -2.798589

Where [6, 7] correspond to the actual row positions of these lines, as you can see below:

df.reset_index()
Out[]:
  first second         0         1         2         3
0   bar    one -1.085631  0.997345  0.282978 -1.506295
1   bar    two -0.578600  1.651437 -2.426679 -0.428913
2   baz    one  1.265936 -0.866740 -0.678886 -0.094709
3   baz    two  1.491390 -0.638902 -0.443982 -0.434351
4   foo    one  2.205930  2.186786  1.004054  0.386186
5   foo    two  0.737369  1.490732 -0.935834  1.175829
6   qux    one -1.253881 -0.637752  0.907105 -1.428681
7   qux    two -0.140069 -0.861755 -0.255619 -2.798589

This also works with df.iloc[[-2, -1], :] or df.iloc[range(-2, 0), :].
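The positions 6 and 7 do not have to be read off `reset_index()` by hand. A sketch, assuming the question's `df`, that computes them from the outer-level values with `np.flatnonzero` and checks that the three `.iloc` spellings select the same rows:

```python
import numpy as np
import pandas as pd

np.random.seed(123)
idx = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second']
)
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

# Integer positions of the rows whose outer-level label is 'qux'.
positions = np.flatnonzero(df.index.get_level_values('first') == 'qux')
print(positions.tolist())   # [6, 7]

by_position = df.iloc[positions, :]
by_negative = df.iloc[[-2, -1], :]
by_range = df.iloc[range(-2, 0), :]

# All three spellings select the same two rows.
assert by_position.equals(by_negative) and by_negative.equals(by_range)
```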

EDIT: Turning it into a more generic solution

Then it is possible to get a generic function:

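def multiindex_loc(df, index):
    # index-th label of the outer level (levels[0] is stored sorted and unique)
    label = df.index.levels[0][index]
    # get_loc returns the matching rows' positions (a slice or boolean mask)
    return df.iloc[df.index.get_loc(label)]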

multiindex_loc(df, -1)
Out[]:
                     0         1         2         3
first second
qux   one    -1.253881 -0.637752  0.907105 -1.428681
      two    -0.140069 -0.861755 -0.255619 -2.798589


multiindex_loc(df, 2)
Out[]:
                     0         1         2         3
first second
foo   one     2.205930  2.186786  1.004054  0.386186
      two     0.737369  1.490732 -0.935834  1.175829
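One caveat with the generic function above: `df.index.levels[0]` holds the level's sorted, unique labels, which may differ from the order the labels appear in the rows (for an unsorted index) and may still include labels that no longer occur after filtering, since slicing a MultiIndex keeps its unused levels. A hedged variant, assuming you want the n-th outer label in row order, takes the labels from `get_level_values(0).unique()` instead. The helper name `multiindex_iloc_ordered` is hypothetical.

```python
import numpy as np
import pandas as pd


def multiindex_iloc_ordered(df, index):
    """Rows of the index-th outer-level label, counted in row order (hypothetical helper)."""
    level0 = df.index.get_level_values(0)
    # Outer labels in order of first appearance; labels with no rows are excluded.
    label = level0.unique()[index]
    # Positional selection of every row carrying that outer label.
    return df.iloc[np.flatnonzero(level0 == label)]


np.random.seed(123)
idx = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second']
)
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

# Drop the 'qux' rows: levels[0] still lists 'qux', but no row uses it any more.
trimmed = df.iloc[:6]
print(trimmed.index.levels[0].tolist())   # ['bar', 'baz', 'foo', 'qux']
print(multiindex_iloc_ordered(trimmed, -1).index.tolist())
# [('foo', 'one'), ('foo', 'two')]
```

With this `trimmed` frame, the `levels[0]`-based version would look up `'qux'`, a label that has no remaining rows, whereas the variant returns the `foo` rows.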

Answer 3

Answer by Håken Lid

You can use the swaplevel method to reorder the index before using loc.

df.swaplevel(0,-1).loc['two']

With the sample data from your question, it looks like this:

>>> df
                     0         1         2         3
first second                                        
bar   one    -1.085631  0.997345  0.282978 -1.506295
      two    -0.578600  1.651437 -2.426679 -0.428913
baz   one     1.265936 -0.866740 -0.678886 -0.094709
      two     1.491390 -0.638902 -0.443982 -0.434351
foo   one     2.205930  2.186786  1.004054  0.386186
      two     0.737369  1.490732 -0.935834  1.175829
qux   one    -1.253881 -0.637752  0.907105 -1.428681
      two    -0.140069 -0.861755 -0.255619 -2.798589

>>> df.loc['bar']
               0         1         2         3
second                                        
one    -1.085631  0.997345  0.282978 -1.506295
two    -0.578600  1.651437 -2.426679 -0.428913

>>> df.swaplevel().loc['two']
              0         1         2         3
first                                        
bar   -0.578600  1.651437 -2.426679 -0.428913
baz    1.491390 -0.638902 -0.443982 -0.434351
foo    0.737369  1.490732 -0.935834  1.175829
qux   -0.140069 -0.861755 -0.255619 -2.798589

swaplevel is a MultiIndex method, but you can call it directly on the DataFrame. The default is to swap the two innermost levels, so if you have more than two levels in the MultiIndex, you should explicitly state the levels you want to swap.

df.swaplevel(0,-1).loc['two']
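A sketch of the multi-level case, assuming a hypothetical three-level index named `first`/`second`/`third` (not the question's data). Levels can be given by position or by name; the `sort_index()` call is optional, but it restores a sorted index, which keeps later label lookups efficient.

```python
import numpy as np
import pandas as pd

# Hypothetical three-level index for illustration.
idx = pd.MultiIndex.from_product(
    [['bar', 'qux'], ['one', 'two'], ['x', 'y']],
    names=['first', 'second', 'third'],
)
df3 = pd.DataFrame({'val': np.arange(8)}, index=idx)

# With no arguments, swaplevel() swaps the two innermost levels.
print(list(df3.swaplevel().index.names))   # ['first', 'third', 'second']

# Swap the outermost and innermost levels explicitly, then select on 'third'.
swapped = df3.swaplevel('first', 'third').sort_index()
selected = swapped.loc['y']   # remaining index levels: 'second', 'first'
print(selected)
```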
