pandas: `.loc` and `.iloc` with a MultiIndex'd DataFrame

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/45967702/

Date: 2020-09-14 04:21:27  Source: igfitidea

`.loc` and `.iloc` with MultiIndex'd DataFrame

Tags: python, python-3.x, pandas, dataframe

Asked by Brad Solomon

When indexing a MultiIndex-ed DataFrame, it seems like .iloc assumes you're referencing the "inner level" of the index, while .loc looks at the outer level.

For example:

import numpy as np
import pandas as pd

np.random.seed(123)
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
idx = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

# .loc looks at the outer index:

print(df.loc['qux'])
# df.loc['two'] would throw KeyError
              0        1        2        3
second                                    
one    -1.25388 -0.63775  0.90711 -1.42868
two    -0.14007 -0.86175 -0.25562 -2.79859

# while .iloc looks at the inner index:

print(df.iloc[-1])
0   -0.14007
1   -0.86175
2   -0.25562
3   -2.79859
Name: (qux, two), dtype: float64

Two questions:

Firstly, why is this? Is it a deliberate design decision?

Secondly, can I use .iloc to reference the outer level of the index, to yield the result below? I'm aware I could first find the last member of the index with get_level_values and then .loc-index with that, but I'm wondering if it can be done more directly, either with funky .iloc syntax or some existing function designed specifically for the case.

# desired result of df.iloc[-1]:
qux   one    -1.25388 -0.63775  0.90711 -1.42868
      two    -0.14007 -0.86175 -0.25562 -2.79859

Accepted answer by Brad Solomon

Yes, this is a deliberate design decision:

.iloc is a strict positional indexer, it does not regard the structure at all, only the first actual behavior. ... .loc does take into account the level behavior. [emphasis added]

So the desired result given in the question is not possible in a flexible manner with .iloc. The closest workaround, used in several similar questions, is

print(df.loc[[df.index.get_level_values(0)[-1]]])
                    0        1        2        3
first second                                    
qux   one    -1.25388 -0.63775  0.90711 -1.42868
      two    -0.14007 -0.86175 -0.25562 -2.79859

Using double brackets (passing a list of labels to .loc) will retain the first index level in the result.
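To make the effect of the double brackets concrete, here is a minimal sketch that rebuilds the question's sample data. The variable names `last_label`, `dropped`, `kept`, and `same_row` are introduced here for illustration only and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
idx = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

# Last label of the outer level, found by position
last_label = df.index.get_level_values(0)[-1]

# Single brackets: the outer level is dropped from the result
dropped = df.loc[last_label]

# Double brackets: both index levels are kept
kept = df.loc[[last_label]]

# .iloc is purely positional: the last row is the ('qux', 'two') row
same_row = df.iloc[-1].equals(df.loc[('qux', 'two')])
```

Both selections return the same two rows; they differ only in whether the `first` level stays in the index.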

Answer by FabienP

You can use:

df.iloc[[6, 7], :]
Out[1]:
                     0         1         2         3
first second
qux   one    -1.253881 -0.637752  0.907105 -1.428681
      two    -0.140069 -0.861755 -0.255619 -2.798589

Here [6, 7] are the integer positions of these rows, as you can see after resetting the index:

df.reset_index()
Out[]:
  first second         0         1         2         3
0   bar    one -1.085631  0.997345  0.282978 -1.506295
1   bar    two -0.578600  1.651437 -2.426679 -0.428913
2   baz    one  1.265936 -0.866740 -0.678886 -0.094709
3   baz    two  1.491390 -0.638902 -0.443982 -0.434351
4   foo    one  2.205930  2.186786  1.004054  0.386186
5   foo    two  0.737369  1.490732 -0.935834  1.175829
6   qux    one -1.253881 -0.637752  0.907105 -1.428681
7   qux    two -0.140069 -0.861755 -0.255619 -2.798589

This also works with df.iloc[[-2, -1], :] or df.iloc[range(-2, 0), :].
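The positions do not have to be hard-coded. As a rough sketch (not part of the original answer), they can be derived from the outer level with NumPy's `flatnonzero`; the name `positions` is an assumption made for this example.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
idx = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

# Integer positions of every row whose outer label is 'qux'
positions = np.flatnonzero(df.index.get_level_values(0) == 'qux')

# Purely positional selection, equivalent to df.iloc[[6, 7], :] here
result = df.iloc[positions]
```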



EDIT: Turning it into a more generic solution

Then it is possible to get a generic function:

def multiindex_iloc(df, index):
    # Look up the outer-level label stored at the given position
    label = df.index.levels[0][index]
    # get_loc maps that label to the row positions it covers
    return df.iloc[df.index.get_loc(label)]

multiindex_iloc(df, -1)
Out[]:
                     0         1         2         3
first second
qux   one    -1.253881 -0.637752  0.907105 -1.428681
      two    -0.140069 -0.861755 -0.255619 -2.798589


multiindex_iloc(df, 2)
Out[]:
                     0         1         2         3
first second
foo   one     2.205930  2.186786  1.004054  0.386186
      two     0.737369  1.490732 -0.935834  1.175829
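One caveat worth noting: `levels[0]` holds the outer labels in sorted order and can keep labels that no longer occur after filtering, so a position in `levels[0]` is not always a position in the data. The sketch below is a possible variant that goes by order of appearance instead. The helper `outer_iloc` and the `demo` frame are hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

def outer_iloc(df, pos):
    """Hypothetical helper: rows of the pos-th outer label, by order of appearance."""
    outer = df.index.get_level_values(0)
    label = outer.unique()[pos]      # unique() keeps first-appearance order
    return df[outer == label]        # boolean mask over the rows

# Outer labels deliberately appear in non-sorted order: 'qux' before 'bar'
idx = pd.MultiIndex.from_product([['qux', 'bar'], ['one', 'two']],
                                 names=['first', 'second'])
demo = pd.DataFrame(np.arange(8).reshape(4, 2), index=idx)

last_group = outer_iloc(demo, -1)     # rows for 'bar', the last label to appear
subset = demo.iloc[:2]                # only 'qux' rows remain
first_group = outer_iloc(subset, 0)   # rows for 'qux'
```

The boolean mask does not require the index to be sorted, at the cost of scanning the whole outer level on each call.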

Answer by Håken Lid

You can use the swaplevel method to reorder the index levels before using loc.

df.swaplevel(0,-1).loc['two']

With the sample data from your question, it looks like this:

>>> df
                     0         1         2         3
first second                                        
bar   one    -1.085631  0.997345  0.282978 -1.506295
      two    -0.578600  1.651437 -2.426679 -0.428913
baz   one     1.265936 -0.866740 -0.678886 -0.094709
      two     1.491390 -0.638902 -0.443982 -0.434351
foo   one     2.205930  2.186786  1.004054  0.386186
      two     0.737369  1.490732 -0.935834  1.175829
qux   one    -1.253881 -0.637752  0.907105 -1.428681
      two    -0.140069 -0.861755 -0.255619 -2.798589

>>> df.loc['bar']
               0         1         2         3
second                                        
one    -1.085631  0.997345  0.282978 -1.506295
two    -0.578600  1.651437 -2.426679 -0.428913

>>> df.swaplevel().loc['two']
              0         1         2         3
first                                        
bar   -0.578600  1.651437 -2.426679 -0.428913
baz    1.491390 -0.638902 -0.443982 -0.434351
foo    0.737369  1.490732 -0.935834  1.175829
qux   -0.140069 -0.861755 -0.255619 -2.798589

swaplevel is a MultiIndex method, but you can call it directly on the DataFrame. The default is to swap the two innermost levels, so if you have more than two levels in the MultiIndex, you should explicitly state the levels you want to swap.

df.swaplevel(0,-1).loc['two']
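To illustrate why the explicit arguments matter, here is a small sketch with a three-level index. The frame `df3` and its level name `middle` are made up for this example and are not from the original question.

```python
import numpy as np
import pandas as pd

idx3 = pd.MultiIndex.from_product(
    [['bar', 'baz'], ['x', 'y'], ['one', 'two']],
    names=['first', 'middle', 'second'],
)
df3 = pd.DataFrame(np.arange(8), index=idx3, columns=['value'])

# Default: only the two innermost levels trade places
default_names = list(df3.swaplevel().index.names)

# Explicit: the outermost and innermost levels trade places
explicit_names = list(df3.swaplevel(0, -1).index.names)

# Now 'second' is the outer level, so .loc can select on it
rows_two = df3.swaplevel(0, -1).loc['two']
```

With the default call, `first` would still be the outer level, and `.loc['two']` would look for 'two' among the `first` labels.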