pandas `.loc` and `.iloc` with a MultiIndex'd DataFrame
Original question: http://stackoverflow.com/questions/45967702/
Notice: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Brad Solomon
When indexing a MultiIndex-ed DataFrame, it seems like `.iloc` assumes you're referencing the "inner level" of the index while `.loc` looks at the outer level.
For example:
import numpy as np
import pandas as pd

np.random.seed(123)
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
idx = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=idx)
# .loc looks at the outer index:
print(df.loc['qux'])
# df.loc['two'] would throw KeyError
              0        1        2        3
second
one    -1.25388 -0.63775  0.90711 -1.42868
two    -0.14007 -0.86175 -0.25562 -2.79859
# while .iloc looks at the inner index:
print(df.iloc[-1])
0   -0.14007
1   -0.86175
2   -0.25562
3   -2.79859
Name: (qux, two), dtype: float64
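To make the contrast concrete, here is a small self-contained sketch on the same data that checks both behaviors. The use of `.xs` to reach the inner level by label is an addition for illustration and is not part of the question.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
idx = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

# .iloc[-1] is purely positional: it returns the last row,
# which happens to carry the full label ('qux', 'two').
last_row = df.iloc[-1]
same_row = df.loc[('qux', 'two')]

# A bare label passed to .loc is matched against the outer level only,
# so an inner-level label raises KeyError.
try:
    df.loc['two']
    inner_label_found = True
except KeyError:
    inner_label_found = False

# .xs can select on the inner level by label.
inner_two = df.xs('two', level='second')
```

So `.iloc` is not really "looking at the inner level"; it counts rows and ignores the index structure entirely.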
Two questions:
Firstly, why is this? Is it a deliberate design decision?
Secondly, can I use `.iloc` to reference the outer level of the index, to yield the result below? I'm aware I could first find the last member of the index with `get_level_values` and then `.loc`-index with that, but I'm wondering if it can be done more directly, either with funky `.iloc` syntax or some existing function designed specifically for the case.
# df.iloc[-1]
qux   one     0.89071  1.75489  1.49564  1.06939
      two    -0.77271  0.79486  0.31427 -1.32627
Accepted answer by Brad Solomon
Yes, this is a deliberate design decision:
`.iloc` is a strict positional indexer, it does not regard the structure at all, only the first actual behavior. ... `.loc` does take into account the level behavior. [emphasis added]
So the desired result given in the question is not possible in a flexible manner with `.iloc`. The closest workaround, used in several similar questions, is
print(df.loc[[df.index.get_level_values(0)[-1]]])
                    0        1        2        3
first second
qux   one    -1.25388 -0.63775  0.90711 -1.42868
      two    -0.14007 -0.86175 -0.25562 -2.79859
Using double brackets will retain the first index level.
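As a quick sketch of what the brackets change, the snippet below compares a scalar label with a one-element list of labels on the same data. The variable names (`last_outer`, `dropped`, `kept`) are illustrative only.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
idx = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

# Last label that appears in the outer level.
last_outer = df.index.get_level_values(0)[-1]

# Scalar label: the matched outer level is dropped from the result.
dropped = df.loc[last_outer]

# List of labels: the result keeps both index levels.
kept = df.loc[[last_outer]]

print(list(dropped.index.names))
print(list(kept.index.names))
```

Both results contain the same two rows; only the index they carry differs.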
Answer by FabienP
You can use:
df.iloc[[6, 7], :]
Out[1]:
                     0         1         2         3
first second
qux   one    -1.253881 -0.637752  0.907105 -1.428681
      two    -0.140069 -0.861755 -0.255619 -2.798589
Here `[6, 7]` are the integer positions of these rows, as you can see below:
df.reset_index()
Out[]:
  first second         0         1         2         3
0   bar    one -1.085631  0.997345  0.282978 -1.506295
1   bar    two -0.578600  1.651437 -2.426679 -0.428913
2   baz    one  1.265936 -0.866740 -0.678886 -0.094709
3   baz    two  1.491390 -0.638902 -0.443982 -0.434351
4   foo    one  2.205930  2.186786  1.004054  0.386186
5   foo    two  0.737369  1.490732 -0.935834  1.175829
6   qux    one -1.253881 -0.637752  0.907105 -1.428681
7   qux    two -0.140069 -0.861755 -0.255619 -2.798589
This also works with `df.iloc[[-2, -1], :]` or `df.iloc[range(-2, 0), :]`.
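Hard-coded positions only help when you already know where the rows sit. As a sketch, the positions can be derived from the outer level instead; using `np.flatnonzero` for this is my choice here, not something the answer itself uses.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
idx = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

outer = df.index.get_level_values(0)

# Integer positions of every row whose outer label equals the last outer label.
positions = np.flatnonzero(outer == outer[-1])

# .iloc then selects those rows by position and keeps both index levels.
result = df.iloc[positions, :]
print(positions)
print(result)
```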
EDIT: Turning it into a more generic solution
Then it is possible to get a generic function:
def multiindex_iloc(df, index):
    # Pick the outer-level label stored at the given position in the level
    label = df.index.levels[0][index]
    # get_loc maps that label to its row positions, which .iloc then selects
    return df.iloc[df.index.get_loc(label)]

multiindex_iloc(df, -1)
Out[]:
                     0         1         2         3
first second
qux   one    -1.253881 -0.637752  0.907105 -1.428681
      two    -0.140069 -0.861755 -0.255619 -2.798589
multiindex_iloc(df, 2)
Out[]:
                     0         1         2         3
first second
foo   one     2.205930  2.186786  1.004054  0.386186
      two     0.737369  1.490732 -0.935834  1.175829
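One caveat worth sketching: `df.index.levels[0]` holds the stored labels of the level, which can still include labels that no longer appear after the frame has been sliced. The variant below is a hypothetical helper (`outer_iloc` is my name, not the answer's) that counts outer groups in order of appearance instead.

```python
import numpy as np
import pandas as pd

def outer_iloc(df, position):
    """Hypothetical helper: rows of the outer-level group at `position`,
    counting groups in the order they appear in the index."""
    outer = df.index.get_level_values(0)
    label = outer.unique()[position]
    return df.iloc[np.flatnonzero(outer == label)]

np.random.seed(123)
idx = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=idx)

# After slicing, only 'bar' and 'baz' remain in the rows...
sub = df.iloc[:4]
# ...but the stored level still lists all four labels.
stale = list(sub.index.levels[0])

# Counting by appearance picks the last group that is actually present.
last_group = outer_iloc(sub, -1)
print(stale)
print(last_group)
```

On the unsliced sample frame both approaches agree, since every label is present and already sorted.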
Answer by Håken Lid
You can use the `swaplevel` method to reorder the index before using `loc`.
df.swaplevel(0,-1).loc['two']
With the sample data from your question, it looks like this:
>>> df
                     0         1         2         3
first second
bar   one    -1.085631  0.997345  0.282978 -1.506295
      two    -0.578600  1.651437 -2.426679 -0.428913
baz   one     1.265936 -0.866740 -0.678886 -0.094709
      two     1.491390 -0.638902 -0.443982 -0.434351
foo   one     2.205930  2.186786  1.004054  0.386186
      two     0.737369  1.490732 -0.935834  1.175829
qux   one    -1.253881 -0.637752  0.907105 -1.428681
      two    -0.140069 -0.861755 -0.255619 -2.798589
>>> df.loc['bar']
               0         1         2         3
second
one    -1.085631  0.997345  0.282978 -1.506295
two    -0.578600  1.651437 -2.426679 -0.428913
>>> df.swaplevel().loc['two']
              0         1         2         3
first
bar   -0.578600  1.651437 -2.426679 -0.428913
baz    1.491390 -0.638902 -0.443982 -0.434351
foo    0.737369  1.490732 -0.935834  1.175829
qux   -0.140069 -0.861755 -0.255619 -2.798589
`swaplevel` is a MultiIndex method, but you can call it directly on the DataFrame. The default is to swap the two innermost levels, so if you have more than two levels in the MultiIndex, you should explicitly state the levels you want to swap.
df.swaplevel(0,-1).loc['two']
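To illustrate why the explicit levels matter, here is a sketch on a made-up three-level index (`idx3`, `df3`, and the `third` level are assumptions for this example, not data from the question). It also shows `.xs`, which selects on a named level without reordering; that alternative is my addition.

```python
import numpy as np
import pandas as pd

idx3 = pd.MultiIndex.from_product(
    [['bar', 'baz'], ['one', 'two'], ['x', 'y']],
    names=['first', 'second', 'third'])
df3 = pd.DataFrame(np.arange(16).reshape(8, 2), index=idx3)

# Default: swaps the two innermost levels ('second' and 'third').
default_names = list(df3.swaplevel().index.names)

# Explicit: swaps the outermost and innermost levels.
explicit_names = list(df3.swaplevel(0, -1).index.names)

# With 'third' moved to the front, .loc can select on it.
by_third = df3.swaplevel(0, -1).loc['x']

# .xs selects on a named level directly, without swapping anything.
by_third_xs = df3.xs('x', level='third')

print(default_names)
print(explicit_names)
```

Both selections return the same rows; they differ only in the order of the remaining index levels.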