pandas 如何使用另一个 MultiIndex 切片一个 MultiIndex DataFrame
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/29266600/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to slice one MultiIndex DataFrame with the MultiIndex of another
提问by choldgraf
I have a pandas dataframe with 3 levels of a MultiIndex. I am trying to pull out rows of this dataframe according to a list of values that correspond to two of the levels.
我有一个带有 3 个多索引级别的 Pandas 数据框。我正在尝试根据对应于两个级别的值列表提取此数据帧的行。
I have something like this:
我有这样的事情:
ix = pd.MultiIndex.from_product([[1, 2, 3], ['foo', 'bar'], ['baz', 'can']], names=['a', 'b', 'c'])
data = np.arange(len(ix))
df = pd.DataFrame(data, index=ix, columns=['hi'])
print(df)
hi
a b c
1 foo baz 0
can 1
bar baz 2
can 3
2 foo baz 4
can 5
bar baz 6
can 7
3 foo baz 8
can 9
bar baz 10
can 11
Now I want to take all rows where index levels 'b' and 'c' are in this index:
现在我想获取索引级别 'b' 和 'c' 在该索引中的所有行:
ix_use = pd.MultiIndex.from_tuples([('foo', 'can'), ('bar', 'baz')], names=['b', 'c'])
i.e. values of hihaving ('foo', 'can')or ('bar', 'baz')in levels band crespectively: (1, 2, 5, 6, 9, 10).
即hi具有('foo', 'can')或('bar', 'baz')在水平b和c分别的值:(1, 2, 5, 6, 9, 10)。
So I'd like to take a slice(None)on the first level, and pull out specific tuples on the second and third levels.
所以我想slice(None)在第一层取a ,在第二层和第三层拉出特定的元组。
Initially I thought that passing a multi-index object to .loc would pull out the values / levels that I wanted, but this isn't working. What's the best way to do something like this?
最初我认为将多索引对象传递给 .loc 会提取出我想要的值/级别,但这不起作用。做这样的事情的最好方法是什么?
回答by Primer
Here is a way to get this slice:
这是获取此切片的方法:
df.sort_index(inplace=True)
idx = pd.IndexSlice
df.loc[idx[:, ('foo','bar'), 'can'], :]
yielding
屈服
hi
a b c
1 bar can 3
foo can 1
2 bar can 7
foo can 5
3 bar can 11
foo can 9
Note that you might need to sort MultiIndex before you can slice it. Well pandas is kind enough to warn if you need to do it:
请注意,您可能需要先对 MultiIndex 进行排序,然后才能对其进行切片。好吧,如果您需要这样做,pandas 会发出警告:
KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (3), lexsort depth (1)'
You can read more on how to use slicers in the docs
您可以在文档中阅读有关如何使用切片器的更多信息
If for some reason using slicers is not an option here is a way to get the same slice using .isin()method:
如果由于某种原因不能使用切片器,这里是一种使用.isin()方法获取相同切片的方法:
df[df.index.get_level_values('b').isin(ix_use.get_level_values(0)) & df.index.get_level_values('c').isin(ix_use.get_level_values(1))]
Which is clearly not as concise.
这显然没有那么简洁。
UPDATE:
更新:
For the conditions that you have updated here is a way to do it:
对于您在此处更新的条件是一种方法:
cond1 = (df.index.get_level_values('b').isin(['foo'])) & (df.index.get_level_values('c').isin(['can']))
cond2 = (df.index.get_level_values('b').isin(['bar'])) & (df.index.get_level_values('c').isin(['baz']))
df[cond1 | cond2]
producing:
生产:
hi
a b c
1 foo can 1
bar baz 2
2 foo can 5
bar baz 6
3 foo can 9
bar baz 10
回答by YaOzI
I would recommend the query()methodjust like in this Q&A.
我会推荐这种query()方法,就像在这个问答中一样。
Simply using this, which I think is a more natural way to express:
简单地使用这个,我认为这是一种更自然的表达方式:
In [27]: df.query("(b == 'foo' and c == 'can') or (b == 'bar' and c == 'baz')")
Out[27]:
hi
a b c
1 foo can 1
bar baz 2
2 foo can 5
bar baz 6
3 foo can 9
bar baz 10
回答by LondonRob
I find it interesting that this doesn't work:
我觉得这行不通很有趣:
In [45]: df.loc[(idx[:, 'foo', 'can'], idx[:, 'bar', 'baz']), ]
Out[45]:
hi
a b c
1 bar baz 2
can 3
foo baz 0
can 1
2 bar baz 6
can 7
foo baz 4
can 5
3 bar baz 10
can 11
foo baz 8
can 9
It sort of looks like it "should", somehow. In any case, here's a reasonable workaround:
不知何故,它看起来“应该”。无论如何,这是一个合理的解决方法:
Let's assume the tuples you want to slice by are in the index of another DataFrame(since it sounds like they probably arein your case!).
让我们假设您想要切片的元组在另一个的索引中DataFrame(因为听起来它们可能是您的情况!)。
In [53]: ix_use = pd.MultiIndex.from_tuples([('foo', 'can'), ('bar', 'baz')], names=['b', 'c'])
In [55]: other = pd.DataFrame(dict(a=1), index=ix_use)
In [56]: other
Out[56]:
a
b c
foo can 1
bar baz 1
Now to slice dfby the index of otherwe can use the fact that .loc/.ixallow you to give a list of tuples (see the last example here).
现在要df根据 的索引进行切片,other我们可以使用.loc/.ix允许您提供元组列表的事实(请参阅此处的最后一个示例)。
First let's build the list of tuples we want:
首先让我们构建我们想要的元组列表:
In [13]: idx = [(x, ) + y for x in df.index.levels[0] for y in other.index.values]
In [14]: idx
Out[14]:
[(1, 'foo', 'can'),
(1, 'bar', 'baz'),
(2, 'foo', 'can'),
(2, 'bar', 'baz'),
(3, 'foo', 'can'),
(3, 'bar', 'baz')]
Now we can pass this list to .ixor .loc:
现在我们可以将此列表传递给.ixor .loc:
In [17]: df.ix[idx]
Out[17]:
hi
a b c
1 foo can 1
bar baz 2
2 foo can 5
bar baz 6
3 foo can 9
bar baz 10

