pandas 如何使用另一个 MultiIndex 切片一个 MultiIndex DataFrame

Question

提问by choldgraf

I have a pandas dataframe with 3 levels of a MultiIndex. I am trying to pull out rows of this dataframe according to a list of values that correspond to two of the levels.

我有一个带有 3 个多索引级别的 Pandas 数据框。我正在尝试根据对应于两个级别的值列表提取此数据帧的行。

I have something like this:

我有这样的事情：

ix = pd.MultiIndex.from_product([[1, 2, 3], ['foo', 'bar'], ['baz', 'can']], names=['a', 'b', 'c'])
data = np.arange(len(ix))
df = pd.DataFrame(data, index=ix, columns=['hi'])
print(df)

           hi
a b   c      
1 foo baz   0
      can   1
  bar baz   2
      can   3
2 foo baz   4
      can   5
  bar baz   6
      can   7
3 foo baz   8
      can   9
  bar baz  10
      can  11

Now I want to take all rows where index levels 'b' and 'c' are in this index:

现在我想获取索引级别 'b' 和 'c' 在该索引中的所有行：

ix_use = pd.MultiIndex.from_tuples([('foo', 'can'), ('bar', 'baz')], names=['b', 'c'])

i.e. values of hihaving ('foo', 'can')or ('bar', 'baz')in levels band crespectively: (1, 2, 5, 6, 9, 10).

即hi具有('foo', 'can')或('bar', 'baz')在水平b和c分别的值：(1, 2, 5, 6, 9, 10)。

So I'd like to take a slice(None)on the first level, and pull out specific tuples on the second and third levels.

所以我想slice(None)在第一层取a ，在第二层和第三层拉出特定的元组。

Initially I thought that passing a multi-index object to .loc would pull out the values / levels that I wanted, but this isn't working. What's the best way to do something like this?

最初我认为将多索引对象传递给 .loc 会提取出我想要的值/级别，但这不起作用。做这样的事情的最好方法是什么？

Answer 1

回答by Primer

Here is a way to get this slice:

这是获取此切片的方法：

df.sort_index(inplace=True)
idx = pd.IndexSlice
df.loc[idx[:, ('foo','bar'), 'can'], :]

yielding

屈服

           hi
a b   c      
1 bar can   3
  foo can   1
2 bar can   7
  foo can   5
3 bar can  11
  foo can   9

Note that you might need to sort MultiIndex before you can slice it. Well pandas is kind enough to warn if you need to do it:

请注意，您可能需要先对 MultiIndex 进行排序，然后才能对其进行切片。好吧，如果您需要这样做，pandas 会发出警告：

KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (3), lexsort depth (1)'

You can read more on how to use slicers in the docs

您可以在文档中阅读有关如何使用切片器的更多信息

If for some reason using slicers is not an option here is a way to get the same slice using .isin()method:

如果由于某种原因不能使用切片器，这里是一种使用.isin()方法获取相同切片的方法：

df[df.index.get_level_values('b').isin(ix_use.get_level_values(0)) & df.index.get_level_values('c').isin(ix_use.get_level_values(1))]

Which is clearly not as concise.

这显然没有那么简洁。

UPDATE:

更新：

For the conditions that you have updated here is a way to do it:

对于您在此处更新的条件是一种方法：

cond1 = (df.index.get_level_values('b').isin(['foo'])) & (df.index.get_level_values('c').isin(['can']))
cond2 = (df.index.get_level_values('b').isin(['bar'])) & (df.index.get_level_values('c').isin(['baz']))
df[cond1 | cond2]

producing:

生产：

           hi
a b   c      
1 foo can   1
  bar baz   2
2 foo can   5
  bar baz   6
3 foo can   9
  bar baz  10

Answer 2

回答by YaOzI

I would recommend the query()methodjust like in this Q&A.

我会推荐这种query()方法，就像在这个问答中一样。

Simply using this, which I think is a more natural way to express:

简单地使用这个，我认为这是一种更自然的表达方式：

In [27]: df.query("(b == 'foo' and c == 'can') or (b == 'bar' and c == 'baz')")
Out[27]: 
           hi
a b   c      
1 foo can   1
  bar baz   2
2 foo can   5
  bar baz   6
3 foo can   9
  bar baz  10

Answer 3

回答by LondonRob

I find it interesting that this doesn't work:

我觉得这行不通很有趣：

In [45]: df.loc[(idx[:, 'foo', 'can'], idx[:, 'bar', 'baz']), ]
Out[45]: 
           hi
a b   c      
1 bar baz   2
      can   3
  foo baz   0
      can   1
2 bar baz   6
      can   7
  foo baz   4
      can   5
3 bar baz  10
      can  11
  foo baz   8
      can   9

It sort of looks like it "should", somehow. In any case, here's a reasonable workaround:

不知何故，它看起来“应该”。无论如何，这是一个合理的解决方法：

Let's assume the tuples you want to slice by are in the index of another DataFrame(since it sounds like they probably arein your case!).

让我们假设您想要切片的元组在另一个的索引中DataFrame（因为听起来它们可能是您的情况！）。

In [53]: ix_use = pd.MultiIndex.from_tuples([('foo', 'can'), ('bar', 'baz')], names=['b', 'c'])
In [55]: other = pd.DataFrame(dict(a=1), index=ix_use)
In [56]: other
Out[56]: 
         a
b   c     
foo can  1
bar baz  1

Now to slice dfby the index of otherwe can use the fact that .loc/.ixallow you to give a list of tuples (see the last example here).

现在要df根据的索引进行切片，other我们可以使用.loc/.ix允许您提供元组列表的事实（请参阅此处的最后一个示例）。

First let's build the list of tuples we want:

首先让我们构建我们想要的元组列表：

In [13]: idx = [(x, ) + y for x in df.index.levels[0] for y in other.index.values]
In [14]: idx
Out[14]: 
[(1, 'foo', 'can'),
 (1, 'bar', 'baz'),
 (2, 'foo', 'can'),
 (2, 'bar', 'baz'),
 (3, 'foo', 'can'),
 (3, 'bar', 'baz')]

Now we can pass this list to .ixor .loc:

现在我们可以将此列表传递给.ixor .loc：

In [17]: df.ix[idx]
Out[17]: 
           hi
a b   c      
1 foo can   1
  bar baz   2
2 foo can   5
  bar baz   6
3 foo can   9
  bar baz  10

pandas 如何使用另一个 MultiIndex 切片一个 MultiIndex DataFrame

提问by choldgraf

回答by Primer

回答by YaOzI

回答by LondonRob

相关推荐

最近更新

标签

pandas 如何使用另一个 MultiIndex 切片一个 MultiIndex DataFrame

提问by choldgraf

回答by Primer

回答by YaOzI

回答by LondonRob

相关推荐

尽管数据已填充数据，pandas df.corr() 仍返回 NaN

pandas AttributeError: 'module' 对象没有属性 'hist'

pandas python 应用函数列出并返回数据框

pandas python通过列表创建一个带有一行的数据框

相关推荐

最近更新

标签