Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/24807588/

Time: 2020-09-13 22:16:19  Source: igfitidea

Looping over a MultiIndex in pandas

Tags: python, pandas, multi-index

Asked by Charles Dillon

I have a MultiIndexed DataFrame df1 and would like to loop over it so that each iteration of the loop yields a DataFrame with a regular, non-hierarchical index: the subset of df1 corresponding to one outer index entry. I.e., if I have:

FirstTable

I want to get

SecondTable

and subsequently C1, C2, etc. I also don't know what these names will actually be (C1, etc. are just placeholders here), so I would just like to loop over however many Ci values I have.

I have been stumbling around with iterrows and various loops without getting any tangible results, and I don't really know how to proceed. I feel like a simple solution should exist, but I couldn't find anything that looked helpful in the documentation, probably due to my own lack of understanding.

Answered by Jeff

Using a modified example from here

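In [29]: import numpy as np
   ....: from pandas import DataFrame, MultiIndex
   ....: 

In [30]: def mklbl(prefix, n):
   ....:     # Build labels such as ['A0', 'A1', ...]
   ....:     return ["%s%s" % (prefix, i) for i in range(n)]
   ....: 

In [31]: columns = MultiIndex.from_tuples([('a', 'foo'), ('a', 'bar'),
   ....:                                   ('b', 'foo'), ('b', 'bah')],
   ....:                                  names=['lvl0', 'lvl1'])

In [33]: index = MultiIndex.from_product([mklbl('A', 4), mklbl('B', 2)])

In [34]: # sort_index() replaces sortlevel(), which was removed in pandas 1.0
   ....: df = DataFrame(np.arange(len(index) * len(columns)).reshape((len(index), len(columns))),
   ....:                index=index,
   ....:                columns=columns).sort_index().sort_index(axis=1)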

In [35]: df
Out[35]: 
lvl0     a         b     
lvl1   bar  foo  bah  foo
A0 B0    1    0    3    2
   B1    5    4    7    6
A1 B0    9    8   11   10
   B1   13   12   15   14
A2 B0   17   16   19   18
   B1   21   20   23   22
A3 B0   25   24   27   26
   B1   29   28   31   30

In [36]: df.loc['A0']
Out[36]: 
lvl0    a         b     
lvl1  bar  foo  bah  foo
B0      1    0    3    2
B1      5    4    7    6

In [37]: df.loc['A1']
Out[37]: 
lvl0    a         b     
lvl1  bar  foo  bah  foo
B0      9    8   11   10
B1     13   12   15   14

No looping is necessary.

You can also select with a list of labels to return a frame that keeps the original MultiIndex, e.g. df.loc[['A1']]
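
As a minimal sketch of the difference between the two selection styles (the small frame and the column names x and y here are made up for illustration, not taken from the answer), a scalar label drops the outer level while a one-element list keeps it:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a two-level row index, similar in shape to the example above.
index = pd.MultiIndex.from_product([["A0", "A1"], ["B0", "B1"]])
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=["x", "y"])

dropped = df.loc["A1"]    # scalar label: the outer level is dropped
kept = df.loc[["A1"]]     # list of labels: the full MultiIndex is kept

print(dropped.index.nlevels)  # 1
print(kept.index.nlevels)     # 2
```

The first form is probably what the question asks for, since the result has a regular single-level index.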

If you want to get the values in the index:

In [38]: df.index.get_level_values(0).unique()
Out[38]: array(['A0', 'A1', 'A2', 'A3'], dtype=object)
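
If a loop is wanted after all, one way to combine the two pieces above is to iterate over the unique outer labels and select each one with .loc. This is only a sketch: the frame, the column names x and y, and the subframes dictionary are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: three outer labels, two inner labels each.
index = pd.MultiIndex.from_product([["A0", "A1", "A2"], ["B0", "B1"]])
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=index, columns=["x", "y"])

# One sub-frame per outer label, each with a plain single-level index.
subframes = {}
for key in df.index.get_level_values(0).unique():
    subframes[key] = df.loc[key]

print(list(subframes))   # the outer labels, in index order
print(subframes["A1"])   # rows B0 and B1 belonging to A1
```

The outer labels never have to be known in advance, which matches the placeholder situation described in the question.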

Answered by John

Are you trying to do something like this?

for i in set(df.index):
    print(df.loc[i].reset_index())
  1. set(df.index) returns a set of the unique tuples of your multi-index (hierarchical index).
  2. df.loc[i].reset_index(): df.loc[i] of course returns a subset of your original dataframe, and the .reset_index() part converts the index to columns.
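
Note that set(df.index) iterates over full (outer, inner) tuples, so each step of that loop sees a single row. A possible alternative, sketched here and not part of the original answer, groups on the outer level instead. The frame, the level names outer and inner, and the pieces dictionary are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with named index levels.
index = pd.MultiIndex.from_product(
    [["A0", "A1"], ["B0", "B1"]], names=["outer", "inner"]
)
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=["x", "y"])

# set(df.index) holds one tuple per row, not one entry per outer label.
full_keys = set(df.index)
print(len(full_keys))  # 4

# Grouping on level 0 yields one group per outer label.
pieces = {}
for key, group in df.groupby(level=0):
    # Each group still carries the full MultiIndex, so drop the outer level.
    pieces[key] = group.droplevel(0)

print(pieces["A1"])
```

Unlike a set, groupby returns the groups in sorted key order by default, so the loop order is predictable.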