Original URL: http://stackoverflow.com/questions/24807588/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
Looping over a MultiIndex in pandas
Question by Charles Dillon
I have a MultiIndexed DataFrame df1 and would like to loop over it so that, on each iteration, I get a DataFrame with a regular (non-hierarchical) index: the subset of df1 corresponding to one outer index entry. I.e., if I have:


I want to get


and subsequently C1, C2, etc. I also don't know what these labels will actually be (C1, etc. are just placeholders here), so I would just like to loop over however many C_i values I have.
I have been stumbling around with iterrows and various loops without getting any tangible results, and I don't really know how to proceed. I feel like a simple solution should exist, but I couldn't find anything that looked helpful in the documentation, probably due to my own lack of understanding.
Answer by Jeff
Using a modified example from here
    In [30]: import numpy as np
        ...: from pandas import DataFrame, MultiIndex
        ...:
        ...: def mklbl(prefix, n):
        ...:     return ["%s%s" % (prefix, i) for i in range(n)]
        ...:

    In [31]: columns = MultiIndex.from_tuples([('a', 'foo'), ('a', 'bar'),
        ...:                                   ('b', 'foo'), ('b', 'bah')],
        ...:                                  names=['lvl0', 'lvl1'])

    In [33]: index = MultiIndex.from_product([mklbl('A', 4), mklbl('B', 2)])

    In [34]: # sort_index replaces sortlevel, which no longer exists in current pandas
        ...: df = DataFrame(np.arange(len(index) * len(columns)).reshape((len(index), len(columns))),
        ...:                index=index,
        ...:                columns=columns).sort_index().sort_index(axis=1)

    In [35]: df
    Out[35]:
    lvl0    a       b
    lvl1  bar foo bah foo
    A0 B0   1   0   3   2
       B1   5   4   7   6
    A1 B0   9   8  11  10
       B1  13  12  15  14
    A2 B0  17  16  19  18
       B1  21  20  23  22
    A3 B0  25  24  27  26
       B1  29  28  31  30

    In [36]: df.loc['A0']
    Out[36]:
    lvl0    a       b
    lvl1  bar foo bah foo
    B0      1   0   3   2
    B1      5   4   7   6

    In [37]: df.loc['A1']
    Out[37]:
    lvl0    a       b
    lvl1  bar foo bah foo
    B0      9   8  11  10
    B1     13  12  15  14
No looping is necessary.
You can also select with a list of labels to return a frame that keeps the original MultiIndex (MI), e.g. df.loc[['A1']].
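A minimal sketch of the difference between the two selections, using a small hypothetical frame (the names `df`, `x`, `y` and the values are made up for illustration): a scalar label drops the outer level, while a one-element list keeps the MultiIndex.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a two-level row index (outer: A0/A1, inner: B0/B1)
index = pd.MultiIndex.from_product([["A0", "A1"], ["B0", "B1"]])
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=["x", "y"])

# Scalar label: the outer level is dropped, leaving a flat Index
sub = df.loc["A1"]
print(type(sub.index).__name__, list(sub.index))
# Index ['B0', 'B1']

# List of labels: the original MultiIndex is kept
sub_mi = df.loc[["A1"]]
print(type(sub_mi.index).__name__, list(sub_mi.index))
# MultiIndex [('A1', 'B0'), ('A1', 'B1')]
```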
If you want to get the distinct values of the outer index level:
    In [38]: df.index.get_level_values(0).unique()
    Out[38]: Index(['A0', 'A1', 'A2', 'A3'], dtype='object')
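Combining the distinct outer labels with a scalar `.loc` lookup gives the loop the question asks for. This is a sketch, assuming a hypothetical frame `df1` whose outer level holds the unknown C_i labels; the labels are read from the index at runtime, so their names never need to be known in advance.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1: the outer level holds labels we don't know ahead of time
index = pd.MultiIndex.from_product([["C1", "C2", "C3"], ["r0", "r1"]])
df1 = pd.DataFrame(np.arange(12).reshape(6, 2), index=index, columns=["x", "y"])

# Number of distinct outer labels (the "number of C_i values")
n_outer = df1.index.get_level_values(0).nunique()

subsets = {}
# Distinct outer-level labels, in order of first appearance
for key in df1.index.get_level_values(0).unique():
    sub = df1.loc[key]  # scalar label -> sub-frame with a flat, non-hierarchical index
    subsets[key] = sub
    print(key, sub.shape)  # e.g. "C1 (2, 2)"
```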
Answer by John
Are you trying to do something like this?
    for i in set(df.index):
        print(df.loc[i].reset_index())
set(df.index) returns a set of the unique tuples of your MultiIndex (hierarchical index). In df.loc[i].reset_index(), df.loc[i] returns the part of your original DataFrame that matches i (with a full index tuple, that is a single row), and the .reset_index() part converts the index to columns.
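Because `set(df.index)` contains full (outer, inner) tuples, iterating over it visits one row at a time rather than one outer label at a time. If the goal is one sub-frame per outer label, a common alternative (not part of the original answer) is `groupby(level=0)`, sketched here on a small hypothetical frame; `droplevel(0)` (available in pandas 0.24 and later) removes the outer level so each piece has a flat index.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a two-level row index
index = pd.MultiIndex.from_product([["A0", "A1"], ["B0", "B1"]])
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=["x", "y"])

# Iterating over the index yields full (outer, inner) tuples: one entry per row
print(sorted(set(df.index)))
# [('A0', 'B0'), ('A0', 'B1'), ('A1', 'B0'), ('A1', 'B1')]

# groupby(level=0) yields one (outer label, sub-frame) pair per outer label
pieces = {}
for key, group in df.groupby(level=0):
    # group still carries the full MultiIndex; drop the outer level for a flat index
    pieces[key] = group.droplevel(0)
```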

