Looping over a MultiIndex in Pandas

Question

Asked by Charles Dillon

I have a MultiIndexed DataFrame df1 and would like to loop over it so that each iteration yields a DataFrame with a regular, non-hierarchical index: the subset of df1 corresponding to one entry of the outer index level. That is, if I have:

FirstTable

I want to get

SecondTable

and subsequently C1, C2, etc. I also don't know what these names will actually be (C1, etc. are just placeholders here), so I would just like to loop over however many C_i values I have.

I have been stumbling around with iterrows and various loops without getting any tangible results, and I don't really know how to proceed. I feel a simple solution should exist, but I couldn't find anything that looked helpful in the documentation, probably due to my own lack of understanding.

Answer 1

Answered by Jeff

Using a modified example from here

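In [29]: import numpy as np
   ....: from pandas import DataFrame, MultiIndex

In [30]: def mklbl(prefix, n):
   ....:     # Build labels such as 'A0', 'A1', ...
   ....:     return ["%s%s" % (prefix, i) for i in range(n)]
   ....: 

In [31]: columns = MultiIndex.from_tuples([('a', 'foo'), ('a', 'bar'),
   ....:                                   ('b', 'foo'), ('b', 'bah')],
   ....:                                  names=['lvl0', 'lvl1'])

In [33]: index = MultiIndex.from_product([mklbl('A', 4), mklbl('B', 2)])

In [34]: # sort_index() replaces the deprecated sortlevel(); sort rows, then columns
   ....: df = DataFrame(np.arange(len(index) * len(columns)).reshape((len(index), len(columns))),
   ....:                index=index,
   ....:                columns=columns).sort_index().sort_index(axis=1)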

In [35]: df
Out[35]: 
lvl0     a         b     
lvl1   bar  foo  bah  foo
A0 B0    1    0    3    2
   B1    5    4    7    6
A1 B0    9    8   11   10
   B1   13   12   15   14
A2 B0   17   16   19   18
   B1   21   20   23   22
A3 B0   25   24   27   26
   B1   29   28   31   30

In [36]: df.loc['A0']
Out[36]: 
lvl0    a         b     
lvl1  bar  foo  bah  foo
B0      1    0    3    2
B1      5    4    7    6

In [37]: df.loc['A1']
Out[37]: 
lvl0    a         b     
lvl1  bar  foo  bah  foo
B0      9    8   11   10
B1     13   12   15   14

No looping is necessary.

You can also pass a list of labels to get back a frame that keeps the original MultiIndex, e.g. df.loc[['A1']]
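
To make the difference between the two selections concrete, here is a minimal sketch. The small frame and the column names x and y are made up for illustration and are not part of the answer's example.

```python
import numpy as np
import pandas as pd

# Hypothetical two-level frame: outer labels A0/A1, inner labels B0/B1.
index = pd.MultiIndex.from_product([["A0", "A1"], ["B0", "B1"]])
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=["x", "y"])

flat = df.loc["A1"]      # scalar label: the outer level is dropped
nested = df.loc[["A1"]]  # list of labels: the MultiIndex is kept

print(flat.index.nlevels)    # number of index levels after scalar selection
print(nested.index.nlevels)  # number of index levels after list selection
```

The scalar form is the one that matches the question, since it leaves a regular single-level index.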

If you want to get the values in the index:

In [38]: df.index.get_level_values(0).unique()
Out[38]: array(['A0', 'A1', 'A2', 'A3'], dtype=object)
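
If a loop is still wanted, for instance because the outer labels are not known in advance, the two pieces above can be combined. This is only a sketch on a small made-up frame; the names subframes, x and y are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical two-level frame standing in for df1 from the question.
index = pd.MultiIndex.from_product([["A0", "A1"], ["B0", "B1"]])
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=["x", "y"])

subframes = {}
# Iterate over the unique labels of the outer level, whatever they are called.
for key in df.index.get_level_values(0).unique():
    # Scalar selection drops the outer level, leaving a single-level index.
    subframes[key] = df.loc[key]

for key, sub in subframes.items():
    print(key)
    print(sub)
```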

Answer 2

Answered by John

Are you trying to do something like this?

for i in set(df.index):
    print(df.loc[i].reset_index())

set(df.index) returns a set of the unique tuples of your multi-index (hierarchical index).
df.loc[i] returns the subset of your original dataframe for that tuple, and the .reset_index() part converts the index to columns.
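
Note that set(df.index) iterates over full index tuples, so each step yields a single row. A possible alternative, not taken from this answer, is to group on the outer level only, so that each step yields one sub-frame per outer label. A minimal sketch on a made-up frame (the names pieces, x and y are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical two-level frame: outer labels A0/A1, inner labels B0/B1.
index = pd.MultiIndex.from_product([["A0", "A1"], ["B0", "B1"]])
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=["x", "y"])

pieces = {}
# groupby(level=0) yields (outer label, sub-frame) pairs in sorted label order.
for key, group in df.groupby(level=0):
    # Each group still carries the full MultiIndex; drop the outer level.
    pieces[key] = group.droplevel(0)

print(pieces["A0"])
```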
