Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also comply with CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/25929319/

Time: 2020-08-18 23:49:48  Source: igfitidea

How to iterate over pandas multiindex dataframe using index

python, pandas

Asked by Yantraguru

I have a DataFrame df that looks like this, where date and Time are the two levels of a MultiIndex:

                           observation1   observation2
date          Time                             
2012-11-02    9:15:00      79.373668      224
              9:16:00      130.841316     477
2012-11-03    9:15:00      45.312814      835
              9:16:00      123.776946     623
              9:17:00      153.76646      624
              9:18:00      463.276946     626
              9:19:00      663.176934     622
              9:20:00      763.77333      621
2012-11-04    9:15:00      115.449437     122
              9:16:00      123.776946     555
              9:17:00      153.76646      344
              9:18:00      463.276946     212

I want to do some complex processing on each day's block of data.

The pseudocode would look like this:

 for count in df(level 0 index) :
     new_df = get only chunk for count
     complex_process(new_df)

First of all, I could not find a way to access only the block for a single date, for example:

2012-11-03    9:15:00      45.312814      835
              9:16:00      123.776946     623
              9:17:00      153.76646      624
              9:18:00      463.276946     626
              9:19:00      663.176934     622
              9:20:00      763.77333      621

and then send it for processing. I am doing this in a for loop because I am not sure whether there is any way to do it without specifying the exact level-0 value. After some basic searching I was able to get df.index.get_level_values(0), but it returns every value (one per row), which makes the loop run multiple times for the same day. I want to create one DataFrame per day and send it for processing.
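For the first part of the question (getting just one day's block), a minimal sketch using standard pandas selection follows. It uses a subset of the question's rows with string labels, which is an assumption about the real data; df.loc and df.xs are both real pandas methods.

```python
import pandas as pd

# Subset of the question's rows; date and Time kept as strings (an assumption).
index = pd.MultiIndex.from_tuples(
    [("2012-11-02", "9:15:00"), ("2012-11-02", "9:16:00"),
     ("2012-11-03", "9:15:00"), ("2012-11-03", "9:16:00"), ("2012-11-03", "9:17:00")],
    names=["date", "Time"],
)
df = pd.DataFrame(
    {"observation1": [79.373668, 130.841316, 45.312814, 123.776946, 153.76646],
     "observation2": [224, 477, 835, 623, 624]},
    index=index,
)

# A single level-0 label returns that day's rows with the date level dropped.
block = df.loc["2012-11-03"]
# xs takes the same cross-section but lets you name the level explicitly.
block_xs = df.xs("2012-11-03", level="date")
# A list of labels keeps both index levels in the result.
block_keep = df.loc[["2012-11-03"]]

print(block)
```

Which form to pass on depends on whether the processing code needs to see the date in the index.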

Accepted answer by chrisb

One easy way is to group by the first level of the index. Iterating over the groupby object yields each group key together with a sub-DataFrame containing that group's rows.

In [136]: for date, new_df in df.groupby(level=0):
     ...:     print(new_df)
     ...:     
                    observation1  observation2
date       Time                               
2012-11-02 9:15:00     79.373668           224
           9:16:00    130.841316           477

                    observation1  observation2
date       Time                               
2012-11-03 9:15:00     45.312814           835
           9:16:00    123.776946           623
           9:17:00    153.766460           624
           9:18:00    463.276946           626
           9:19:00    663.176934           622
           9:20:00    763.773330           621

                    observation1  observation2
date       Time                               
2012-11-04 9:15:00    115.449437           122
           9:16:00    123.776946           555
           9:17:00    153.766460           344
           9:18:00    463.276946           212
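Plugging the groupby loop into the asker's pseudocode might look like the sketch below. complex_process here is a hypothetical stand-in (it just returns the row count and mean of observation1), and the data is a string-labelled subset of the question's table.

```python
import pandas as pd

# Subset of the question's rows; date and Time kept as strings (an assumption).
index = pd.MultiIndex.from_tuples(
    [("2012-11-02", "9:15:00"), ("2012-11-02", "9:16:00"),
     ("2012-11-03", "9:15:00"), ("2012-11-03", "9:16:00"), ("2012-11-03", "9:17:00")],
    names=["date", "Time"],
)
df = pd.DataFrame(
    {"observation1": [79.373668, 130.841316, 45.312814, 123.776946, 153.76646],
     "observation2": [224, 477, 835, 623, 624]},
    index=index,
)


def complex_process(day_df):
    """Hypothetical stand-in for the asker's complex_process:
    returns the day's row count and its mean observation1."""
    return len(day_df), day_df["observation1"].mean()


results = {}
# Each iteration yields one level-0 key and the sub-DataFrame holding that key's rows.
# The sub-DataFrame keeps both index levels; groupby sorts the keys by default.
for date, day_df in df.groupby(level="date"):
    results[date] = complex_process(day_df)

print(results)
```

Because groupby visits each distinct date exactly once, this avoids the repeated iterations the question describes.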

Answer by psorenson

What about this?

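# df.ix was removed in pandas 1.0; df.loc does the same label-based lookup here.
# Note: get_level_values returns one entry per row, so each date is visited once per row.
for idate in df.index.get_level_values('date'):
    complex_process(df.loc[idate], idate)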
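A quick way to see how often this loop calls the processing function is to count the calls. The sketch below runs psorenson's loop with .loc in place of .ix (which was removed in pandas 1.0); complex_process is a hypothetical counter, not the asker's real function, and the data is a subset of the question's table.

```python
from collections import Counter

import pandas as pd

# Subset of the question's rows; date and Time kept as strings (an assumption).
index = pd.MultiIndex.from_tuples(
    [("2012-11-02", "9:15:00"), ("2012-11-02", "9:16:00"),
     ("2012-11-03", "9:15:00"), ("2012-11-03", "9:16:00"), ("2012-11-03", "9:17:00")],
    names=["date", "Time"],
)
df = pd.DataFrame(
    {"observation1": [79.373668, 130.841316, 45.312814, 123.776946, 153.76646],
     "observation2": [224, 477, 835, 623, 624]},
    index=index,
)

calls = Counter()


def complex_process(day_df, day):
    """Hypothetical stand-in for the asker's function; it only records each call."""
    calls[day] += 1


# get_level_values yields one label per row, so each date repeats once per row.
for idate in df.index.get_level_values("date"):
    complex_process(df.loc[idate], idate)

print(dict(calls))
```

Each date is processed once per row it has; looping over get_level_values("date").unique() instead visits each date once.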

Answer by melbay

Building on @psorenson's answer, we can get the unique level values and their corresponding DataFrame slices, without numpy, as follows:

for date in df.index.get_level_values('date').unique():
    print(df.loc[date])
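As a rough check on how this relates to the groupby approach, the sketch below compares the two on a string-labelled subset of the question's data. The main difference is that df.loc[date] drops the date level, while each groupby group keeps it; DataFrame.droplevel removes it for the comparison.

```python
import pandas as pd

# Subset of the question's rows; date and Time kept as strings (an assumption).
index = pd.MultiIndex.from_tuples(
    [("2012-11-02", "9:15:00"), ("2012-11-02", "9:16:00"),
     ("2012-11-03", "9:15:00"), ("2012-11-03", "9:16:00"), ("2012-11-03", "9:17:00")],
    names=["date", "Time"],
)
df = pd.DataFrame(
    {"observation1": [79.373668, 130.841316, 45.312814, 123.776946, 153.76646],
     "observation2": [224, 477, 835, 623, 624]},
    index=index,
)

# melbay's loop, collected into a dict: one entry per distinct date,
# each frame indexed by Time only.
by_loc = {date: df.loc[date] for date in df.index.get_level_values("date").unique()}

# The groupby equivalent keeps the date level, so drop it before comparing.
by_group = {date: group.droplevel("date") for date, group in df.groupby(level="date")}

# unique() keeps order of first appearance, while groupby sorts keys by default;
# with this already-sorted index the two orders coincide.
same = list(by_loc) == list(by_group) and all(
    by_loc[date].equals(by_group[date]) for date in by_loc
)
print(same)
```

On an index whose dates are not in sorted order, the two loops would still produce the same per-day frames, but in a different iteration order.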