Original source: http://stackoverflow.com/questions/25929319/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
How to iterate over pandas multiindex dataframe using index
Asked by Yantraguru
I have a DataFrame df that looks like this. date and Time are the two levels of its MultiIndex.
                    observation1  observation2
date       Time
2012-11-02 9:15:00     79.373668           224
           9:16:00    130.841316           477
2012-11-03 9:15:00     45.312814           835
           9:16:00    123.776946           623
           9:17:00    153.76646            624
           9:18:00    463.276946           626
           9:19:00    663.176934           622
           9:20:00    763.77333            621
2012-11-04 9:15:00    115.449437           122
           9:16:00    123.776946           555
           9:17:00    153.76646            344
           9:18:00    463.276946           212
I want to run some complex processing on each daily block of data.
Pseudocode would look like:
for count in df(level 0 index):
    new_df = get only chunk for count
    complex_process(new_df)
First of all, I could not find a way to access only the block for a single date, for example:
2012-11-03 9:15:00     45.312814           835
           9:16:00    123.776946           623
           9:17:00    153.76646            624
           9:18:00    463.276946           626
           9:19:00    663.176934           622
           9:20:00    763.77333            621
and then send it for processing. I am doing this in a for loop because I am not sure whether there is a way to do it without naming the exact values of the level 0 index. After some basic searching I found df.index.get_level_values(0), but it returns one value per row rather than one per date, so the loop runs several times for the same day. I want to create one DataFrame per day and send it for processing.
Accepted answer by chrisb
One easy way is to group by the first level of the index. Iterating over the groupby object yields each group key together with a sub-frame containing that group's rows.
In [136]: for date, new_df in df.groupby(level=0):
     ...:     print(new_df)
     ...:
                    observation1  observation2
date       Time
2012-11-02 9:15:00     79.373668           224
           9:16:00    130.841316           477
                    observation1  observation2
date       Time
2012-11-03 9:15:00     45.312814           835
           9:16:00    123.776946           623
           9:17:00    153.766460           624
           9:18:00    463.276946           626
           9:19:00    663.176934           622
           9:20:00    763.773330           621
                    observation1  observation2
date       Time
2012-11-04 9:15:00    115.449437           122
           9:16:00    123.776946           555
           9:17:00    153.766460           344
           9:18:00    463.276946           212
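For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds a shortened version of the question's frame (string labels are assumed for both index levels, since the question does not say what dtype they are) and uses a placeholder complex_process that just sums observation2; the real processing function is not given in the question.

```python
import pandas as pd

# Shortened copy of the question's data; string index labels are an assumption.
index = pd.MultiIndex.from_tuples(
    [
        ("2012-11-02", "9:15:00"),
        ("2012-11-02", "9:16:00"),
        ("2012-11-03", "9:15:00"),
        ("2012-11-03", "9:16:00"),
        ("2012-11-04", "9:15:00"),
    ],
    names=["date", "Time"],
)
df = pd.DataFrame(
    {
        "observation1": [79.373668, 130.841316, 45.312814, 123.776946, 115.449437],
        "observation2": [224, 477, 835, 623, 122],
    },
    index=index,
)


def complex_process(day_df):
    # Hypothetical stand-in for the real per-day processing.
    return int(day_df["observation2"].sum())


# groupby(level=0) yields one (key, sub-frame) pair per distinct date.
results = {}
for date, new_df in df.groupby(level=0):
    results[date] = complex_process(new_df)

print(results)
```

Each date is visited exactly once, and new_df keeps both index levels, so the processing function sees the same layout as the original frame.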
Answer by psorenson
What about this?
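# get_level_values('date') has one entry per row, so each date repeats; add .unique() to visit it once.
# .ix was removed in pandas 1.0, so the label-based .loc is used instead.
for idate in df.index.get_level_values('date'):
    complex_process(df.loc[idate], idate)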
Answer by melbay
Building on @psorenson's answer, we can get the unique values of the index level and the corresponding DataFrame slices without numpy as follows:
for date in df.index.get_level_values('date').unique():
    print(df.loc[date])
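One detail worth knowing with this approach: df.loc[date] returns a slice indexed by Time only, because the selected date level is dropped. The sketch below, on a tiny made-up subset of the question's data with assumed string labels, compares that with df.xs(..., drop_level=False), which keeps both levels.

```python
import pandas as pd

# Tiny subset of the question's data; string index labels are an assumption.
index = pd.MultiIndex.from_tuples(
    [
        ("2012-11-02", "9:15:00"),
        ("2012-11-02", "9:16:00"),
        ("2012-11-03", "9:15:00"),
    ],
    names=["date", "Time"],
)
df = pd.DataFrame({"observation1": [79.373668, 130.841316, 45.312814]}, index=index)

# One entry per distinct date, in order of first appearance.
dates = df.index.get_level_values("date").unique()

for date in dates:
    dropped = df.loc[date]  # the date level is dropped; index is Time only
    kept = df.xs(date, level="date", drop_level=False)  # both levels kept
    print(date, list(dropped.index.names), list(kept.index.names))
```

If the per-day processing needs the date inside the frame, the xs form (or the groupby approach from the accepted answer) avoids having to pass it separately.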

