Attribution: this content comes from a Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me), citing the original address and author information. Original Stack Overflow question: http://stackoverflow.com/questions/21605491/

Time: 2020-09-13 21:40:28  Source: igfitidea

How to split a pandas dataframe or series by day (possibly using an iterator)

Tags: python, indexing, pandas, time-series

Asked by Mannaggia

I have a long time series, e.g.:

import pandas as pd
index=pd.date_range(start='2012-11-05', end='2012-11-10', freq='1S').tz_localize('Europe/Berlin')
df=pd.DataFrame(range(len(index)), index=index, columns=['Number'])

Now I want to extract all sub-DataFrames for each day, to get the following output:

df_2012-11-05: data frame with all data referring to day 2012-11-05
df_2012-11-06: etc.
df_2012-11-07
df_2012-11-08
df_2012-11-09
df_2012-11-10

What is the most efficient way to do this without checking index.date==give_date, which is very slow? Also, the user does not know a priori the range of days in the frame.

Any hints on how to do this with an iterator?

My current solution is below, but it is not very elegant and has the two issues described after the code:

import numpy as np  # needed for np.unique below

time_zone='Europe/Berlin'
# find all days
a=np.unique(df.index.date) # this can take a lot of time
a.sort()
results=[]
for i in range(len(a)-1):
    day_now=pd.Timestamp(a[i]).tz_localize(time_zone)
    day_next=pd.Timestamp(a[i+1]).tz_localize(time_zone)
    results.append(df[day_now:day_next]) # how to select if I do not want day_next included?

# last day
results.append(df[day_next:])

This approach has the following problems:

  • a=np.unique(df.index.date) can take a lot of time
  • df[day_now:day_next] includes day_next, but I need to exclude it from the range
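
The second problem can be illustrated with a small sketch. This is not part of the original question: it assumes a reduced 30-minute frame (instead of the per-second one above) and uses a boolean mask as one possible way to get a half-open range. Label-based slicing on a DatetimeIndex includes both endpoints, so the mask with a strict `<` is what drops day_next.

```python
import pandas as pd

# Assumption: a small 30-minute frame stands in for the per-second frame in the question.
index = pd.date_range('2012-11-05', '2012-11-07', freq='30min', tz='Europe/Berlin')
df = pd.DataFrame({'Number': range(len(index))}, index=index)

day_now = pd.Timestamp('2012-11-05', tz='Europe/Berlin')
day_next = pd.Timestamp('2012-11-06', tz='Europe/Berlin')

# Label slicing is inclusive on both ends, so the first row of day_next is included.
closed = df.loc[day_now:day_next]

# A boolean mask with a strict upper bound excludes day_next.
half_open = df[(df.index >= day_now) & (df.index < day_next)]
```

Here `closed` carries one extra row (midnight of the next day) compared with `half_open`.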

Answered by Woody Pride

Perhaps groupby?

DFList = []
for group in df.groupby(df.index.day):
    DFList.append(group[1])

This should give you a list of data frames, where each data frame holds one day of data.

Or in one line:

DFList = [group[1] for group in df.groupby(df.index.day)]

Gotta love Python!
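
Since the question asks about an iterator, here is a minimal sketch (not from the original answer) of consuming the groupby object lazily instead of building a list. It assumes a small 30-minute frame covering three days; the names `day_sizes`, `days`, `first_key` and `first_frame` are only illustrative. Iterating a groupby object yields (key, sub-frame) tuples, which is why the answer above indexes `group[1]`.

```python
import pandas as pd

# Assumption: three days of 30-minute data stand in for the frame in the question.
index = pd.date_range('2012-11-05', '2012-11-07 23:30', freq='30min', tz='Europe/Berlin')
df = pd.DataFrame({'Number': range(len(index))}, index=index)

# Each iteration step yields a (key, sub-frame) pair that can be unpacked directly.
day_sizes = {}
for day, day_frame in df.groupby(df.index.day):
    day_sizes[day] = len(day_frame)

# The same object can be driven manually, one day at a time.
days = iter(df.groupby(df.index.day))
first_key, first_frame = next(days)
```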

Answered by Peque

If you want to group by date (AKA: year+month+day), then use df.index.date:

result = [group[1] for group in df.groupby(df.index.date)]

In contrast, df.index.day groups by the day of the month (i.e., from 1 to 31), which could result in undesirable behavior if the input dataframe's dates span multiple months.
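
The difference can be seen in a small sketch with made-up timestamps (an assumption for illustration, not data from the question) that fall on the 5th of two different months. Grouping by `df.index.normalize()` is shown as a further alternative that is not part of the answers above; it keeps timezone-aware Timestamp keys instead of datetime.date objects.

```python
import pandas as pd

# Assumption: four illustrative timestamps, two on 5 November and two on 5 December.
index = pd.to_datetime([
    '2012-11-05 08:00', '2012-11-05 20:00',
    '2012-12-05 08:00', '2012-12-05 20:00',
]).tz_localize('Europe/Berlin')
df = pd.DataFrame({'Number': range(len(index))}, index=index)

# Day of month: both months share the key 5, so everything lands in one group.
by_day = [group for _, group in df.groupby(df.index.day)]

# Calendar date: one group per distinct year+month+day.
by_date = [group for _, group in df.groupby(df.index.date)]

# Alternative (not from the answers): group by local midnight of each timestamp.
by_midnight = [group for _, group in df.groupby(df.index.normalize())]
```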