How to split a Pandas DataFrame or Series by day (possibly using an iterator)

Question

Asked by Mannaggia

I have a long time series, e.g.:

import pandas as pd
index=pd.date_range(start='2012-11-05', end='2012-11-10', freq='1S').tz_localize('Europe/Berlin')
df=pd.DataFrame(range(len(index)), index=index, columns=['Number'])

Now I want to extract all sub-DataFrames for each day, to get the following output:

df_2012-11-05: data frame with all data referring to day 2012-11-05
df_2012-11-06: etc.
df_2012-11-07
df_2012-11-08
df_2012-11-09
df_2012-11-10

What is the most efficient way to do this while avoiding a check of index.date == given_date, which is very slow? Also, the user does not know a priori the range of days in the frame.

Any hints on how to do this with an iterator?

My current solution is below, but it is not very elegant and has two issues, described after the code:

import numpy as np
time_zone='Europe/Berlin'
# find all days
a=np.unique(df.index.date) # this can take a lot of time
a.sort()
results=[]
for i in range(len(a)-1):
    day_now=pd.Timestamp(a[i]).tz_localize(time_zone)
    day_next=pd.Timestamp(a[i+1]).tz_localize(time_zone)
    results.append(df[day_now:day_next]) # how to select if I do not want day_next included?

# last day
results.append(df[day_next:])

This approach has the following problems:

a=np.unique(df.index.date) can take a lot of time
df[day_now:day_next] includes day_next, but I need to exclude it from the range

Answer 1

Answered by Woody Pride

Perhaps groupby?

DFList = []
for group in df.groupby(df.index.day):
    DFList.append(group[1])

This should give you a list of DataFrames, where each DataFrame holds one day of data.

Or in one line:

DFList = [group[1] for group in df.groupby(df.index.day)]

Gotta love Python!
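
For the iterator part of the question, here is a minimal sketch. The object returned by groupby can itself be looped over and yields (key, sub-frame) pairs, so the list does not have to be built up front. The 6-hourly sample is an assumption made to keep the example small, and the grouping key is the same df.index.day used above, which is only safe while the data stays within a single month.

```python
import pandas as pd

# Assumed sample: same date range as in the question, but one row every
# 6 hours instead of every second, to keep the example small.
index = pd.date_range(
    start="2012-11-05", end="2012-11-10", freq=pd.Timedelta(hours=6)
).tz_localize("Europe/Berlin")
df = pd.DataFrame({"Number": range(len(index))}, index=index)

# Looping over the groupby object yields (key, sub-frame) pairs one at a time.
sizes = {}
for day, day_df in df.groupby(df.index.day):
    sizes[int(day)] = len(day_df)

print(sizes)

# next() on an explicit iterator pulls a single day on demand.
first_day, first_day_df = next(iter(df.groupby(df.index.day)))
```

Because the days are discovered by groupby, the range of days does not need to be known in advance.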

Answer 2

Answered by Peque

If you want to group by date (AKA: year+month+day), then use df.index.date:

result = [group[1] for group in df.groupby(df.index.date)]

Using df.index.day would group by the day of the month (i.e., from 1 to 31), which could result in undesirable behavior if the input DataFrame's dates span multiple months.
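
A small sketch of that difference, using a made-up four-row sample that spans two months. The third variant, grouping on df.index.normalize() (each timestamp floored to local midnight), is not part of the answer above; it is an alternative that avoids building Python date objects and may be worth timing on large frames, though that is an assumption and has not been benchmarked here.

```python
import pandas as pd

# Hypothetical sample: two different dates that share day-of-month 5.
index = pd.DatetimeIndex(
    ["2012-11-05 08:00", "2012-11-05 20:00", "2012-12-05 08:00", "2012-12-05 20:00"]
).tz_localize("Europe/Berlin")
df = pd.DataFrame({"Number": range(len(index))}, index=index)

# Day of month: November 5 and December 5 end up in the same group.
by_day_of_month = [group for _, group in df.groupby(df.index.day)]

# Full calendar date: one group per distinct date.
by_date = [group for _, group in df.groupby(df.index.date)]

# Timestamps floored to local midnight: also one group per distinct date.
by_midnight = [group for _, group in df.groupby(df.index.normalize())]

print(len(by_day_of_month), len(by_date), len(by_midnight))
```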
