Pandas - Split a dataframe into multiple dataframes based on date?

Question

Asked by Alex F

I have a dataframe with multiple columns along with a date column. The date format is 12/31/15 and I have set it as a datetime object.

I set the datetime column as the index and want to perform a regression calculation for each month of the dataframe.

I believe the methodology to do this would be to split the dataframe into multiple dataframes based on month, store into a list of dataframes, then perform regression on each dataframe in the list.

I have used groupby which successfully split the dataframe by month, but am unsure how to correctly convert each group in the groupby object into a dataframe to be able to run my regression function on it.

Does anyone know how to split a dataframe into multiple dataframes based on date, or a better approach to my problem?

Here is the code I've written so far:

import pandas as pd
import numpy as np
import statsmodels.api as sm
from patsy import dmatrices

df = pd.read_csv('data.csv')
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
df = df.set_index('date')

# Group dataframe on index by month and year 
# Groupby works, but dmatrices does not 
for df_group in df.groupby(pd.TimeGrouper("M")):
    y,X = dmatrices('value1 ~ value2 + value3', data=df_group,      
    return_type='dataframe')

Answer 1

Answered by daedalus

If you must loop, you need to unpack the key and the dataframe when you iterate over a groupby object:

import pandas as pd
import numpy as np
import statsmodels.api as sm
from patsy import dmatrices

df = pd.read_csv('data.csv')
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
df = df.set_index('date')

Note the use of group_name here:

for group_name, df_group in df.groupby(pd.Grouper(freq='M')):
    y, X = dmatrices('value1 ~ value2 + value3', data=df_group,
                     return_type='dataframe')
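
If the goal is literally to keep one dataframe per month, the unpacked groups can be collected into a dictionary. The following is a minimal sketch under stated assumptions: the sample data is synthetic (a stand-in for data.csv), the name monthly_frames is hypothetical, and the grouping key is the index converted to monthly periods with to_period rather than pd.Grouper.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for data.csv; the column names
# value1, value2, value3 are taken from the formula in the question.
rng = np.random.default_rng(0)
dates = pd.date_range("2015-01-01", "2015-03-31", freq="D")
df = pd.DataFrame(
    rng.normal(size=(len(dates), 3)),
    index=dates,
    columns=["value1", "value2", "value3"],
)

# Group on the calendar month of the index and keep every group as its
# own dataframe, keyed by a pandas Period for that month.
monthly_frames = {
    period: group for period, group in df.groupby(df.index.to_period("M"))
}

for period, frame in monthly_frames.items():
    print(period, frame.shape)
```

Each value in the dictionary is an ordinary dataframe, so it can be passed to dmatrices or any other function in the same way as df_group in the loop above.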

If you want to avoid iteration, do have a look at the notebook in Paul H's gist (see his comment), but a simple example of using apply would be:

def do_regression(df_group, ret='outcome'):
    """Apply the function to each group in the data and return one result."""
    y,X = dmatrices('value1 ~ value2 + value3',
                    data=df_group,      
                    return_type='dataframe')
    if ret == 'outcome':
        return y
    else:
        return X

outcome = df.groupby(pd.Grouper(freq='M')).apply(do_regression, ret='outcome')
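
The function above only returns the design matrices. As a rough illustration of carrying the idea through to fitted coefficients, the sketch below fits value1 ~ value2 + value3 per month and returns one row of coefficients per group. It rests on several assumptions: the data is synthetic, fit_month is a hypothetical helper, and the fit uses numpy.linalg.lstsq as a stand-in for a statsmodels OLS fit so that the example needs only numpy and pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical data: value1 depends linearly on value2 and value3 plus noise.
rng = np.random.default_rng(42)
dates = pd.date_range("2015-01-01", "2015-03-31", freq="D")
df = pd.DataFrame(
    {
        "value2": rng.normal(size=len(dates)),
        "value3": rng.normal(size=len(dates)),
    },
    index=dates,
)
noise = rng.normal(scale=0.1, size=len(dates))
df["value1"] = 1.0 + 2.0 * df["value2"] - 0.5 * df["value3"] + noise


def fit_month(df_group):
    """Least-squares fit of value1 ~ value2 + value3 for one group."""
    # Design matrix with an explicit intercept column.
    X = np.column_stack(
        [
            np.ones(len(df_group)),
            df_group["value2"].to_numpy(),
            df_group["value3"].to_numpy(),
        ]
    )
    y = df_group["value1"].to_numpy()
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(coef, index=["Intercept", "value2", "value3"])


# One row of coefficients per month, indexed by monthly Period.
coefs = df.groupby(df.index.to_period("M")).apply(fit_month)
print(coefs)
```

Because fit_month returns a Series, apply stacks the results into a single dataframe with one row per month, which is usually easier to inspect than a list of separate results.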

Answer 2

Answered by Pjl

This splits the data per year.

import pandas as pd
import dateutil.parser
dfile = 'rg_unificado.csv'
df = pd.read_csv(dfile, sep='|', quotechar='"', encoding='latin-1')
df['FECHA'] = df['FECHA'].apply(lambda x: dateutil.parser.parse(x)) 
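# See http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases
# Convert each date to a yearly period with to_period
per = df['FECHA'].dt.to_period("Y")
# Group by that period (pass the Series itself, not a one-element list,
# so that each key is a single Period rather than a 1-tuple in recent pandas)
agg = df.groupby(per)
for year, group in agg:
    # This simply saves each year's rows to its own CSV file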
    datep =  str(year).replace('-', '')
    filename = '%s_%s.csv' % (dfile.replace('.csv', ''), datep)
    group.to_csv(filename, sep='|', quotechar='"', encoding='latin-1', index=False, header=True)
