Notice: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/30989224/

Time: 2020-09-13 23:30:47  Source: igfitidea

Python Pandas DataFrame resample daily data to week by Mon-Sun weekly definition?

python pandas dataframe resampling

Asked by Jarad

import pandas as pd
import numpy as np

dates = pd.date_range('20141229',periods=14, name='Day')
df = pd.DataFrame({'Sum1': [1667, 1229, 1360, 9232, 8866, 4083, 3671, 10085, 10005, 8730, 10056, 10176, 3792, 3518],
                   'Sum2': [91, 75, 75, 254, 239, 108, 99, 259, 395, 355, 332, 386, 96, 111],
                   'Sum3': [365.95, 398.97, 285.12, 992.17, 1116.57, 512.11, 504.47, 1190.96, 1753.6, 1646.25, 1344.05, 1582.67, 560.95, 736.44],
                   'Sum4': [5, 5, 1, 5, 8, 8, 2, 10, 12, 16, 16, 6, 6, 3]},index=dates)
print(df)

The df produced looks like this:

             Sum1  Sum2     Sum3  Sum4
Day                                   
2014-12-29   1667    91   365.95     5
2014-12-30   1229    75   398.97     5
2014-12-31   1360    75   285.12     1
2015-01-01   9232   254   992.17     5
2015-01-02   8866   239  1116.57     8
2015-01-03   4083   108   512.11     8
2015-01-04   3671    99   504.47     2
2015-01-05  10085   259  1190.96    10
2015-01-06  10005   395  1753.60    12
2015-01-07   8730   355  1646.25    16
2015-01-08  10056   332  1344.05    16
2015-01-09  10176   386  1582.67     6
2015-01-10   3792    96   560.95     6
2015-01-11   3518   111   736.44     3

Let's say I resample the DataFrame to try to sum the daily data into weekly rows:

# Note: the how= argument only works on older pandas versions
df_resampled = df.resample('W', how='sum', label='left')
print(df_resampled)

This produces the following:

             Sum1  Sum2     Sum3  Sum4
Day                                   
2014-12-28  30108   941  4175.36    34
2015-01-04  56362  1934  8814.92    69

Question 1: My definition of a week is Mon - Sun. Since my data starts on 2014-12-29 (a Monday), I want my Day label to also start on that day. How would I make the Day index label be the date of every Monday instead of every Sunday?

Desired Output:

             Sum1  Sum2     Sum3  Sum4
Day                                   
2014-12-29  30108   941  4175.36    34
2015-01-05  56362  1934  8814.92    69

What have I tried regarding Question 1?

I changed 'W' to 'W-MON', but it produced 3 rows by counting 2014-12-29 in the 2014-12-22 row, which is not what I want:

             Sum1  Sum2     Sum3  Sum4
Day                                   
2014-12-22   1667    91   365.95     5
2014-12-29  38526  1109  5000.37    39
2015-01-05  46277  1675  7623.96    59

Question 2: How would I format the Day index label to look like a range? Ex:

                         Sum1  Sum2     Sum3  Sum4
Day                                   
2014-12-29 - 2015-01-04  30108   941  4175.36    34
2015-01-05 - 2015-01-11  56362  1934  8814.92    69

Accepted answer by Jianxun Li

This might help.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(1, 1000, (100, 4)),
                  columns='Sum1 Sum2 Sum3 Sum4'.split(),
                  index=pd.date_range('2014-12-29', periods=100, freq='D'))

def func(group):
    # Sum each column and label the group by its first date (the Monday)
    return pd.Series({'Sum1': group.Sum1.sum(), 'Sum2': group.Sum2.sum(),
                      'Sum3': group.Sum3.sum(), 'Sum4': group.Sum4.sum(),
                      'Day': group.index[0],
                      'Period': '{0} - {1}'.format(group.index[0].date(), group.index[-1].date())})

# Group by week number; data spanning more than a year would merge weeks that share a number
df.groupby(lambda idx: idx.week).apply(func)

Out[386]: 
          Day                   Period  Sum1  Sum2  Sum3  Sum4
1  2014-12-29  2014-12-29 - 2015-01-04  3559  3692  3648  4086
2  2015-01-05  2015-01-05 - 2015-01-11  2990  3658  3348  3304
3  2015-01-12  2015-01-12 - 2015-01-18  3168  3720  3518  3273
4  2015-01-19  2015-01-19 - 2015-01-25  2275  4968  4095  2366
5  2015-01-26  2015-01-26 - 2015-02-01  4146  2167  3888  4576
..        ...                      ...   ...   ...   ...   ...
11 2015-03-09  2015-03-09 - 2015-03-15  4035  3518  2588  2714
12 2015-03-16  2015-03-16 - 2015-03-22  3399  3901  3430  2143
13 2015-03-23  2015-03-23 - 2015-03-29  3227  3308  3185  3814
14 2015-03-30  2015-03-30 - 2015-04-05  4278  3369  3623  4167
15 2015-04-06  2015-04-06 - 2015-04-07  1466   632  1136  1392

[15 rows x 6 columns]
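
For Question 2, the same kind of range label can probably be built with resample alone, without grouping on the week number. The following is only a sketch, not part of the answer above: it assumes a recent pandas where .sum() is called on the resampler, uses just two of the question's columns, and the names weekly and six_days are made up for illustration.

```python
import pandas as pd

dates = pd.date_range('2014-12-29', periods=14, name='Day')
df = pd.DataFrame({'Sum1': [1667, 1229, 1360, 9232, 8866, 4083, 3671,
                            10085, 10005, 8730, 10056, 10176, 3792, 3518],
                   'Sum2': [91, 75, 75, 254, 239, 108, 99,
                            259, 395, 355, 332, 386, 96, 111]},
                  index=dates)

# Default 'W' bins run Monday to Sunday and are labelled by the closing Sunday
weekly = df.resample('W').sum()

# Build "start - end" strings from each Sunday label and the Monday six days earlier
six_days = pd.Timedelta(days=6)
weekly.index = pd.Index(
    [(end - six_days).strftime('%Y-%m-%d') + ' - ' + end.strftime('%Y-%m-%d')
     for end in weekly.index],
    name='Day')
print(weekly)
```

The index becomes plain strings here, so it is best applied as a last formatting step, after any date-based work is done.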

Answer by John Cummings

In case anyone else was not aware, it turns out that the weekly Anchored Offsets are based on the end date. So, just resampling with 'W' (which is the same as 'W-SUN') is by default a Monday to Sunday sample. The date listed is the end date. See this old bug report, wherein neither the documentation nor the API got updated.
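
A quick way to check the claim that 'W' is anchored on Sunday is to compare the two aliases directly. This is a small sketch; the variable names same and week_ends are only for illustration.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# 'W' and 'W-SUN' should resolve to the same weekly offset
same = to_offset('W') == to_offset('W-SUN')
print(same)

# Dates generated with freq='W' land on the day that ends each week
week_ends = pd.date_range('2014-12-29', periods=2, freq='W')
print(list(week_ends.day_name()))
```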

Given that you specified label='left' in the resample parameters, you must have realized that fact. It's also why using 'W-MON' does not have the desired effect. What is confusing is that the left bound is not actually in the interval: weekly bins are closed on the right by default, so the Sunday used as the left label belongs to the previous week.
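
To see that the left label sits outside its own bin, it can help to count the days per bin. This is a sketch on made-up data (a series of ones, one per day, starting on Sunday 2014-12-28), assuming a recent pandas where .sum() is called on the resampler.

```python
import pandas as pd

# One observation per day for 15 days, starting on Sunday 2014-12-28
days = pd.date_range('2014-12-28', periods=15)
ones = pd.Series(1, index=days)

# Summing the ones counts how many days fall into each weekly bin
counts = ones.resample('W', label='left').sum()
print(counts)
# The bin labelled 2014-12-21 holds only Sunday 2014-12-28, while the bin
# labelled 2014-12-28 holds Monday 2014-12-29 through Sunday 2015-01-04.
```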

So, to display the start date for the period instead of the end date, you may add a day to the index. That would mean you would do:

df_resampled.index = df_resampled.index + pd.DateOffset(days=1)
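
Put together with the question's data, the one-day shift might look like the sketch below. It assumes a recent pandas (so .sum() is called on the resampler rather than passing how=), keeps only the Sum1 column for brevity, and uses pd.Timedelta in place of pd.DateOffset, which should be equivalent for a whole-day shift.

```python
import pandas as pd

dates = pd.date_range('2014-12-29', periods=14, name='Day')
df = pd.DataFrame({'Sum1': [1667, 1229, 1360, 9232, 8866, 4083, 3671,
                            10085, 10005, 8730, 10056, 10176, 3792, 3518]},
                  index=dates)

# label='left' names each weekly bin by the Sunday just before it
df_resampled = df.resample('W', label='left').sum()

# Shift the labels forward one day so each row is named by its Monday
df_resampled.index = df_resampled.index + pd.Timedelta(days=1)
print(df_resampled)
```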

For completeness, here is your original data with another day (a Sunday) added at the beginning to show that the grouping really is Monday to Sunday:

import pandas as pd
import numpy as np

dates = pd.date_range('20141228',periods=15, name='Day')
df = pd.DataFrame({'Sum1': [10000, 1667, 1229, 1360, 9232, 8866, 4083, 3671, 10085, 10005, 8730, 10056, 10176, 3792, 3518],
               'Sum2': [10000, 91, 75, 75, 254, 239, 108, 99, 259, 395, 355, 332, 386, 96, 111],
               'Sum3': [10000, 365.95, 398.97, 285.12, 992.17, 1116.57, 512.11, 504.47, 1190.96, 1753.6, 1646.25, 1344.05, 1582.67, 560.95, 736.44],
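               'Sum4': [10000, 5, 5, 1, 5, 8, 8, 2, 10, 12, 16, 16, 6, 6, 3]},index=dates)
print(df)
# how='sum' only works on older pandas; newer versions call .sum() on the resampler instead
df_resampled = df.resample('W', how='sum', label='left')
# Add (not subtract) one day so each label moves from the Sunday to the Monday that starts the week
df_resampled.index = df_resampled.index + pd.DateOffset(days=1)
print(df_resampled)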

This outputs:

             Sum1   Sum2      Sum3   Sum4
Day
2014-12-28  10000  10000  10000.00  10000
2014-12-29   1667     91    365.95      5
2014-12-30   1229     75    398.97      5
2014-12-31   1360     75    285.12      1
2015-01-01   9232    254    992.17      5
2015-01-02   8866    239   1116.57      8
2015-01-03   4083    108    512.11      8
2015-01-04   3671     99    504.47      2
2015-01-05  10085    259   1190.96     10
2015-01-06  10005    395   1753.60     12
2015-01-07   8730    355   1646.25     16
2015-01-08  10056    332   1344.05     16
2015-01-09  10176    386   1582.67      6
2015-01-10   3792     96    560.95      6
2015-01-11   3518    111    736.44      3

             Sum1   Sum2      Sum3   Sum4
Day                                      
2014-12-22  10000  10000  10000.00  10000
2014-12-29  30108    941   4175.36     34
2015-01-05  56362   1934   8814.92     69

I believe that is what you wanted for Question 1.

Update

There is now a loffset argument to resample() that allows you to shift the label offset. So, instead of modifying the index, you simply add the loffset argument like so:

df.resample('W', how='sum', label='left', loffset=pd.DateOffset(days=1))

Also of note: how='sum' is now deprecated in favor of calling .sum() on the Resampler object that .resample() returns. So, the fully updated call would be:

df_resampled = df.resample('W', label='left', loffset=pd.DateOffset(days=1)).sum()
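
As far as I know, loffset was itself deprecated in pandas 1.1 and removed in 2.0, so the call above may fail on current versions. One alternative that avoids shifting labels altogether is to anchor the week on Monday and close the bins on the left. This is a sketch rather than part of the original answer; it uses two of the question's columns, and the name weekly is made up.

```python
import pandas as pd

dates = pd.date_range('2014-12-29', periods=14, name='Day')
df = pd.DataFrame({'Sum1': [1667, 1229, 1360, 9232, 8866, 4083, 3671,
                            10085, 10005, 8730, 10056, 10176, 3792, 3518],
                   'Sum2': [91, 75, 75, 254, 239, 108, 99,
                            259, 395, 355, 332, 386, 96, 111]},
                  index=dates)

# Bins are [Monday, next Monday): each Monday is included in, and labels, its own week
weekly = df.resample('W-MON', closed='left', label='left').sum()
print(weekly)
```

With closed='left' the Monday belongs to the week it starts, which is what separates this from the 'W-MON' attempt in the question, where 2014-12-29 fell into the previous bin.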