Original URL: http://stackoverflow.com/questions/30989224/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Python Pandas DataFrame resample daily data to week by Mon-Sun weekly definition?
Question by Jarad
import pandas as pd
import numpy as np
dates = pd.date_range('20141229',periods=14, name='Day')
df = pd.DataFrame({'Sum1': [1667, 1229, 1360, 9232, 8866, 4083, 3671, 10085, 10005, 8730, 10056, 10176, 3792, 3518],
                   'Sum2': [91, 75, 75, 254, 239, 108, 99, 259, 395, 355, 332, 386, 96, 111],
                   'Sum3': [365.95, 398.97, 285.12, 992.17, 1116.57, 512.11, 504.47, 1190.96, 1753.6, 1646.25, 1344.05, 1582.67, 560.95, 736.44],
                   'Sum4': [5, 5, 1, 5, 8, 8, 2, 10, 12, 16, 16, 6, 6, 3]},
                  index=dates)
print(df)
The df produced looks like this:
             Sum1  Sum2     Sum3  Sum4
Day
2014-12-29   1667    91   365.95     5
2014-12-30   1229    75   398.97     5
2014-12-31   1360    75   285.12     1
2015-01-01   9232   254   992.17     5
2015-01-02   8866   239  1116.57     8
2015-01-03   4083   108   512.11     8
2015-01-04   3671    99   504.47     2
2015-01-05  10085   259  1190.96    10
2015-01-06  10005   395  1753.60    12
2015-01-07   8730   355  1646.25    16
2015-01-08  10056   332  1344.05    16
2015-01-09  10176   386  1582.67     6
2015-01-10   3792    96   560.95     6
2015-01-11   3518   111   736.44     3
Let's say I resample the DataFrame to try and sum the daily data into weekly rows:
df_resampled = df.resample('W', how='sum', label='left')
print(df_resampled)
This produces the following:
             Sum1  Sum2     Sum3  Sum4
Day
2014-12-28  30108   941  4175.36    34
2015-01-04  56362  1934  8814.92    69
Question 1: my definition of a week is Mon - Sun. Since my data starts on 2014-12-29 (a Monday), I want my Day label to also start on that day. How would I make the Day index label be the date of every Monday instead of every Sunday?
Desired Output:
             Sum1  Sum2     Sum3  Sum4
Day
2014-12-29  30108   941  4175.36    34
2015-01-05  56362  1934  8814.92    69
What have I tried regarding Question 1?
I changed 'W' to 'W-MON', but it produced 3 rows by counting 2014-12-29 in the 2014-12-22 row, which is not what I want:
             Sum1  Sum2     Sum3  Sum4
Day
2014-12-22   1667    91   365.95     5
2014-12-29  38526  1109  5000.37    39
2015-01-05  46277  1675  7623.96    59
Question 2: how would I format the Day index label to look like a range? For example:
                          Sum1  Sum2     Sum3  Sum4
Day
2014-12-29 - 2015-01-04  30108   941  4175.36    34
2015-01-05 - 2015-01-11  56362  1934  8814.92    69
Accepted answer by Jianxun Li
This might help.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(1, 1000, (100, 4)),
                  columns='Sum1 Sum2 Sum3 Sum4'.split(),
                  index=pd.date_range('2014-12-29', periods=100, freq='D'))
def func(group):
    # Sum each column and record the week's first day (a Monday for this data) and its date range.
    return pd.Series({'Day': group.index[0],
                      'Period': '{0} - {1}'.format(group.index[0].date(), group.index[-1].date()),
                      'Sum1': group.Sum1.sum(), 'Sum2': group.Sum2.sum(),
                      'Sum3': group.Sum3.sum(), 'Sum4': group.Sum4.sum()})
# Group rows by week number; the data are random, so the sums below differ on each run.
print(df.groupby(lambda idx: idx.week).apply(func))
           Day                   Period  Sum1  Sum2  Sum3  Sum4
 1  2014-12-29  2014-12-29 - 2015-01-04  3559  3692  3648  4086
 2  2015-01-05  2015-01-05 - 2015-01-11  2990  3658  3348  3304
 3  2015-01-12  2015-01-12 - 2015-01-18  3168  3720  3518  3273
 4  2015-01-19  2015-01-19 - 2015-01-25  2275  4968  4095  2366
 5  2015-01-26  2015-01-26 - 2015-02-01  4146  2167  3888  4576
..         ...                      ...   ...   ...   ...   ...
11  2015-03-09  2015-03-09 - 2015-03-15  4035  3518  2588  2714
12  2015-03-16  2015-03-16 - 2015-03-22  3399  3901  3430  2143
13  2015-03-23  2015-03-23 - 2015-03-29  3227  3308  3185  3814
14  2015-03-30  2015-03-30 - 2015-04-05  4278  3369  3623  4167
15  2015-04-06  2015-04-06 - 2015-04-07  1466   632  1136  1392
[15 rows x 6 columns]
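Grouping by idx.week uses only the week number. With data spanning more than a year, the same week number from different years would be merged into one group. The following is an alternative sketch, not part of the original answer, and it assumes a reasonably recent pandas. It converts the index to weekly 'W-SUN' periods, which run Monday through Sunday, groups by those periods, and builds the range label from each period's start_time and end_time. Unlike func above, the label always shows the full Monday-Sunday week, even when the data ends mid-week.

```python
import pandas as pd

# The question's two weeks of daily data, starting on Monday 2014-12-29.
dates = pd.date_range('2014-12-29', periods=14, name='Day')
df = pd.DataFrame({'Sum1': [1667, 1229, 1360, 9232, 8866, 4083, 3671, 10085, 10005, 8730, 10056, 10176, 3792, 3518],
                   'Sum2': [91, 75, 75, 254, 239, 108, 99, 259, 395, 355, 332, 386, 96, 111],
                   'Sum3': [365.95, 398.97, 285.12, 992.17, 1116.57, 512.11, 504.47, 1190.96, 1753.6, 1646.25, 1344.05, 1582.67, 560.95, 736.44],
                   'Sum4': [5, 5, 1, 5, 8, 8, 2, 10, 12, 16, 16, 6, 6, 3]},
                  index=dates)

# A 'W-SUN' period ends on Sunday, so it covers Monday..Sunday; the year is part of the period.
weeks = df.index.to_period('W-SUN')
weekly = df.groupby(weeks).sum()

# Replace each period with a "first day - last day" string label.
weekly.index = [f'{p.start_time.date()} - {p.end_time.date()}' for p in weekly.index]
weekly.index.name = 'Day'
print(weekly)
```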
Answer by John Cummings
In case anyone else was not aware, it turns out that the weekly Anchored Offsets are based on the end date. So, resampling with 'W' (which is the same as 'W-SUN') produces Monday-to-Sunday bins by default, and the date listed is the end date. See this old bug report, wherein neither the documentation nor the API got updated.
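Given that you specified label='left' in the resample parameters, you must have realized that fact. It's also why using 'W-MON' does not have the desired effect: those bins end on Monday, so 2014-12-29 is counted in the week before. What is confusing is that the left bound is not actually in the interval. The weekly bins are closed on the right, so each week is (Sunday, Sunday], and the left label is the Sunday before the week begins.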
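Here is a small sketch of the interval behavior described above. It assumes pandas 0.18 or later, where .resample(...).sum() replaces how='sum'. The second call is not from the original answers. Instead of shifting labels, it anchors the weeks on Monday and closes the bins on the left, so each bin is [Monday, next Monday) and its left label is a day that actually falls inside the bin.

```python
import pandas as pd

# Two weeks of the question's data, starting on Monday 2014-12-29.
dates = pd.date_range('2014-12-29', periods=14, name='Day')
df = pd.DataFrame({'Sum1': [1667, 1229, 1360, 9232, 8866, 4083, 3671,
                            10085, 10005, 8730, 10056, 10176, 3792, 3518],
                   'Sum2': [91, 75, 75, 254, 239, 108, 99,
                            259, 395, 355, 332, 386, 96, 111]},
                  index=dates)

# 'W' (= 'W-SUN') bins are closed on the right: (Sunday, Sunday].
# With label='left', each row is labelled with the Sunday before its week.
default = df.resample('W', label='left').sum()
print(default)

# Anchor on Monday and close on the left: [Monday, next Monday).
# The left label is now the first day inside each bin.
monday = df.resample('W-MON', closed='left', label='left').sum()
print(monday)
```

Both calls produce the same weekly sums; only the labels differ.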
So, to display the start date for the period instead of the end date, you may add a day to the index. That would mean you would do:
df_resampled.index = df_resampled.index + pd.DateOffset(days=1)
For completeness, here is your original data with another day (a Sunday) added on the beginning to show the grouping really is Monday to Sunday:
import pandas as pd
import numpy as np
dates = pd.date_range('20141228',periods=15, name='Day')
df = pd.DataFrame({'Sum1': [10000, 1667, 1229, 1360, 9232, 8866, 4083, 3671, 10085, 10005, 8730, 10056, 10176, 3792, 3518],
                   'Sum2': [10000, 91, 75, 75, 254, 239, 108, 99, 259, 395, 355, 332, 386, 96, 111],
                   'Sum3': [10000, 365.95, 398.97, 285.12, 992.17, 1116.57, 512.11, 504.47, 1190.96, 1753.6, 1646.25, 1344.05, 1582.67, 560.95, 736.44],
                   'Sum4': [10000, 5, 5, 1, 5, 8, 8, 2, 10, 12, 16, 16, 6, 6, 3]},
                  index=dates)
print(df)
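df_resampled = df.resample('W', how='sum', label='left')  # pandas < 0.18 syntax; see the update below
# Shift each label from the Sunday before the week to the Monday that starts it.
df_resampled.index = df_resampled.index + pd.DateOffset(days=1)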
print(df_resampled)
This outputs:
             Sum1   Sum2      Sum3   Sum4
Day
2014-12-28  10000  10000  10000.00  10000
2014-12-29   1667     91    365.95      5
2014-12-30   1229     75    398.97      5
2014-12-31   1360     75    285.12      1
2015-01-01   9232    254    992.17      5
2015-01-02   8866    239   1116.57      8
2015-01-03   4083    108    512.11      8
2015-01-04   3671     99    504.47      2
2015-01-05  10085    259   1190.96     10
2015-01-06  10005    395   1753.60     12
2015-01-07   8730    355   1646.25     16
2015-01-08  10056    332   1344.05     16
2015-01-09  10176    386   1582.67      6
2015-01-10   3792     96    560.95      6
2015-01-11   3518    111    736.44      3
             Sum1   Sum2      Sum3   Sum4
Day
2014-12-22  10000  10000  10000.00  10000
2014-12-29  30108    941   4175.36     34
2015-01-05  56362   1934   8814.92     69
I believe that is what you wanted for Question 1.
Update
There is now a loffset argument to resample() that allows you to shift the label offset. So, instead of modifying the index, you simply add the loffset argument like so:
df.resample('W', how='sum', label='left', loffset=pd.DateOffset(days=1))
Also of note: how='sum' is now deprecated in favor of calling .sum() on the Resampler object that .resample() returns. So, the fully updated call would be:
df_resampled = df.resample('W', label='left', loffset=pd.DateOffset(days=1)).sum()
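The loffset argument was itself deprecated in pandas 1.1 and removed in pandas 2.0, so the call above fails on current versions. Below is a minimal sketch of the same result for pandas 2.x, assuming the question's data. It resamples with .sum() and then shifts the labels by one day, as the earlier part of this answer does.

```python
import pandas as pd

dates = pd.date_range('2014-12-29', periods=14, name='Day')
df = pd.DataFrame({'Sum1': [1667, 1229, 1360, 9232, 8866, 4083, 3671,
                            10085, 10005, 8730, 10056, 10176, 3792, 3518],
                   'Sum2': [91, 75, 75, 254, 239, 108, 99,
                            259, 395, 355, 332, 386, 96, 111]},
                  index=dates)

# Mon-Sun weekly sums, labelled with the Sunday before each week...
df_resampled = df.resample('W', label='left').sum()
# ...then move each label forward one day to the Monday, which is what loffset used to do.
df_resampled.index = df_resampled.index + pd.DateOffset(days=1)
print(df_resampled)
```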

