
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow, original question: http://stackoverflow.com/questions/12579150/

Date: 2020-09-13 20:26:24  Source: igfitidea

Resample hourly TimeSeries with certain starting hour

python pandas

Asked by MaM

I want to resample a TimeSeries at a daily (exactly 24-hour) frequency, starting at a certain hour.


For example:


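from datetime import datetime
from pandas import Series, date_range

index = date_range(datetime(2012,1,1,17), freq='H', periods=60)
ts = Series(data=[1]*60, index=index)
ts.resample(rule='D', closed='left', label='left').sum()  # pandas 0.9 wrote this as how='sum'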

Result I get:


2012-01-01  7
2012-01-02 24
2012-01-03 24
2012-01-04  5
Freq: D

Result I want:


2012-01-01 17:00:00 24
2012-01-02 17:00:00 24
2012-01-03 17:00:00 12
Freq: D

A few weeks ago you could pass '24H' to the freq argument and it worked fine. But now it converts '24H' to '1D'.


Was I relying on a bug with '24H' that has since been fixed? And how can I get the desired result back in an efficient, Pythonic (or pandas-idiomatic) way?


Versions:


  • python 2.7.3
  • pandas 0.9.0rc1 (but it doesn't work in 0.8.1 either)
  • numpy 1.6.1
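The behavior described in the question can be reproduced in newer pandas. This is a minimal sketch, assuming pandas 1.1 or later (where resample gained the origin parameter, whose default 'start_day' anchors bins at midnight of the first day). With no offset, a 24-hour rule therefore produces the same midnight-aligned bins as 'D' for this series:

```python
import pandas as pd

# Same data as in the question: 60 hourly ones starting 2012-01-01 17:00.
index = pd.date_range("2012-01-01 17:00", freq="h", periods=60)
ts = pd.Series([1] * 60, index=index)

# Without an offset, 24-hour bins are anchored at midnight of the first day
# (default origin="start_day"), so they match the calendar-day bins of "D".
by_24h = ts.resample("24h").sum()
by_day = ts.resample("D").sum()
print(by_24h)
```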

Answer by Andy Hayden

The resample method has a base argument that covers this case:


ts.resample(rule='24H', closed='left', label='left', base=17).sum()

Output:


2012-01-01 17:00:00    24
2012-01-02 17:00:00    24
2012-01-03 17:00:00    12
Freq: 24H
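The base argument was deprecated in pandas 1.1 and removed in pandas 2.0. In newer versions the same bins can be obtained with the offset argument, which shifts the bin edges away from the default origin (midnight of the first day). A minimal sketch, assuming pandas 1.1 or later:

```python
import pandas as pd

# Same hourly series as in the question.
index = pd.date_range("2012-01-01 17:00", freq="h", periods=60)
ts = pd.Series([1] * 60, index=index)

# offset="17h" moves the 24-hour bin edges from midnight to 17:00,
# replacing the removed base=17 argument.
result = ts.resample("24h", closed="left", label="left", offset="17h").sum()
print(result)
```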

Answer by Thomas G.

2020 update: for DataFrames


Use the base keyword, as described in the documentation:


[Image: the documentation's description of the base parameter]

Code example:


import pandas as pd

df.resample(
    pd.Timedelta('24 hours'),  # or '24H'
    base=17,                   # <-- ADD THIS
).sum()
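On pandas 1.1 or later, where base is deprecated (and removed in 2.0), origin='start' is another option for the DataFrame case: it anchors the bins at the first timestamp of the index instead of at midnight, so no explicit hour is needed. A sketch, where df is a hypothetical DataFrame built from the question's hourly data and the column name value is an assumption:

```python
import pandas as pd

# Hypothetical DataFrame: the question's 60 hourly ones in a column "value".
index = pd.date_range("2012-01-01 17:00", freq="h", periods=60)
df = pd.DataFrame({"value": [1] * 60}, index=index)

# origin="start" anchors the 24-hour bins at the first index timestamp
# (2012-01-01 17:00) rather than at midnight of that day.
daily = df.resample(pd.Timedelta(hours=24), origin="start").sum()
print(daily)
```

This only reproduces the 17:00 alignment because the data happens to start at 17:00; if the first row could be at a different hour, offset='17h' states the intent more directly.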