Python Pandas - 按日期对日内时间序列进行分组

Question

提问by signalseeker

I have an intra day series of log returns over multiple days that I would like to downsample to daily ohlc. I can do something like

我有一个多天的日内系列日志回报，我想将其下采样到每日 ohlc。我可以做类似的事情

hi = series.resample('B', how=lambda x: np.max(np.cumsum()))
low = series.resample('B', how=lambda x: np.min(np.cumsum()))

But it seems inefficient to compute cumsum on each call. Is there a way to first compute the cumsums and then apply 'ohcl' to the data?

但是在每次调用时计算 cumsum 似乎效率低下。有没有办法先计算 cumsums，然后将 'ohcl' 应用于数据？

1999-08-09 12:30:00-04:00   -0.000486
1999-08-09 12:31:00-04:00   -0.000606
1999-08-09 12:32:00-04:00   -0.000120
1999-08-09 12:33:00-04:00   -0.000037
1999-08-09 12:34:00-04:00   -0.000337
1999-08-09 12:35:00-04:00    0.000100
1999-08-09 12:36:00-04:00    0.000219
1999-08-09 12:37:00-04:00    0.000285
1999-08-09 12:38:00-04:00   -0.000981
1999-08-09 12:39:00-04:00   -0.000487
1999-08-09 12:40:00-04:00    0.000476
1999-08-09 12:41:00-04:00    0.000362
1999-08-09 12:42:00-04:00   -0.000038
1999-08-09 12:43:00-04:00   -0.000310
1999-08-09 12:44:00-04:00   -0.000337
...
1999-09-28 06:45:00-04:00    0.000000
1999-09-28 06:46:00-04:00    0.000000
1999-09-28 06:47:00-04:00    0.000000
1999-09-28 06:48:00-04:00    0.000102
1999-09-28 06:49:00-04:00   -0.000068
1999-09-28 06:50:00-04:00    0.000136
1999-09-28 06:51:00-04:00    0.000566
1999-09-28 06:52:00-04:00    0.000469
1999-09-28 06:53:00-04:00    0.000000
1999-09-28 06:54:00-04:00    0.000000
1999-09-28 06:55:00-04:00    0.000000
1999-09-28 06:56:00-04:00    0.000000
1999-09-28 06:57:00-04:00    0.000000
1999-09-28 06:58:00-04:00    0.000000
1999-09-28 06:59:00-04:00    0.000000

Answer 1

采纳答案by signalseeker

df.groupby([df.index.year, df.index.month, df.index.day]).transform(np.cumsum).resample('B', how='ohlc')

I think this might be what I want but I have to test.

我想这可能是我想要的，但我必须测试。

EDIT: After zelazny7's repsonse:

编辑：在 zelazny7 的回复之后：

df.groupby(pd.TimeGrouper('D')).transform(np.cumsum).resample('D', how='ohlc')

works and is also more efficient than my previous solution.

有效并且比我以前的解决方案更有效。

UPDATE:

更新：

pd.TimeGrouper('D') is deprecated since pandas v0.21.0.

pd.TimeGrouper('D') 自pandas v0.21.0起已弃用。

Use pd.Grouper()instead:

使用pd.Grouper()来代替：

df.groupby(pd.Grouper(freq='D')).transform(np.cumsum).resample('D', how='ohlc')

Answer 2

回答by Zelazny7

I wasn't able to get your resample suggestion to work. Did you have any luck? Here's a way to aggregate the data at the business day level and compute the OHLC stats in one pass:

我无法让您的重新采样建议起作用。你有什么运气吗？这是一种在工作日级别汇总数据并一次性计算 OHLC 统计数据的方法：

from io import BytesIO
from pandas import *

text = """1999-08-09 12:30:00-04:00   -0.000486
1999-08-09 12:31:00-04:00   -0.000606
1999-08-09 12:32:00-04:00   -0.000120
1999-08-09 12:33:00-04:00   -0.000037
1999-08-09 12:34:00-04:00   -0.000337
1999-08-09 12:35:00-04:00    0.000100
1999-08-09 12:36:00-04:00    0.000219
1999-08-09 12:37:00-04:00    0.000285
1999-08-09 12:38:00-04:00   -0.000981
1999-08-09 12:39:00-04:00   -0.000487
1999-08-09 12:40:00-04:00    0.000476
1999-08-09 12:41:00-04:00    0.000362
1999-08-09 12:42:00-04:00   -0.000038
1999-08-09 12:43:00-04:00   -0.000310
1999-08-09 12:44:00-04:00   -0.000337"""

df = read_csv(BytesIO(text), sep='\s+', parse_dates=[[0,1]], index_col=[0], header=None)

Here I create a dictionary of dictionaries. The outer key references the columns you want to apply the functions to. The inner key contains the names of your aggregation functions and the inner values are the functions you want to apply:

在这里，我创建了一个字典字典。外部键引用您要应用函数的列。内部键包含聚合函数的名称，内部值是要应用的函数：

f = {2: {'O':'first',
         'H':'max',
         'L':'min',
         'C':'last'}}

df.groupby(TimeGrouper(freq='B')).agg(f)

Out:
                   2
                   H         C         L         O
1999-08-09  0.000476 -0.000337 -0.000981 -0.000486

Python Pandas - 按日期对日内时间序列进行分组

提问by signalseeker

采纳答案by signalseeker

回答by Zelazny7

相关推荐

最近更新

标签

Python Pandas - 按日期对日内时间序列进行分组

提问by signalseeker

采纳答案by signalseeker

回答by Zelazny7

相关推荐

Python 不区分大小写的“in”

Python 使用牛顿法求平方根（错误！）

Python 导入错误：DLL 加载失败：%1 不是有效的 Win32 应用程序

Python“如果 X == Y 和 Z”语法

相关推荐

最近更新

标签