Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/28126286/

Date: 2020-09-13 22:52:41  Source: igfitidea

Convert daily pandas stock data to monthly data using first trade day of the month

python r pandas

Asked by user2766344

I have a set of calculated OHLCVA daily securities data in a pandas dataframe like this:

>>> type(data_dy)
<class 'pandas.core.frame.DataFrame'>
>>> data_dy
              Open    High     Low   Close     Volume  Adj Close
Date                                                            
2012-12-28  140.64  141.42  139.87  140.03  148806700     134.63
2012-12-31  139.66  142.56  139.54  142.41  243935200     136.92
2013-01-02  145.11  146.15  144.73  146.06  192059000     140.43
2013-01-03  145.99  146.37  145.34  145.73  144761800     140.11
2013-01-04  145.97  146.61  145.67  146.37  116817700     140.72

[5 rows x 6 columns]

I'm using the following dictionary and the pandas resample function to convert the dataframe to monthly data:

>>> ohlc_dict = {'Open':'first','High':'max','Low':'min','Close': 'last','Volume': 'sum','Adj Close': 'last'}

>>> data_dy.resample('M', how=ohlc_dict, closed='right', label='right')
               Volume  Adj Close    High     Low   Close    Open
Date                                                            
2012-12-31  392741900     136.92  142.56  139.54  142.41  140.64
2013-01-31  453638500     140.72  146.61  144.73  146.37  145.11

[2 rows x 6 columns]
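
The how= keyword used above was deprecated in pandas 0.18 and later removed. A minimal sketch of the same monthly aggregation in the newer style, assuming the five sample rows shown earlier (rebuilt by hand here) and using the offset object pd.offsets.MonthEnd() in place of the 'M' alias, a choice made only to sidestep alias renames across pandas versions:

```python
import pandas as pd

# The five daily rows from the question, rebuilt by hand for illustration.
data_dy = pd.DataFrame(
    {
        'Open': [140.64, 139.66, 145.11, 145.99, 145.97],
        'High': [141.42, 142.56, 146.15, 146.37, 146.61],
        'Low': [139.87, 139.54, 144.73, 145.34, 145.67],
        'Close': [140.03, 142.41, 146.06, 145.73, 146.37],
        'Volume': [148806700, 243935200, 192059000, 144761800, 116817700],
        'Adj Close': [134.63, 136.92, 140.43, 140.11, 140.72],
    },
    index=pd.DatetimeIndex(
        ['2012-12-28', '2012-12-31', '2013-01-02', '2013-01-03', '2013-01-04'],
        name='Date',
    ),
)

ohlc_dict = {'Open': 'first', 'High': 'max', 'Low': 'min',
             'Close': 'last', 'Volume': 'sum', 'Adj Close': 'last'}

# Month-end bins, closed and labelled on the right edge, aggregated per column.
monthly = data_dy.resample(
    pd.offsets.MonthEnd(), closed='right', label='right'
).agg(ohlc_dict)
print(monthly)
```

The labels are still the last calendar day of each month, so this only modernises the call; it does not change the date convention.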

This does the calculations correctly, but I'd like to follow the Yahoo! date convention for monthly data, which labels each period with its first trading day rather than the last calendar day of the period that pandas uses.

So I'd like the answer set to be:

               Volume  Adj Close    High     Low   Close    Open
Date                                                            
2012-12-28  392741900     136.92  142.56  139.54  142.41  140.64
2013-01-02  453638500     140.72  146.61  144.73  146.37  145.11

I could do this by converting the daily data to a Python list, processing the data and returning it to a dataframe, but how can this be done with pandas?

Answered by JAB

Instead of 'M' you can pass 'MS' as the resample rule:

import pandas as pd

df = pd.DataFrame(range(72), index=pd.date_range('1/1/2011', periods=72, freq='D'))

# df.resample('MS', how='mean')  # pandas < 0.18
df.resample('MS').mean()  # pandas >= 0.18
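
To make the labelling concrete, here is a small sketch that runs the 'MS' rule on the same 72-day toy frame and prints the result. The column name 'value' is an assumption added for readability. Note that 'MS' labels each bin with the first calendar day of the month, which is not necessarily a trading day.

```python
import pandas as pd

# 72 consecutive calendar days starting 2011-01-01, holding the values 0..71.
df = pd.DataFrame(
    {'value': range(72)},
    index=pd.date_range('2011-01-01', periods=72, freq='D'),
)

# Month-start bins: each row is labelled with the first calendar day of its month.
monthly_mean = df.resample('MS').mean()
print(monthly_mean)
```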

Updated to use the first business day of the month respecting US Federal Holidays:

df = pd.DataFrame(range(200), index=pd.date_range('12/1/2012', periods=200, freq='D'))

from pandas.tseries.offsets import CustomBusinessMonthBegin
from pandas.tseries.holiday import USFederalHolidayCalendar
bmth_us = CustomBusinessMonthBegin(calendar=USFederalHolidayCalendar())

df.resample(bmth_us).mean()
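
As a quick check of what this offset produces, the sketch below lists the month starts it generates over a few months. The date range is chosen here purely for illustration. Keep in mind that the US federal holiday calendar is only an approximation of an exchange calendar: for example, markets close on Good Friday, which is not a federal holiday.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessMonthBegin

# First business day of each month, skipping weekends and US federal holidays.
bmth_us = CustomBusinessMonthBegin(calendar=USFederalHolidayCalendar())

# Illustrative range: December 2012 through March 2013.
month_starts = pd.date_range('2012-12-01', '2013-03-31', freq=bmth_us)
print(month_starts)
```

Here 2012-12-01 falls on a Saturday and 2013-01-01 is New Year's Day, so the first two entries roll forward to 2012-12-03 and 2013-01-02.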

If you want custom month starts taken from the earliest date found in each month of the data, try this (it isn't pretty, but it should work):

month_index = df.index.to_period('M')

min_day_in_month_index = pd.to_datetime(df.set_index(month_index, append=True).reset_index(level=0).groupby(level=0)['level_0'].min())

custom_month_starts = CustomBusinessMonthBegin(calendar=min_day_in_month_index)

Pass custom_month_starts as the first parameter of resample.

Answered by user2766344

Thank you J Bradley, your solution worked perfectly. I did have to upgrade my version of pandas from their official website though as the version installed via pip did not have CustomBusinessMonthBegin in pandas.tseries.offsets. My final code was:

#----- imports -----
import pandas as pd
from pandas.tseries.offsets import CustomBusinessMonthBegin
import pandas.io.data as web  # later moved to the separate pandas-datareader package
#----- get sample data -----
df = web.get_data_yahoo('SPY', '2012-12-01', '2013-12-31')
#----- build custom calendar -----
month_index = df.index.to_period('M')
# groups the rows by month and takes the per-month minimum of the 'Open' column
min_day_in_month_index = pd.to_datetime(df.set_index(month_index, append=True).reset_index(level=0).groupby(level=0)['Open'].min())
custom_month_starts = CustomBusinessMonthBegin(calendar=min_day_in_month_index)
#----- convert daily data to monthly data -----
ohlc_dict = {'Open':'first','High':'max','Low':'min','Close': 'last','Volume': 'sum','Adj Close': 'last'}
mthly_ohlcva = df.resample(custom_month_starts, how=ohlc_dict)  # pandas < 0.18; later versions use .agg(ohlc_dict)

This yielded the following:

>>> mthly_ohlcva
                Volume  Adj Close    High     Low   Close    Open
Date                                                             
2012-12-03  2889875900     136.92  145.58  139.54  142.41  142.80
2013-01-01  2587140200     143.92  150.94  144.73  149.70  145.11
2013-02-01  2581459300     145.76  153.28  148.73  151.61  150.65
2013-03-01  2330972300     151.30  156.85  150.41  156.67  151.09
2013-04-01  2907035000     154.20  159.72  153.55  159.68  156.59
2013-05-01  2781596000     157.84  169.07  158.10  163.45  159.33
2013-06-03  3533321800     155.74  165.99  155.73  160.42  163.83
2013-07-01  2330904500     163.78  169.86  160.22  168.71  161.26
2013-08-01  2283131700     158.87  170.97  163.05  163.65  169.99
2013-09-02  2226749600     163.90  173.60  163.70  168.01  165.23
2013-10-01  2901739000     171.49  177.51  164.53  175.79  168.14
2013-11-01  1930952900     176.57  181.75  174.76  181.00  176.02
2013-12-02  2232775900     181.15  184.69  177.32  184.69  181.09
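
Note that 2013-01-01 appears as a label in this output, although the daily sample in the question jumps from 2012-12-31 to 2013-01-02. If each month should instead be labelled with the first date actually present in the data, one alternative (not the method used in this answer) is to group by calendar month and relabel afterwards. A minimal sketch, assuming the five sample rows from the question; the helper name monthly_from_first_trade_day is hypothetical:

```python
import pandas as pd

# The five daily rows from the question, rebuilt by hand for illustration.
data_dy = pd.DataFrame(
    {
        'Open': [140.64, 139.66, 145.11, 145.99, 145.97],
        'High': [141.42, 142.56, 146.15, 146.37, 146.61],
        'Low': [139.87, 139.54, 144.73, 145.34, 145.67],
        'Close': [140.03, 142.41, 146.06, 145.73, 146.37],
        'Volume': [148806700, 243935200, 192059000, 144761800, 116817700],
        'Adj Close': [134.63, 136.92, 140.43, 140.11, 140.72],
    },
    index=pd.DatetimeIndex(
        ['2012-12-28', '2012-12-31', '2013-01-02', '2013-01-03', '2013-01-04'],
        name='Date',
    ),
)

ohlc_dict = {'Open': 'first', 'High': 'max', 'Low': 'min',
             'Close': 'last', 'Volume': 'sum', 'Adj Close': 'last'}


def monthly_from_first_trade_day(daily, agg):
    """Aggregate per calendar month, labelled by the first date present in the data."""
    months = daily.index.to_period('M')
    monthly = daily.groupby(months).agg(agg)
    # Earliest timestamp that actually occurs in each month.
    first_days = daily.index.to_series().groupby(months).min()
    monthly.index = pd.DatetimeIndex(first_days.values, name=daily.index.name)
    return monthly


monthly = monthly_from_first_trade_day(data_dy, ohlc_dict)
print(monthly)
```

Because the labels come from the data itself, no holiday calendar is needed; the trade-off is that a month with missing early rows is labelled by whatever date happens to come first.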

Answered by Xoel López Barata

In recent versions of pandas you can use the time offset alias 'BMS', which stands for "business month start frequency", or 'BM', which stands for "business month end frequency".

The code in the first case would look like

data_dy.resample('BMS', closed='right', label='right').apply(ohlc_dict)

or, in the second case,

data_dy.resample('BM', closed='right', label='right').apply(ohlc_dict)
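
A minimal sketch of both variants on the five sample rows from the question. It uses the offset classes pd.offsets.BusinessMonthBegin and pd.offsets.BusinessMonthEnd rather than the string aliases (newer pandas releases renamed the 'BM' alias to 'BME'), and it leaves closed and label at their defaults; both choices are assumptions of this sketch rather than what the code above does. These offsets only skip weekends and know nothing about exchange holidays.

```python
import pandas as pd

# The five daily rows from the question, rebuilt by hand for illustration.
data_dy = pd.DataFrame(
    {
        'Open': [140.64, 139.66, 145.11, 145.99, 145.97],
        'High': [141.42, 142.56, 146.15, 146.37, 146.61],
        'Low': [139.87, 139.54, 144.73, 145.34, 145.67],
        'Close': [140.03, 142.41, 146.06, 145.73, 146.37],
        'Volume': [148806700, 243935200, 192059000, 144761800, 116817700],
        'Adj Close': [134.63, 136.92, 140.43, 140.11, 140.72],
    },
    index=pd.DatetimeIndex(
        ['2012-12-28', '2012-12-31', '2013-01-02', '2013-01-03', '2013-01-04'],
        name='Date',
    ),
)

ohlc_dict = {'Open': 'first', 'High': 'max', 'Low': 'min',
             'Close': 'last', 'Volume': 'sum', 'Adj Close': 'last'}

# Business month start: each bin is labelled with the first weekday of the month.
by_start = data_dy.resample(pd.offsets.BusinessMonthBegin()).agg(ohlc_dict)

# Business month end: each bin is labelled with the last weekday of the month.
by_end = data_dy.resample(pd.offsets.BusinessMonthEnd()).agg(ohlc_dict)

print(by_start)
print(by_end)
```

On this sample the start labels are 2012-12-03 and 2013-01-01, so the business-month rule by itself still does not reproduce the 2012-12-28 and 2013-01-02 labels requested in the question.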