pandas 使用每月的第一个交易日将每日熊猫股票数据转换为每月数据
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/28126286/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Convert daily pandas stock data to monthly data using first trade day of the month
提问by user2766344
I have a set of calculated OHLCVA daily securities data in a pandas dataframe like this:
我在 Pandas 数据框中有一组计算的 OHLCVA 每日证券数据,如下所示:
>>> type(data_dy)
<class 'pandas.core.frame.DataFrame'>
>>> data_dy
Open High Low Close Volume Adj Close
Date
2012-12-28 140.64 141.42 139.87 140.03 148806700 134.63
2012-12-31 139.66 142.56 139.54 142.41 243935200 136.92
2013-01-02 145.11 146.15 144.73 146.06 192059000 140.43
2013-01-03 145.99 146.37 145.34 145.73 144761800 140.11
2013-01-04 145.97 146.61 145.67 146.37 116817700 140.72
[5 rows x 6 columns]
I'm using the following dictionary and the pandas resample function to convert the dataframe to monthly data:
我正在使用以下字典和 pandas resample 函数将数据框转换为每月数据:
>>> ohlc_dict = {'Open':'first','High':'max','Low':'min','Close': 'last','Volume': 'sum','Adj Close': 'last'}
>>> data_dy.resample('M', how=ohlc_dict, closed='right', label='right')
Volume Adj Close High Low Close Open
Date
2012-12-31 392741900 136.92 142.56 139.54 142.41 140.64
2013-01-31 453638500 140.72 146.61 144.73 146.37 145.11
[2 rows x 6 columns]
This does the calculations correctly, but I'd like to use the Yahoo! date convention for monthly data of using the first trading day of the period rather than the last calendar day of the period that pandas uses.
这可以正确进行计算,但我想使用 Yahoo! 每月数据的日期约定使用期间的第一个交易日而不是 pandas 使用的期间的最后一个日历日。
So I'd like the answer set to be:
所以我希望答案是:
Volume Adj Close High Low Close Open
Date
2012-12-28 392741900 136.92 142.56 139.54 142.41 140.64
2013-01-02 453638500 140.72 146.61 144.73 146.37 145.11
I could do this by converting the daily data to a python list, process the data and return the data to a dataframe, but how do can this be done with pandas?
我可以通过将每日数据转换为 python 列表、处理数据并将数据返回到数据框来做到这一点,但是如何用 Pandas 做到这一点?
回答by JAB
Instead of Myou can pass MSas the resample rule:
而不是M您可以MS作为重新采样规则传递:
df =pd.DataFrame( range(72), index = pd.date_range('1/1/2011', periods=72, freq='D'))
#df.resample('MS', how = 'mean') # pandas <0.18
df.resample('MS').mean() # pandas >= 0.18
Updated to use the first business day of the month respecting US Federal Holidays:
更新为使用与美国联邦假期相关的月份的第一个工作日:
df =pd.DataFrame( range(200), index = pd.date_range('12/1/2012', periods=200, freq='D'))
from pandas.tseries.offsets import CustomBusinessMonthBegin
from pandas.tseries.holiday import USFederalHolidayCalendar
bmth_us = CustomBusinessMonthBegin(calendar=USFederalHolidayCalendar())
df.resample(bmth_us).mean()
if you want custom starts of the month using the min month found in the data try this. (It isn't pretty, but it should work).
如果您想使用数据中找到的最小月份自定义月份开始,请尝试此操作。(它不漂亮,但它应该可以工作)。
month_index =df.index.to_period('M')
min_day_in_month_index = pd.to_datetime(df.set_index(new_index, append=True).reset_index(level=0).groupby(level=0)['level_0'].min())
custom_month_starts =CustomBusinessMonthBegin(calendar = min_day_in_month_index)
Pass custom_start_monthsto the fist parameter of resample
传递custom_start_months给的第一个参数resample
回答by user2766344
Thank you J Bradley, your solution worked perfectly. I did have to upgrade my version of pandas from their official website though as the version installed via pip did not have CustomBusinessMonthBegin in pandas.tseries.offsets. My final code was:
谢谢 J Bradley,您的解决方案非常有效。我确实必须从他们的官方网站升级我的 pandas 版本,因为通过 pip 安装的版本在 pandas.tseries.offsets 中没有 CustomBusinessMonthBegin。我的最终代码是:
#----- imports -----
import pandas as pd
from pandas.tseries.offsets import CustomBusinessMonthBegin
import pandas.io.data as web
#----- get sample data -----
df = web.get_data_yahoo('SPY', '2012-12-01', '2013-12-31')
#----- build custom calendar -----
month_index =df.index.to_period('M')
min_day_in_month_index = pd.to_datetime(df.set_index(month_index, append=True).reset_index(level=0).groupby(level=0)['Open'].min())
custom_month_starts = CustomBusinessMonthBegin(calendar = min_day_in_month_index)
#----- convert daily data to monthly data -----
ohlc_dict = {'Open':'first','High':'max','Low':'min','Close': 'last','Volume': 'sum','Adj Close': 'last'}
mthly_ohlcva = df.resample(custom_month_starts, how=ohlc_dict)
This yielded the following:
这产生了以下结果:
>>> mthly_ohlcva
Volume Adj Close High Low Close Open
Date
2012-12-03 2889875900 136.92 145.58 139.54 142.41 142.80
2013-01-01 2587140200 143.92 150.94 144.73 149.70 145.11
2013-02-01 2581459300 145.76 153.28 148.73 151.61 150.65
2013-03-01 2330972300 151.30 156.85 150.41 156.67 151.09
2013-04-01 2907035000 154.20 159.72 153.55 159.68 156.59
2013-05-01 2781596000 157.84 169.07 158.10 163.45 159.33
2013-06-03 3533321800 155.74 165.99 155.73 160.42 163.83
2013-07-01 2330904500 163.78 169.86 160.22 168.71 161.26
2013-08-01 2283131700 158.87 170.97 163.05 163.65 169.99
2013-09-02 2226749600 163.90 173.60 163.70 168.01 165.23
2013-10-01 2901739000 171.49 177.51 164.53 175.79 168.14
2013-11-01 1930952900 176.57 181.75 174.76 181.00 176.02
2013-12-02 2232775900 181.15 184.69 177.32 184.69 181.09
回答by Xoel López Barata
I've seen in the last version of pandas you can use time offset alias 'BMS', which stands for "business month start frequency" or 'BM', which stands for "business month end frequency".
我在最新版本的Pandas中看到您可以使用时间偏移别名“BMS”,它代表“营业月开始频率”或“BM”,它代表“营业月结束频率”。
The code in the first case would look like
第一种情况下的代码看起来像
data_dy.resample('BMS', closed='right', label='right').apply(ohlc_dict)
or, in the second case,
或者,在第二种情况下,
data_dy.resample('BM', closed='right', label='right').apply(ohlc_dict)

