Python 使用 Pandas 将每日数据重新采样为每月(日期格式)
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/42191697/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Resample Daily Data to Monthly with Pandas (date formatting)
提问by sslack88
I am trying to resample some data from daily to monthly in a Pandas DataFrame. I am new to pandas and maybe I need to format the date and time first before I can do this, but I am not finding a good tutorial out there on the correct way to work with imported time series data. Everything I find is automatically importing data from Yahoo or Quandl.
我正在尝试在 Pandas DataFrame 中从每日到每月重新采样一些数据。我是 Pandas 的新手,也许我需要先格式化日期和时间,然后才能执行此操作,但是我没有找到有关使用导入的时间序列数据的正确方法的好教程。我发现的一切都是自动从 Yahoo 或 Quandl 导入数据。
Here is what I have in my DataFrame: dataframe segment screenshot
这是我的 DataFrame 中的内容:dataframe segment screenshot
Here is the code I used to create my DataFrame:
这是我用来创建 DataFrame 的代码:
#Import excel file into a Pandas DataFrame
df = pd.read_excel(open('2016_forex_daily_returns.xlsx','rb'), sheetname='Sheet 1')
#Calculate the daily returns
df['daily_ret'] = df['Equity'].pct_change()
# Assume an average annual risk-free rate over the period of 5%
df['excess_daily_ret'] = df['daily_ret'] - 0.05/252
Can someone help me understand what I need to do with the "Date" and "Time" columns in my DataFrame so I can resample?
有人可以帮助我了解我需要对 DataFrame 中的“日期”和“时间”列做什么,以便我可以重新采样?
回答by jezrael
For create DataFrame
is possible use:
对于 createDataFrame
可以使用:
df = pd.read_excel('2016_forex_daily_returns.xlsx', sheetname='Sheet 1')
print (df)
Date Time Equity
0 2016-01-03 22:16:22 300.38
1 2016-01-04 22:16:00 300.65
2 2016-01-05 14:26:02 301.65
3 2016-01-06 19:08:13 302.10
4 2016-01-07 18:39:00 302.55
5 2016-01-08 22:16:04 308.24
6 2016-01-11 02:49:39 306.69
7 2016-01-14 15:46:39 307.93
8 2016-01-19 15:56:31 308.18
I think you can first cast to_datetime
column date
and then use resample
with some aggregating functions like sum
or mean
:
我认为您可以先转换to_datetime
列date
,然后使用resample
一些聚合函数,例如sum
or mean
:
df.Date = pd.to_datetime(df.Date)
df1 = df.resample('M', on='Date').sum()
print (df1)
Equity excess_daily_ret
Date
2016-01-31 2738.37 0.024252
df2 = df.resample('M', on='Date').mean()
print (df2)
Equity excess_daily_ret
Date
2016-01-31 304.263333 0.003032
df3 = df.set_index('Date').resample('M').mean()
print (df3)
Equity excess_daily_ret
Date
2016-01-31 304.263333 0.003032
回答by Josmoor98
To resample from daily data to monthly, you can use the resample
method. Specifically for daily returns, the example below demonstrates a possible solution.
要将每日数据重新采样为每月数据,您可以使用该resample
方法。特别是对于每日回报,以下示例展示了一种可能的解决方案。
The following data is taken from an analysis performed by AQR. It represents the market daily returns for May, 2019. The following code may be used to construct the data as a pd.DataFrame
.
以下数据取自AQR执行的分析。它代表 2019 年 5 月的市场每日收益。以下代码可用于将数据构造为pd.DataFrame
.
import pandas as pd
dates = pd.DatetimeIndex(['2019-05-01', '2019-05-02', '2019-05-03', '2019-05-06',
'2019-05-07', '2019-05-08', '2019-05-09', '2019-05-10',
'2019-05-13', '2019-05-14', '2019-05-15', '2019-05-16',
'2019-05-17', '2019-05-20', '2019-05-21', '2019-05-22',
'2019-05-23', '2019-05-24', '2019-05-27', '2019-05-28',
'2019-05-29', '2019-05-30', '2019-05-31'],
dtype='datetime64[ns]', name='DATE', freq=None)
daily_returns = array([-7.73787813e-03, -1.73277604e-03, 1.09124031e-02, -3.80437796e-03,
-1.66513456e-02, -1.67262934e-03, -2.77427734e-03, 4.01713274e-03,
-2.50407102e-02, 9.23270367e-03, 5.41897568e-03, 8.65419524e-03,
-6.83456209e-03, -6.54787106e-03, 9.04322511e-03, -4.05811322e-03,
-1.33152640e-02, 2.73398876e-03, -9.52000000e-05, -7.91438809e-03,
-7.16881982e-03, 1.19255102e-03, -1.24209547e-02])
daily_returns = pd.DataFrame(index = index, data= may.values, columns = ["returns"])
Assuming you don't have daily price data, you can resample from daily returns to monthly returns using the following code.
假设您没有每日价格数据,您可以使用以下代码从每日回报重新采样为每月回报。
>>> daily_returns.resample("M").apply(lambda x: ((x + 1).cumprod() - 1).last("D"))
-0.06532
If you refer to their monthly dataset, this confirms that the market return for May 2019 was approximated to be -6.52%
or -0.06532
.
如果您参考他们的月度数据集,这证实了 2019 年 5 月的市场回报近似为-6.52%
或-0.06532
。
回答by dernk
I have created a random DataFrame similar to yours here:
我在这里创建了一个类似于你的随机数据帧:
import numpy as np
import pandas as pd
dates = [x for x in pd.date_range(end=pd.datetime.today(), periods=1800)]
counts = [x for x in np.random.randint(0, 10000, size=1800)]
df = pd.DataFrame({'dates': dates, 'counts': counts}).set_index('dates')
Here are the procedures to aggregate the sum of counts for each week as an example:
以下是聚合每周计数总和的过程作为示例:
df['week'] = df.index.week
df['year'] = df.index.year
target_df = df.groupby(['year', 'week']).agg({'counts': np.sum})
Where the output of target_df is:
其中 target_df 的输出是:
counts
year week
2015 3 29877
4 36859
5 36872
6 36899
7 37769
. . .
. . .
. . .
回答by Saeid
First, concatenate the 'Date' and 'Time' columns with space in between. Then convert that into a DateTime format using pd.to_datetime().
首先,连接“日期”和“时间”列,中间有空格。然后使用 pd.to_datetime() 将其转换为 DateTime 格式。
df = pd.read_excel('2016_forex_daily_returns.xlsx', sheetname='Sheet 1')
print(df)
Date Time Equity
0 2016-01-03 22:16:22 300.38
1 2016-01-04 22:16:00 300.65
2 2016-01-05 14:26:02 301.65
3 2016-01-06 19:08:13 302.10
4 2016-01-07 18:39:00 302.55
5 2016-01-08 22:16:04 308.24
6 2016-01-11 02:49:39 306.69
7 2016-01-14 15:46:39 307.93
8 2016-01-19 15:56:31 308.18
df = df.drop(['Date', 'Time'], axis= 'columns').set_index(pd.to_datetime(df.Date + ' ' + df.Time))
df.index.name = 'Date/Time'
print(df)
Equity
Date/Time
2016-01-03 22:16:22 300.38
2016-01-04 22:16:00 300.65
2016-01-05 14:26:02 301.65
2016-01-06 19:08:13 302.10
2016-01-07 18:39:00 302.55
2016-01-08 22:16:04 308.24
2016-01-11 02:49:39 306.69
2016-01-14 15:46:39 307.93
2016-01-19 15:56:31 308.18
Now you can resample to any format you desire.
现在您可以重新采样为您想要的任何格式。