Converting daily stock data to weekly-based via pandas in Python

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/34597926/

Date: 2020-09-14 00:27:39  Source: igfitidea

Tags: python, pandas, yahoo-finance

Question by Judking

I've got a DataFrame storing daily data, as below:

Date              Open        High         Low       Close   Volume
2010-01-04   38.660000   39.299999   38.509998   39.279999  1293400   
2010-01-05   39.389999   39.520000   39.029999   39.430000  1261400   
2010-01-06   39.549999   40.700001   39.020000   40.250000  1879800   
2010-01-07   40.090000   40.349998   39.910000   40.090000   836400   
2010-01-08   40.139999   40.310001   39.720001   40.290001   654600   
2010-01-11   40.209999   40.520000   40.040001   40.290001   963600   
2010-01-12   40.160000   40.340000   39.279999   39.980000  1012800   
2010-01-13   39.930000   40.669998   39.709999   40.560001  1773400   
2010-01-14   40.490002   40.970001   40.189999   40.520000  1240600   
2010-01-15   40.570000   40.939999   40.099998   40.450001  1244200   

What I intend to do is to merge it into weekly-based data. After grouping:

  1. Date should be the Monday of each week. Holidays must be taken into account: when Monday is not a trading day, the first trading day of that week should be used as the Date.
  2. Open should be Monday's Open (or the Open of the first trading day of the week).
  3. Close should be Friday's Close (or the Close of the last trading day of the week).
  4. High should be the highest High of the trading days in the week.
  5. Low should be the lowest Low of the trading days in the week.
  6. Volume should be the sum of the Volumes of all trading days in the week.

which should look like this:

Date              Open        High         Low       Close   Volume
2010-01-04   38.660000   40.700001   38.509998   40.290001  5925600   
2010-01-11   40.209999   40.970001   39.279999   40.450001  6234600   

Currently, my code snippet is as below. Which function should I use to map the daily data to the expected weekly data? Many thanks!

import datetime

import pandas_datareader.data as web

start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2016, 12, 31)
# Note: the "yahoo" source may no longer work in current pandas_datareader releases
f = web.DataReader("MNST", "yahoo", start, end)
print(f)

Answer by Stefan

You can resample (to weekly), offset (shift) the labels, and apply aggregation rules as follows:

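import pandas as pd

logic = {'Open'  : 'first',
         'High'  : 'max',
         'Low'   : 'min',
         'Close' : 'last',
         'Volume': 'sum'}

f = pd.read_clipboard(parse_dates=['Date'], index_col=['Date'])
weekly = f.resample('W').agg(logic)
# 'W' labels each Monday-to-Sunday bin with its Sunday; shift the labels back 6 days to Monday.
# (This used to be done with loffset, which was removed in pandas 2.0.)
weekly.index = weekly.index - pd.Timedelta(days=6)
print(weekly)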

to get:

                 Open       High        Low      Close   Volume
Date                                                           
2010-01-04  38.660000  40.700001  38.509998  40.290001  5925600
2010-01-11  40.209999  40.970001  39.279999  40.450001  6234600
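An alternative that avoids shifting the index by hand is to anchor the weekly bins on Monday and label each bin by its left edge. This is a sketch, not part of the answer above: it rebuilds the question's sample table as `f` and swaps in the standard resample options `'W-MON'`, `closed='left'` and `label='left'`. Note that it still labels each week with the calendar Monday, so on its own it does not implement the question's holiday rule for the Date.

```python
import pandas as pd

# The ten daily rows from the question, rebuilt by hand
idx = pd.DatetimeIndex(pd.to_datetime([
    '2010-01-04', '2010-01-05', '2010-01-06', '2010-01-07', '2010-01-08',
    '2010-01-11', '2010-01-12', '2010-01-13', '2010-01-14', '2010-01-15']), name='Date')
f = pd.DataFrame({
    'Open':   [38.66, 39.389999, 39.549999, 40.09, 40.139999,
               40.209999, 40.16, 39.93, 40.490002, 40.57],
    'High':   [39.299999, 39.52, 40.700001, 40.349998, 40.310001,
               40.52, 40.34, 40.669998, 40.970001, 40.939999],
    'Low':    [38.509998, 39.029999, 39.02, 39.91, 39.720001,
               40.040001, 39.279999, 39.709999, 40.189999, 40.099998],
    'Close':  [39.279999, 39.43, 40.25, 40.09, 40.290001,
               40.290001, 39.98, 40.560001, 40.52, 40.450001],
    'Volume': [1293400, 1261400, 1879800, 836400, 654600,
               963600, 1012800, 1773400, 1240600, 1244200],
}, index=idx)

logic = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last', 'Volume': 'sum'}

# Bins run from a Monday (inclusive) to the next Monday (exclusive); each bin is
# labeled with its left edge, i.e. the Monday, so no manual shift is needed.
weekly = f.resample('W-MON', closed='left', label='left').agg(logic)
print(weekly)
```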

Answer by goofd

In general, assuming that you have the dataframe in the form you specified, you need to do the following steps:

  1. put Date in the index
  2. resample the index.
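The two steps can be sketched as follows, assuming a few rows shaped like the question's data with Date still an ordinary string column (the rows are a subset of the question's table; only Close and Volume are kept for brevity).

```python
import pandas as pd

# A few rows shaped like the question's data, with Date as an ordinary column
f = pd.DataFrame({
    'Date': ['2010-01-04', '2010-01-05', '2010-01-11', '2010-01-12'],
    'Close': [39.279999, 39.43, 40.290001, 39.98],
    'Volume': [1293400, 1261400, 963600, 1012800],
})

# Step 1: parse the dates and move them into the index
f['Date'] = pd.to_datetime(f['Date'])
f = f.set_index('Date').sort_index()

# Step 2: resample the DatetimeIndex into weekly bins (default 'W' bins end on Sunday)
weekly = f.resample('W').agg({'Close': 'last', 'Volume': 'sum'})
print(weekly)
```

The weekly labels here are the Sundays 2010-01-10 and 2010-01-17, which is why the answers move them to Monday afterwards.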

What you have is a case of applying different functions to different columns.

You can resample in various ways; for example, you can take the mean of the values, count them, and so on. Check the pandas resample documentation.
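As an illustration of other reductions, here is a sketch on a made-up Close series (the values are illustrative, not from the question) computing the weekly mean and the weekly count; the count is handy for seeing how many trading days each week actually had.

```python
import pandas as pd

# Made-up closing prices for two trading weeks (illustrative values only)
idx = pd.DatetimeIndex(pd.to_datetime(['2010-01-04', '2010-01-05', '2010-01-06',
                                       '2010-01-11', '2010-01-12']), name='Date')
close = pd.Series([39.0, 40.0, 41.0, 40.0, 42.0], index=idx, name='Close')

weekly_mean = close.resample('W').mean()    # average close of each week
weekly_count = close.resample('W').count()  # number of trading days in each week
print(weekly_mean)
print(weekly_count)
```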

You can also apply custom aggregators (see the same documentation). With that in mind, the code for your case can be written as:

import pandas as pd

f['Date'] = pd.to_datetime(f['Date'])
f.set_index('Date', inplace=True)
f.sort_index(inplace=True)

def take_first(array_like):
    # .iloc is positional; a bare [0] relies on a deprecated fallback for a DatetimeIndex
    return array_like.iloc[0]

def take_last(array_like):
    return array_like.iloc[-1]

output = f.resample('W').agg({'Open': take_first,      # Weekly resample
                              'High': 'max',
                              'Low': 'min',
                              'Close': take_last,
                              'Volume': 'sum'})
# Shift the Sunday labels back to Monday (replaces the loffset argument removed in pandas 2.0)
output.index = output.index - pd.Timedelta(days=6)

output = output[['Open', 'High', 'Low', 'Close', 'Volume']]

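Here, W signifies a weekly resampling whose bins by default span Monday to Sunday and are labeled with the Sunday. To put the labels on Monday, they are shifted back six days (the answer originally used the loffset argument for this, which was deprecated in pandas 1.1 and removed in pandas 2.0). There are several predefined day specifiers; take a look at the pandas offsets documentation. You can even define custom offsets.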
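To see the default labeling concretely, this sketch resamples one made-up trading week (Monday 2010-01-04 to Friday 2010-01-08, illustrative volumes) and prints the bin label before and after the six-day shift.

```python
import pandas as pd

# One made-up trading week, Monday 2010-01-04 to Friday 2010-01-08
idx = pd.bdate_range('2010-01-04', '2010-01-08')
volume = pd.Series([1, 2, 3, 4, 5], index=idx)

weekly = volume.resample('W').sum()
print(weekly.index[0], weekly.index[0].day_name())  # bin label: the Sunday ending the week

monday_labels = weekly.index - pd.Timedelta(days=6)  # shift labels back to the Monday
print(monday_labels[0], monday_labels[0].day_name())
```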

Coming back to the resampling method: for Open and Close you can specify custom functions that take the first (or last) value, and pass them in the aggregation dict (older pandas versions accepted this dict through the how argument, which has since been removed).
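A quick way to check that custom first/last functions behave like the built-in 'first' and 'last' aggregations is to run both on the same data. In this sketch take_first and take_last use .iloc so they index by position, and the rows are a subset of the question's table. One difference worth knowing: 'first' and 'last' skip missing values, whereas .iloc[0] and .iloc[-1] return whatever is in the first or last row, NaN included.

```python
import pandas as pd

idx = pd.DatetimeIndex(pd.to_datetime(['2010-01-04', '2010-01-05',
                                       '2010-01-11', '2010-01-12']), name='Date')
f = pd.DataFrame({'Open': [38.66, 39.389999, 40.209999, 40.16],
                  'Close': [39.279999, 39.43, 40.290001, 39.98]}, index=idx)

def take_first(s):
    return s.iloc[0]   # positional: first row of the week

def take_last(s):
    return s.iloc[-1]  # positional: last row of the week

custom = f.resample('W').agg({'Open': take_first, 'Close': take_last})
builtin = f.resample('W').agg({'Open': 'first', 'Close': 'last'})
print(custom)
print(builtin)
```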

This answer assumes the data is daily, i.e. there is only one entry per day, and that no data is present for non-business days (Saturday and Sunday). So taking the last data point of the week as Friday's is fine. If you want, you can use a business-week frequency instead of 'W'. For more complex data you may want to use groupby to group the weekly data and then work on the time indices within each group.
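As a rough sketch of that groupby route, the example below groups made-up daily bars by calendar week (a pandas weekly Period) and labels each week with the first date actually present, which also covers the question's rule for weeks whose Monday is a holiday. The data is illustrative; there is no row for Monday 2010-01-18 because that day was a US market holiday (Martin Luther King Jr. Day).

```python
import pandas as pd

# Made-up daily bars; there is no row for Monday 2010-01-18 (a US market holiday)
f = pd.DataFrame({
    'Date': pd.to_datetime(['2010-01-11', '2010-01-12', '2010-01-13', '2010-01-14',
                            '2010-01-15', '2010-01-19', '2010-01-20', '2010-01-21',
                            '2010-01-22']),
    'Open':   [40.21, 40.16, 39.93, 40.49, 40.57, 40.30, 40.10, 39.80, 39.50],
    'Close':  [40.29, 39.98, 40.56, 40.52, 40.45, 40.20, 39.90, 39.60, 39.30],
    'Volume': [963600, 1012800, 1773400, 1240600, 1244200, 1000000, 900000, 800000, 700000],
}).sort_values('Date')

# Calendar week (Monday to Sunday) that each trading day falls in
week = f['Date'].dt.to_period('W').rename('Week')

weekly = f.groupby(week).agg(
    Date=('Date', 'first'),      # first trading day present in that week
    Open=('Open', 'first'),
    Close=('Close', 'last'),
    Volume=('Volume', 'sum'),
).set_index('Date')
print(weekly)
```

The second week is labeled 2010-01-19 (Tuesday) and takes Tuesday's Open, whereas a plain resample would label it with the calendar Monday.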

By the way, a gist for the solution can be found at: https://gist.github.com/prithwi/339f87bf9c3c37bb3188

Answer by Matthew Scarborough

I had the exact same question and found a great solution here.

https://www.techtrekking.com/how-to-convert-daily-time-series-data-into-weekly-and-monthly-using-pandas-and-python/

The weekly code is posted below.

import pandas as pd

print('*** Program Started ***')

df = pd.read_csv('15-06-2016-TO-14-06-2018HDFCBANKALLN.csv')

# ensuring only equity series is considered
df = df.loc[df['Series'] == 'EQ'].copy()

# Converting date to pandas datetime format, sorted so 'first'/'last' pick the right days
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values('Date')
# Getting ISO week number and ISO year (Series.dt.week was removed in pandas 2.0).
# Week numbers repeat every year, so we need a unique key made of year and week number;
# the ISO year keeps days such as 2016-01-01 (ISO week 53 of 2015) in the right week.
iso = df['Date'].dt.isocalendar()
df['Week_Number'] = iso['week']
df['Year'] = iso['year']

# Grouping based on required values
df2 = df.groupby(['Year','Week_Number']).agg({'Open Price':'first', 'High Price':'max', 'Low Price':'min', 'Close Price':'last','Total Traded Quantity':'sum'})
# df3 = df.groupby(['Year','Week_Number']).agg({'Open Price':'first', 'High Price':'max', 'Low Price':'min', 'Close Price':'last','Total Traded Quantity':'sum','Average Price':'mean'})
df2.to_csv('Weekly_OHLC.csv')
print('*** Program ended ***')
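Why the year should come from the ISO calendar rather than dt.year is easiest to see around New Year. This sketch prints both for three dates: 2016-01-01 belongs to ISO week 53 of 2015, so pairing its ISO week with the calendar year 2016 would split that trading week into two groups.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2015-12-31', '2016-01-01', '2016-01-04']))
iso = dates.dt.isocalendar()  # DataFrame with 'year', 'week' and 'day' columns

print(pd.DataFrame({'Date': dates,
                    'calendar_year': dates.dt.year,
                    'iso_year': iso['year'],
                    'iso_week': iso['week']}))
```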

Answer by L. Astola

Not a direct answer, but suppose the columns are the dates (the transpose of your table), with no missing dates.

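import pandas as pd

# sum up daily results in df (one column per day) to weekly results in wdf
wdf = pd.DataFrame(index=df.index)
# step through complete 7-column blocks; the + 1 makes sure the last full week is included
for i in range(7, len(df.columns) + 1, 7):
    wdf['week' + str(i // 7)] = df[df.columns[i - 7:i]].sum(axis=1)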
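A loop-free variant of the same idea, offered as a sketch: label every column with its 7-day block number and let groupby do the summing on the transposed frame. The wide table below is made up (two rows, 14 consecutive daily columns). Unlike the loop, a trailing partial block of fewer than 7 days would show up here as its own, shorter "week" rather than being dropped.

```python
import numpy as np
import pandas as pd

# Made-up wide table: one row per item, one column per calendar day (14 days, no gaps)
days = pd.date_range('2010-01-04', periods=14, freq='D')
df = pd.DataFrame(np.arange(28).reshape(2, 14), index=['a', 'b'], columns=days)

# Block number of each column: 0 for days 0-6, 1 for days 7-13, ...
block = np.arange(len(df.columns)) // 7
wdf = df.T.groupby(block).sum().T
wdf.columns = ['week' + str(b + 1) for b in wdf.columns]
print(wdf)
```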