Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original: http://stackoverflow.com/questions/37803040/

Time: 2020-09-14 01:23:14  Source: igfitidea

Remove non-business days rows from pandas dataframe

python pandas

Asked by vandelay

I have intraday time series data for wheat in df.

df = wt["WHEAT_USD"]

2016-05-02 02:00:00+02:00    4.780
2016-05-02 02:01:00+02:00    4.777
2016-05-02 02:02:00+02:00    4.780
2016-05-02 02:03:00+02:00    4.780
2016-05-02 02:04:00+02:00    4.780
Name: closeAsk, dtype: float64

When I plot the data, it has these annoying horizontal lines because of weekends. Is there a simple way to remove the non-business days from the dataframe itself?

Something like:

df = df.BDays()

Answered by Andy Hayden

One simple solution is to keep only the rows whose day falls on Monday to Friday:

In [11]: s[s.index.dayofweek < 5]
Out[11]:
2016-05-02 00:00:00    4.780
2016-05-02 00:01:00    4.777
2016-05-02 00:02:00    4.780
2016-05-02 00:03:00    4.780
2016-05-02 00:04:00    4.780
Name: closeAsk, dtype: float64

Note: this doesn't take bank holidays etc. into account.
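As a minimal sketch of the weekday filter, assuming made-up 6-hourly prices that span a weekend in place of the asker's data (the values and the names `s` and `weekdays_only` are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical data: one price every 6 hours from Friday 2016-05-06
# through Monday 2016-05-09, so the range includes a full weekend.
idx = pd.date_range("2016-05-06", "2016-05-09 18:00", freq=pd.Timedelta(hours=6))
s = pd.Series(np.linspace(4.78, 4.80, len(idx)), index=idx, name="closeAsk")

# dayofweek is 0 for Monday ... 6 for Sunday, so "< 5" keeps Monday to Friday.
weekdays_only = s[s.index.dayofweek < 5]

print(len(s), len(weekdays_only))
```

For a timezone-aware index such as the asker's, `dayofweek` is evaluated in the index's own timezone, so the weekend boundary follows local time.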

Answered by Dave Babbitt

pandas BDay just ends up using .dayofweek < 5, like the chosen answer, but it can be extended to account for bank holidays, etc.

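import pandas as pd
from pandas.tseries.offsets import BDay

# is_on_offset(ts) is True when ts falls on a business day (Monday to Friday);
# it replaces the deprecated BDay().onOffset
is_business_day = BDay().is_on_offset
# Raw string, so that backslash sequences such as \b are not read as escapes
csv_path = r'C:\Python27\Lib\site-packages\bokeh\sampledata\daylight_warsaw_2013.csv'
dates_df = pd.read_csv(csv_path)
# Boolean Series: True for rows whose 'Date' is a business day
match_series = pd.to_datetime(dates_df['Date']).map(is_business_day)
dates_df[match_series]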
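One way the holiday extension might look is sketched below. It swaps BDay for pandas' CustomBusinessDay and, purely as an assumption, uses USFederalHolidayCalendar as the holiday source; a real market would need its own calendar. The sample series is made up.

```python
import numpy as np
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

# Assumption: US federal holidays stand in for the relevant trading calendar.
us_bday = CustomBusinessDay(calendar=USFederalHolidayCalendar())

# Hypothetical prices: Friday, Saturday, Monday (Memorial Day 2016), Tuesday.
idx = pd.to_datetime([
    "2016-05-27 10:00", "2016-05-28 10:00",
    "2016-05-30 10:00", "2016-05-31 10:00",
])
s = pd.Series([4.780, 4.777, 4.780, 4.780], index=idx, name="closeAsk")

# True only for timestamps on a weekday that is not a calendar holiday.
mask = np.array([us_bday.is_on_offset(ts) for ts in s.index], dtype=bool)
trading_only = s[mask]

print(trading_only)
```

Calling `is_on_offset` once per row is simple but slow on large intraday data; it is shown here only because it mirrors the per-element check in the answer.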

Answered by Henri Frits maarseveen

I am building a backtester for stock/FX trading, and I also have this issue with days that are NaN because they are holidays or other non-trading days. You can download a financial calendar listing the days without trading, and then you also need to think about time zones, weekends, etc.

But the best solution is not to use date/time as the index for the candles or prices. Do not tie your price data to a date/time, but just to a counter of candles or prices; you can use a second index for this. So for calculations of moving averages (MA) or other technical lines, don't use date/time. If you look at MetaTrader 4/5, it also doesn't use date/time; the index of the data is the candle number!

I think you need to let go of the date-time as the index for the price if you work with stock or FX data. Of course you can put it in a column of the dataframe, but don't use it as the index. This way you can avoid many problems.
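A rough sketch of that idea, assuming a small made-up price series (the names `prices`, `candles`, `time` and `ma3` are illustrative, not from the answer): the timestamps move into an ordinary column and the rows are indexed by a plain counter, so calculations and plots see no weekend gap.

```python
import pandas as pd

# Hypothetical closes: Thursday, Friday, then Monday and Tuesday after a weekend.
idx = pd.to_datetime(["2016-05-05", "2016-05-06", "2016-05-09", "2016-05-10"])
prices = pd.Series([4.0, 5.0, 6.0, 7.0], index=idx, name="close")

# Move the timestamps into a "time" column; the new index is the
# candle number 0, 1, 2, ...
candles = prices.rename_axis("time").reset_index()

# The moving average is computed over consecutive candles, regardless of
# how much calendar time lies between them.
candles["ma3"] = candles["close"].rolling(3).mean()

print(candles)
```

Plotting `candles["close"]` against this integer index draws Friday and Monday as neighbours, while the `time` column stays available for labelling.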

Answered by oherbage

Using the workdays package, you can account for holidays pretty easily:

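    import workdays as wd

    def drop_non_busdays(df, holidays=None):
        """Keep only the rows of df whose index date is a business day."""
        if holidays is None:
            holidays = []
        start_date = df.index[0].date()
        end_date = df.index[-1].date()

        # Normalize both endpoints to business days (step one workday out, then back)
        start_wd = wd.workday(wd.workday(start_date, -1, holidays), 1, holidays)
        end_wd = wd.workday(wd.workday(end_date, 1, holidays), -1, holidays)

        # Collect every business day between the two endpoints
        b_days = [start_wd]
        while b_days[-1] < end_wd:
            b_days.append(wd.workday(b_days[-1], 1, holidays))

        # Compare calendar dates, so that intraday timestamps match as well
        b_days = set(b_days)
        valid = [ts.date() in b_days for ts in df.index]
        return df[valid]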
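If installing workdays is not an option, a similar filter can be sketched with NumPy's `is_busday`. This is an alternative technique, not the answer's method; the function name `drop_non_busdays_np`, the sample frame and the holiday date are hypothetical, and the sketch assumes a timezone-naive DatetimeIndex.

```python
import numpy as np
import pandas as pd

def drop_non_busdays_np(df, holidays=()):
    """Keep rows whose index date is Monday to Friday and not in holidays.

    Assumes df has a timezone-naive DatetimeIndex.
    """
    # Truncate each timestamp to its calendar day.
    days = df.index.values.astype("datetime64[D]")
    hol = np.array(list(holidays), dtype="datetime64[D]")
    # is_busday uses a Monday-to-Friday week mask by default.
    return df[np.is_busday(days, holidays=hol)]

# Hypothetical data: Thursday (treated as a holiday), Friday, Saturday, Monday.
idx = pd.to_datetime([
    "2016-05-05 10:00", "2016-05-06 10:00",
    "2016-05-07 10:00", "2016-05-09 10:00",
])
df = pd.DataFrame({"closeAsk": [4.780, 4.777, 4.780, 4.780]}, index=idx)

out = drop_non_busdays_np(df, holidays=["2016-05-05"])
print(out)
```

Because the check is vectorized, it avoids building the list of business days one step at a time.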