Pandas Drop Rows Outside of Time Range

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/14539992/

Date: 2020-09-13 20:37:01  Source: igfitidea

Tags: python, pandas

Asked by Jeff

I am trying to go through every row in a DataFrame index and remove all rows whose time does not fall within a certain range.

I have been looking for solutions, but none of them separate the date from the time; all I want to do is drop the rows that fall outside a time range.

Answered by Andy Hayden

You can use the between_time function directly:

# pandas >= 1.4 takes inclusive=...; older versions used include_start=False, include_end=False
ts.between_time(datetime.time(18), datetime.time(9), inclusive='neither')
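As a minimal sketch of how between_time ignores the date part, the snippet below builds two days of hourly data and filters on time of day only. The frame, its "value" column, and the chosen hours are illustrative assumptions, not part of the original answer; only the default (inclusive) endpoints are used so the call does not depend on the pandas version.

```python
import datetime

import numpy as np
import pandas as pd

# Two days of hourly timestamps (illustrative data, not from the question).
rng = pd.date_range("2000-01-01", periods=48, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"value": np.arange(48)}, index=rng)

# Keep rows whose time of day lies in [10:00, 14:00]; the date is ignored,
# so both days contribute rows.
kept = df.between_time(datetime.time(10), datetime.time(14))

# A start time later than the end time wraps around midnight:
# this keeps 18:00 through 09:00 and drops the daytime rows in between.
overnight = df.between_time(datetime.time(18), datetime.time(9))

print(kept)
print(len(overnight))
```

Because the result is a new object, assigning it back (df = df.between_time(...)) is what actually drops the rows outside the range.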


Original answer:

You can use the indexer_between_time Index method.

For example, to include those times between 9am and 6pm (inclusive):

ts.iloc[ts.index.indexer_between_time(datetime.time(9), datetime.time(18))]

To do the opposite and keep only the times strictly after 6pm and strictly before 9am (that is, exclude everything from 9am to 6pm inclusive):

ts.iloc[ts.index.indexer_between_time(datetime.time(18), datetime.time(9),
                                      include_start=False, include_end=False)]

Note: indexer_between_time's arguments include_start and include_end are True by default. Setting include_start to False means that datetimes whose time part is exactly start_time (the first argument), in this case 6pm, will not be included.
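The effect of the two flags can be checked with a small sketch like the one below. The series values and variable names are made up for illustration; indexer_between_time returns integer positions, so the result is passed to iloc.

```python
import datetime

import pandas as pd

# One day of hourly timestamps (illustrative data).
rng = pd.date_range("2000-01-01", periods=24, freq=pd.Timedelta(hours=1))
ts = pd.Series(range(24), index=rng)

start, end = datetime.time(18), datetime.time(9)

# Defaults: both endpoints are included, so 18:00 and 09:00 are kept.
both_ends = ts.iloc[ts.index.indexer_between_time(start, end)]

# Excluding both endpoints drops the rows at exactly 18:00 and 09:00.
open_ends = ts.iloc[
    ts.index.indexer_between_time(start, end,
                                  include_start=False, include_end=False)
]

print(len(both_ends), len(open_ends))
```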

Example:

# Assumes: import datetime; import numpy as np; import pandas as pd
In [1]: rng = pd.date_range('1/1/2000', periods=24, freq='H')

In [2]: ts = pd.Series(np.random.randn(len(rng)), index=rng)

In [3]: ts.iloc[ts.index.indexer_between_time(datetime.time(10), datetime.time(14))]
Out[3]: 
2000-01-01 10:00:00    1.312561
2000-01-01 11:00:00   -1.308502
2000-01-01 12:00:00   -0.515339
2000-01-01 13:00:00    1.536540
2000-01-01 14:00:00    0.108617

Note: the same positional indexing works for a DataFrame:

In [4]: df = pd.DataFrame(ts)

In [5]: df.iloc[df.index.indexer_between_time(datetime.time(10), datetime.time(14))]
Out[5]: 
                            0
2000-01-01 10:00:00  1.312561
2000-01-01 11:00:00 -1.308502
2000-01-01 12:00:00 -0.515339
2000-01-01 13:00:00  1.536540
2000-01-01 14:00:00  0.108617

Answered by Ivelin

You can also do:

rng = pd.date_range('1/1/2000', periods=24, freq='H')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts.loc[datetime.time(10):datetime.time(14)]
Out[4]: 
2000-01-01 10:00:00   -0.363420
2000-01-01 11:00:00   -0.979251
2000-01-01 12:00:00   -0.896648
2000-01-01 13:00:00   -0.051159
2000-01-01 14:00:00   -0.449192
Freq: H, dtype: float64

A DataFrame works the same way.
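A rough sketch of the DataFrame case, assuming a DatetimeIndex and a hypothetical "value" column: slicing with two datetime.time objects through loc is expected to select the same rows as between_time with its default inclusive endpoints.

```python
import datetime

import numpy as np
import pandas as pd

# One day of hourly timestamps; "value" is a hypothetical column name.
rng = pd.date_range("2000-01-01", periods=24, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"value": np.arange(24)}, index=rng)

# Label-based slice between two times of day (both ends included).
window = df.loc[datetime.time(10):datetime.time(14)]

# The same selection written with between_time, for comparison.
same = df.between_time(datetime.time(10), datetime.time(14))

print(window)
print(window.equals(same))
```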
