pandas Python - Retrieving the last 30 days of data from a pandas DataFrame

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/33872129/

Time: 2020-09-14 00:15:59  Source: igfitidea

Python - Retrieving last 30 days data from dataframe pandas

python pandas dataframe

Asked by Naive Babes

I have a dataframe containing six months of error logs, collected every day. I want to retrieve the last 30 days of records, counting back from the last date in the data. The last date isn't today.
For example: I have data for May, June, July and up to August 15. I want to retrieve the data from August 15 back to July 15, making it 30 days of records.
Is there a way to do this in Python Pandas?

This is the sample dataframe:

Error_Description         Date        Weekend      Type
N17739 Limit switch X-    5/1/2015    5/3/2015    Critical
N17739 Limit switch Y-    5/1/2015    5/3/2015    Critical
N938 Key non-functional   5/1/2015    5/3/2015    Non-Critical
P124 Magazine is running  5/1/2015    5/3/2015    Non-Critical
N17738 Limit switch Z+    5/1/2015    5/3/2015    Critical
N938 Key non-functional   5/1/2015    5/3/2015    Non-Critical
     ...                    ...         ...          ...
P873 ENCLOSURE DOOR       8/24/2015   8/30/2015   Non-Critical
N3065 Reset M114          8/24/2015   8/30/2015   Non-Critical
N3065 Reset M114,         8/24/2015   8/30/2015   Non-Critical
N2853 Synchronization     8/24/2015   8/30/2015   Critical
P152 ENCLOSURE            8/24/2015   8/30/2015   Non-Critical
N6236 has stopped         8/24/2015   8/30/2015   Critical

Accepted answer by jezrael

The date lastdayfrom is used to select the last 30 days of the DataFrame with the loc indexer.

import pandas as pd

lastdayfrom = pd.to_datetime('8/24/2015')
print(lastdayfrom)
#2015-08-24 00:00:00

print(df)
#           Error_Description       Date    Weekend          Type
#0     N17739 Limit switch X- 2015-05-01 2015-05-03      Critical
#1     N17739 Limit switch Y- 2015-05-01 2015-05-03      Critical
#2    N938 Key non-functional 2015-05-01 2015-05-03  Non-Critical
#3   P124 Magazine is running 2015-05-01 2015-05-03  Non-Critical
#4     N17738 Limit switch Z+ 2015-02-01 2015-05-03      Critical
#5    N938 Key non-functional 2015-07-25 2015-05-03  Non-Critical
#6        P873 ENCLOSURE DOOR 2015-07-24 2015-08-30  Non-Critical
#7           N3065 Reset M114 2015-07-21 2015-08-21  Non-Critical
#8          N3065 Reset M114, 2015-08-22 2015-08-22  Non-Critical
#9      N2853 Synchronization 2015-08-23 2015-08-30      Critical
#10            P152 ENCLOSURE 2015-08-24 2015-08-30  Non-Critical
#11         N6236 has stopped 2015-08-24 2015-08-30      Critical

print(df.dtypes)
#Error_Description            object
#Date                 datetime64[ns]
#Weekend              datetime64[ns]
#Type                         object
#dtype: object

# set the Date column as the index
df = df.set_index('Date')
# sort the DatetimeIndex if it isn't already ordered
df = df.sort_index()

# select the 30 days up to and including lastdayfrom
df = df.loc[lastdayfrom - pd.Timedelta(days=30):lastdayfrom].reset_index()
print(df)
#        Date        Error_Description    Weekend          Type
#0 2015-07-25  N938 Key non-functional 2015-05-03  Non-Critical
#1 2015-08-22        N3065 Reset M114, 2015-08-22  Non-Critical
#2 2015-08-23    N2853 Synchronization 2015-08-30      Critical
#3 2015-08-24           P152 ENCLOSURE 2015-08-30  Non-Critical
#4 2015-08-24        N6236 has stopped 2015-08-30      Critical
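
For anyone who wants to run the loc approach end to end, here is a minimal sketch on a small made-up frame. The sample rows and the helper name last_n_days are assumptions for illustration only. Unlike the answer above, the last date is read from the data with index.max() instead of being hard-coded, which seems closer to what the question asks for.

```python
import pandas as pd

# Hypothetical sample data, loosely modelled on the question's error log.
df = pd.DataFrame({
    "Error_Description": ["A", "B", "C", "D", "E"],
    "Date": pd.to_datetime(
        ["2015-05-01", "2015-07-24", "2015-07-25", "2015-08-23", "2015-08-24"]
    ),
})


def last_n_days(frame, n=30, date_col="Date"):
    """Return rows dated within n days of the latest date (both ends inclusive)."""
    indexed = frame.set_index(date_col).sort_index()
    last_day = indexed.index.max()
    first_day = last_day - pd.Timedelta(days=n)
    # Label slicing on a sorted DatetimeIndex includes both endpoints.
    return indexed.loc[first_day:last_day].reset_index()


recent = last_n_days(df)
print(recent)
```

Because both endpoints are included, the window here runs from 2015-07-25 to 2015-08-24, so the row dated 2015-07-24 is dropped.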

Answer by faltarell

If the dates are a sorted DatetimeIndex, you can use DataFrame.last_valid_index() to find the label of the last row, and then subtract DateOffset(days=30) to go back 30 days:

df[df.last_valid_index() - pandas.DateOffset(days=30):]
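
A self-contained sketch of this idea follows, assuming the dates already sit in a sorted DatetimeIndex. The frame name logs and its rows are made up for illustration, the offset is spelled pd.DateOffset(days=30), and the slice goes through .loc to make the label-based lookup explicit.

```python
import pandas as pd

# Hypothetical sample data with the dates already in a sorted DatetimeIndex.
idx = pd.to_datetime(["2015-05-01", "2015-07-24", "2015-07-25", "2015-08-24"])
logs = pd.DataFrame({"Error_Description": ["A", "B", "C", "D"]}, index=idx)

# Label of the last row that holds data; with a sorted index this is the latest date.
last_label = logs.last_valid_index()

# Open-ended label slice: from 30 days before the last label to the end of the frame.
tail = logs.loc[last_label - pd.DateOffset(days=30):]
print(tail)
```

If the index is not sorted, calling sort_index() first keeps the slice well defined.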

Answer by Chris

The other two answers (currently) assume the date is the index, but in Python 3 at least, you can solve this with simple boolean masking (.query(..) doesn't work).

df[df["Date"] >= (pd.to_datetime('8/24/2015') - pd.Timedelta(days=30))]
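
As a runnable sketch of the masking approach, the example below keeps Date as an ordinary column and derives the cutoff from the latest date in the data instead of a hard-coded string. The sample rows and the names last_day and cutoff are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; "Date" stays an ordinary column here.
df = pd.DataFrame({
    "Error_Description": ["A", "B", "C", "D"],
    "Date": pd.to_datetime(
        ["2015-05-01", "2015-07-24", "2015-07-25", "2015-08-24"]
    ),
})

last_day = df["Date"].max()                # latest date present in the data
cutoff = last_day - pd.Timedelta(days=30)  # start of the 30-day window

# Boolean mask: keep only the rows dated on or after the cutoff.
recent = df[df["Date"] >= cutoff]
print(recent)
```

No sorting or index change is needed here, since the mask is evaluated row by row.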
