Filtering and comparing dates with Pandas

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/44162585/

Time: 2020-09-14 03:41:24  Source: igfitidea

pandas, datetime

Asked by Federico Garza

I would like to know how to filter dates at every level of time granularity, i.e. find dates by year, month, day, hour, minute and/or second. For example, how do I find all dates that fall in 2014, or in January 2014, or only on 2nd January 2014, and so on down to the second?

I have a dataframe of dates and times generated with pd.to_datetime:

df
    timeStamp
0   2014-01-02 21:03:04
1   2014-02-02 21:03:05
2   2016-02-04 18:03:10
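
For anyone who wants to reproduce the examples, a frame like the one above can be rebuilt roughly as follows. This is only a sketch: it assumes the column was parsed from plain strings, which the question does not state.

```python
import pandas as pd

# Sample values copied from the question; parsing from strings is an assumption.
df = pd.DataFrame({
    "timeStamp": pd.to_datetime([
        "2014-01-02 21:03:04",
        "2014-02-02 21:03:05",
        "2016-02-04 18:03:10",
    ])
})

print(df.dtypes)  # the column should have a datetime64 dtype
print(df)
```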

If I filter by the year 2014, I would expect this output:

    timeStamp
0   2014-01-02 21:03:04
1   2014-02-02 21:03:05

Or, as a different example, I want the dates that fall in 2014 and on the 2nd of any month. This would also result in:

    timeStamp
0   2014-01-02 21:03:04
1   2014-02-02 21:03:05

But if I asked for dates that fall on the 2nd of January 2014, the result would be:

    timeStamp
0   2014-01-02 21:03:04

How can I achieve this at all the different levels?

Also, how do you compare dates at these different levels to create an array of boolean indices?

Answered by Andrew L

You can filter your dataframe via boolean indexing like so:

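df.loc[df['timeStamp'].dt.year == 2014]    # rows in 2014
df.loc[df['timeStamp'].dt.month == 5]      # rows in May of any year
df.loc[df['timeStamp'].dt.second == 4]     # rows whose seconds field is 4
df.loc[df['timeStamp'] == '2014-01-02']    # matches only the exact instant 2014-01-02 00:00:00
df.loc[pd.to_datetime(df['timeStamp'].dt.date) == '2014-01-02']  # any time on 2 Jan 2014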

... and so on and so forth.
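
To cover the combined cases from the question (for example "in 2014 and on the 2nd of the month"), the individual comparisons can be joined with `&`. The sketch below rebuilds the sample frame from the question; the variable names are illustrative only and are not part of the original answer.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "timeStamp": pd.to_datetime([
        "2014-01-02 21:03:04",
        "2014-02-02 21:03:05",
        "2016-02-04 18:03:10",
    ])
})
ts = df["timeStamp"]

# Each comparison yields a boolean Series aligned with df.
in_2014 = ts.dt.year == 2014
in_january = ts.dt.month == 1
on_2nd = ts.dt.day == 2

# Combine levels with & (and), | (or), ~ (not).
year_and_day = df.loc[in_2014 & on_2nd]
exact_day = df.loc[in_2014 & in_january & on_2nd]

# A plain NumPy boolean array, if that is what is needed downstream.
mask_array = (in_2014 & on_2nd).to_numpy()

print(year_and_day)
print(exact_day)
print(mask_array)
```

Keeping each level as its own named mask makes it easy to mix and match granularities without rewriting the comparisons.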

Answered by Scott Boston

If you set the timestamp column as the index, with a datetime dtype, you get a DatetimeIndex and can then use the following partial string indexing syntax:

df = df.set_index('timeStamp')
df.loc['2014']        # gets all 2014
df.loc['2014-01']     # gets all Jan 2014
df.loc['2014-01-02']  # gets all Jan 2, 2014
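
A self-contained sketch of partial string indexing is shown here. It rests on some assumptions: the `value` column is hypothetical and added only so the selected rows have something to display, and `.loc` is used for row selection because, as far as I know, selecting rows with a bare `df['2014']` was deprecated and later removed in newer pandas releases.

```python
import pandas as pd

# Sample timestamps from the question; "value" is a hypothetical extra column.
df = pd.DataFrame({
    "timeStamp": pd.to_datetime([
        "2014-01-02 21:03:04",
        "2014-02-02 21:03:05",
        "2016-02-04 18:03:10",
    ]),
    "value": [1, 2, 3],
})

# Move the timestamps into a sorted DatetimeIndex.
indexed = df.set_index("timeStamp").sort_index()

year_2014 = indexed.loc["2014"]             # every row in 2014
jan_2014 = indexed.loc["2014-01"]           # every row in January 2014
jan_2 = indexed.loc["2014-01-02"]           # every row on 2 January 2014
span = indexed.loc["2014-01":"2014-02"]     # January through end of February 2014

print(year_2014)
print(span)
```

Writing the dates in ISO order (year-month-day) avoids any ambiguity between day-first and month-first parsing.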

Answered by Troy D

I would just create a string series, then use str.contains() with a regular expression in which each "." acts as a single-character wildcard. That will give you whatever granularity you're looking for.

s = df['timeStamp'].map(lambda x: x.strftime('%Y-%m-%d %H:%M:%S'))

print(df[s.str.contains('2014-..-.. ..:..:..')])
print(df[s.str.contains('2014-..-02 ..:..:..')])
print(df[s.str.contains('....-02-.. ..:..:..')])
print(df[s.str.contains('....-..-.. 18:03:10')])

Output:

        timeStamp
0 2014-01-02 21:03:04
1 2014-02-02 21:03:05
        timeStamp
0 2014-01-02 21:03:04
1 2014-02-02 21:03:05
        timeStamp
1 2014-02-02 21:03:05
2 2016-02-04 18:03:10
        timeStamp
2 2016-02-04 18:03:10

I think this also solves your question about boolean indices:

print(s.str.contains('....-..-.. 18:03:10'))

Output:

0    False
1    False
2     True
Name: timeStamp, dtype: bool
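
As a rough cross-check, the string-based mask can be compared with one built from the `.dt` accessors. This sketch uses `dt.strftime` instead of `map` with a lambda, which is a substitution on my part rather than what the answer used, and the `^` anchor is added so the pattern is matched from the start of the string.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "timeStamp": pd.to_datetime([
        "2014-01-02 21:03:04",
        "2014-02-02 21:03:05",
        "2016-02-04 18:03:10",
    ])
})

# Format every timestamp as a fixed-width string.
s = df["timeStamp"].dt.strftime("%Y-%m-%d %H:%M:%S")

# In a regular expression "." matches any single character.
string_mask = s.str.contains(r"^2014-..-02 ")

# The same selection expressed with datetime accessors.
accessor_mask = (df["timeStamp"].dt.year == 2014) & (df["timeStamp"].dt.day == 2)

# Both approaches should flag the same rows.
same_rows = bool((string_mask == accessor_mask).all())
print(same_rows)
print(df[string_mask])
```

The string approach is flexible, while the accessor approach avoids formatting every value, so the choice between them is mostly a matter of convenience.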