Python 熊猫过滤和比较日期

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/36104500/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-19 17:24:33  来源:igfitidea点击:

pandas filtering and comparing dates

pythonpandas

提问by jake wong

I have a sql file which consists of the data below which I read into pandas.

我有一个 sql 文件,其中包含我读入熊猫的数据。

df = pandas.read_sql('Database count details', con=engine,
                     index_col='id', parse_dates='newest_available_date')

Output

输出

id       code   newest_date_available
9793708  3514   2015-12-24
9792282  2399   2015-12-25
9797602  7452   2015-12-25
9804367  9736   2016-01-20
9804438  9870   2016-01-20

The next line of code is to get last week's date

下一行代码是获取上周的日期

date_before = datetime.date.today() - datetime.timedelta(days=7) # Which is 2016-01-20

What I am trying to do is, to compare date_beforewith dfand print out all rows that is less than date_before

我所试图做的是,比较date_beforedf并打印出所有行小于date_before

if (df['newest_available_date'] < date_before): print(#all rows)

if (df['newest_available_date'] < date_before): print(#all rows)

Obviously this returns me an error The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

显然这会返回一个错误 The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How should I do this?

我该怎么做?

回答by Fabio Lamanna

I would do a mask like:

我会做一个面具,如:

a = df[df['newest_date_available'] < date_before]

If date_before = datetime.date(2016, 1, 19), this returns:

如果date_before = datetime.date(2016, 1, 19),则返回:

        id  code newest_date_available
0  9793708  3514            2015-12-24
1  9792282  2399            2015-12-25
2  9797602  7452            2015-12-25

回答by onlyphantom

Using datetime.date(2019, 1, 10)works because pandascoerce the date to a date time under the hood. This however, will no longer be the case in future versions of pandas.

使用datetime.date(2019, 1, 10)作品是因为pandas将日期强制为引擎盖下的日期时间。但是,在未来版本的pandas.

From version 0.24 and up, it now issue a warning:

从 0.24 及更高版本开始,它现在发出警告:

FutureWarning: Comparing Series of datetimes with 'datetime.date'. Currently, the 'datetime.date' is coerced to a datetime. In the future pandas will not coerce, and a TypeError will be raised.

FutureWarning:将日期时间系列与“datetime.date”进行比较。目前,“datetime.date”被强制为日期时间。将来熊猫不会强制,并且会引发 TypeError。

The better solution is the one proposed on its official documentationas Pandasreplacement for python datetime.datetime object.

更好的解决方案是提出了一个官方的文档作为Pandas替代蟒蛇datetime.datetime对象

To provide an example referencing OP's initial dataset, this is how you would use it:

为了提供一个引用 OP 的初始数据集的示例,您将如何使用它:

import pandas
cond1 = df.newest_date_available < pd.Timestamp(2016,1,10)
df.loc[cond1, ]