Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/49435438/

Date: 2020-09-14 05:21:58  Source: igfitidea

Pandas validate date format

Tags: python, pandas, datetime

Asked by lukas_o

Is there any nice way to validate that all items in a dataframe's column have a valid date format?

My date format is 11-Aug-2010.

I saw this generic answer, where:

try:
    datetime.datetime.strptime(date_text, '%Y-%m-%d')
except ValueError:
    raise ValueError("Incorrect data format, should be YYYY-MM-DD")

source: https://stackoverflow.com/a/16870699/1374488

But I assume that's not efficient enough in my case.

I assume I have to convert the strings to pandas datetimes first, as mentioned in "Convert string date time to pandas datetime".

I am new to the Python world, so any ideas are appreciated.

Answered by cs95

(format borrowed from piRSquared's answer)

if pd.to_datetime(df['date'], format='%d-%b-%Y', errors='coerce').notnull().all():
    pass  # do something

This is the LYBL ("Look Before You Leap") approach. The condition is True when all your date strings are valid, meaning every one of them was converted into an actual pd.Timestamp object. Invalid date strings are coerced to NaT, which is the datetime equivalent of NaN.
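
As a minimal sketch of this check, the snippet below builds a small made-up column (the sample values are assumptions for illustration, not data from the question) and counts how many entries come out as NaT. Note that missing values such as None are also turned into NaT, so this check treats them as invalid too.

```python
import pandas as pd

# Hypothetical sample data; the column name 'date' follows the snippet above.
df = pd.DataFrame({'date': ['11-Aug-2010', '31-Feb-2010', 'August2010, I think']})

# Strings that do not match the format (or are not real dates) become NaT
# instead of raising an exception.
parsed = pd.to_datetime(df['date'], format='%d-%b-%Y', errors='coerce')

all_valid = bool(parsed.notna().all())   # True only if nothing was coerced to NaT
n_invalid = int(parsed.isna().sum())     # number of entries that failed to parse

print(parsed)
print(all_valid, n_invalid)
```

Here '31-Feb-2010' matches the pattern but is not a real calendar date, so it is coerced to NaT along with the free-text entry.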

Alternatively,

try:
    pd.to_datetime(df['date'], format='%d-%b-%Y', errors='raise')
    # do something
except ValueError:
    pass

This is the EAFP ("Easier to Ask Forgiveness than Permission") approach: a ValueError is raised when an invalid date string is encountered.
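
One way to package the EAFP version is a small helper that returns a boolean. This is only a sketch: the function name dates_are_valid and the sample Series are hypothetical and not part of the original answer.

```python
import pandas as pd

def dates_are_valid(series, fmt='%d-%b-%Y'):
    """Hypothetical helper: return True if every string parses with fmt."""
    try:
        # errors='raise' makes pandas raise ValueError on the first bad string.
        pd.to_datetime(series, format=fmt, errors='raise')
    except ValueError:
        return False
    return True

# Made-up sample data for illustration.
good = pd.Series(['11-Aug-2010', '02-Jan-2011'])
bad = pd.Series(['11-Aug-2010', '2010-08-11'])

good_ok = dates_are_valid(good)
bad_ok = dates_are_valid(bad)
print(good_ok, bad_ok)
```

Compared with the errors='coerce' check, this version stops at the first failure and does not tell you which rows were invalid.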

Answered by piRSquared

If you know your format, you can use boolean slicing on the column labels:

mask = pd.to_datetime(df.columns, format='%d-%b-%Y', errors='coerce').notna()
df.loc[:, mask]

Consider the dataframe df

df = pd.DataFrame(1, range(1), ['11-Aug-2010', 'August2010, I think', 1])
df

   11-Aug-2010  August2010, I think  1
0            1                    1  1

I can filter with

mask = pd.to_datetime(df.columns, format='%d-%b-%Y', errors='coerce').notna()
df.loc[:, mask]

   11-Aug-2010
0            1
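
The example above applies the mask to column labels. The same idea can be sketched for the values of a single column, which is closer to what the question asks. The frame below, including the column names 'date' and 'value', is made up for illustration.

```python
import pandas as pd

# Hypothetical data: one column of date strings plus a payload column.
df = pd.DataFrame({
    'date': ['11-Aug-2010', 'August2010, I think', '05-Mar-2012'],
    'value': [1, 2, 3],
})

# True where the string parses with the expected format, False otherwise.
mask = pd.to_datetime(df['date'], format='%d-%b-%Y', errors='coerce').notna()

valid = df.loc[mask]      # rows whose date string is well formed
invalid = df.loc[~mask]   # rows to inspect or report

print(valid)
print(invalid)
```

Keeping the invalid rows in their own frame makes it easy to report exactly which values need fixing.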