Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/49435438/
Pandas validate date format
Asked by lukas_o
Is there any nice way to validate that all items in a dataframe's column have a valid date format?
My date format is 11-Aug-2010.
I saw this generic answer, where:
try:
    datetime.datetime.strptime(date_text, '%Y-%m-%d')
except ValueError:
    raise ValueError("Incorrect data format, should be YYYY-MM-DD")
source: https://stackoverflow.com/a/16870699/1374488
But I assume that's not good (efficient) in my case.
I assume I have to modify the strings to be pandas dates first as mentioned here: Convert string date time to pandas datetime
I am new to the Python world, any ideas appreciated.
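For reference, the generic per-string check quoted in the question could be adapted to the 11-Aug-2010 format roughly as follows. This is only a sketch: `is_valid_date` is a hypothetical helper name, and the sample strings are made up for illustration.

```python
import datetime

def is_valid_date(date_text, fmt="%d-%b-%Y"):
    """Return True if date_text parses with fmt, else False (hypothetical helper)."""
    try:
        datetime.datetime.strptime(date_text, fmt)
    except ValueError:
        return False
    return True

# Made-up sample strings: one valid, one in the wrong format, one impossible date.
samples = ["11-Aug-2010", "2010-08-11", "31-Feb-2010"]
results = [is_valid_date(s) for s in samples]
print(results)
```

This checks one string at a time, which is why a vectorized pandas call is attractive for a whole column.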
Answered by cs95
(format borrowed from piRSquared's answer)
if pd.to_datetime(df['date'], format='%d-%b-%Y', errors='coerce').notnull().all():
    pass  # do something
This is the LYBL ("Look Before You Leap") approach. The condition evaluates to True if all your date strings are valid, meaning they are all converted into actual pd.Timestamp objects. Invalid date strings are coerced to NaT, which is the datetime equivalent of NaN.
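As a rough illustration of the check above, the coerced result can also be used to report which entries failed. The sample data and the names `parsed`, `all_valid`, and `invalid_rows` are assumptions made for this sketch, not part of the original answer.

```python
import pandas as pd

# Made-up sample column; the second entry does not match the expected format.
df = pd.DataFrame({"date": ["11-Aug-2010", "August2010, I think", "03-Jan-2011"]})

# Entries that do not match the format become NaT instead of raising.
parsed = pd.to_datetime(df["date"], format="%d-%b-%Y", errors="coerce")

all_valid = bool(parsed.notnull().all())
invalid_rows = df.loc[parsed.isna(), "date"].tolist()

print(all_valid)
print(invalid_rows)
```

Note that values that were already missing in the column also come back as NaT, so this check counts them as invalid too.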
Alternatively,
try:
    pd.to_datetime(df['date'], format='%d-%b-%Y', errors='raise')
    # do something
except ValueError:
    pass
This is the EAFP ("Easier to Ask Forgiveness than Permission") approach: a ValueError is raised when invalid date strings are encountered.
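One way to package the EAFP variant is a small function that returns a boolean. This is a minimal sketch; `column_has_valid_dates` is a hypothetical name and the two sample Series are invented for illustration.

```python
import pandas as pd

def column_has_valid_dates(series, fmt="%d-%b-%Y"):
    """Return True if every value in series parses with fmt (hypothetical helper)."""
    try:
        pd.to_datetime(series, format=fmt, errors="raise")
    except ValueError:
        # Raised by pandas when a string does not match the format.
        return False
    return True

# Made-up sample data.
good = pd.Series(["11-Aug-2010", "03-Jan-2011"])
bad = pd.Series(["11-Aug-2010", "August2010, I think"])

print(column_has_valid_dates(good))
print(column_has_valid_dates(bad))
```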
Answered by piRSquared
If you know your format, you can use boolean slicing:
mask = pd.to_datetime(df.columns, format='%d-%b-%Y', errors='coerce').notna()
df.loc[:, mask]
Consider the dataframe df
df = pd.DataFrame(1, range(1), ['11-Aug-2010', 'August2010, I think', 1])
df
   11-Aug-2010  August2010, I think  1
0            1                    1  1
I can filter with
mask = pd.to_datetime(df.columns, format='%d-%b-%Y', errors='coerce').notna()
df.loc[:, mask]
   11-Aug-2010
0            1
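The example above masks column labels. The same idea might be applied to the values of a column, which is closer to what the question asks. The following is a sketch under that assumption, with a made-up frame and an illustrative `value` column.

```python
import pandas as pd

# Made-up frame: dates are stored as values in a column rather than as labels.
df = pd.DataFrame({
    "date": ["11-Aug-2010", "August2010, I think", "03-Jan-2011"],
    "value": [1, 2, 3],
})

# True where the string matches the format, False where it was coerced to NaT.
mask = pd.to_datetime(df["date"], format="%d-%b-%Y", errors="coerce").notna()

# Keep only the rows whose date string is valid.
valid_rows = df.loc[mask]
print(valid_rows)
```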