pandas: how to convert a float/integer into a date with strptime?

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/34767817/

Time: 2020-09-14 00:30:12  Source: igfitidea

how to convert a float/integer into a date with strptime?

python datetime pandas

Asked by ??????

I have a pandas dataframe that contains the following columns:

col1      col2
20040929  NaN
NaN       20040925

That is, both col1 and col2 are float64 (or int64) numbers. I am trying to convert these using datetime.strptime(), but I get the error

"cannot convert the series to type 'float'"

and if I convert them to float, I get something like 20040929.0 which strptime does not understand.

How can I transform these columns into date then? Many thanks

Answered by EdChum

You can convert the df to str using astype and then apply to_datetime with a format string:

In [190]:
df.astype(str).apply(lambda x: pd.to_datetime(x, format='%Y%m%d'))

Out[190]:
        col1       col2
0 2004-09-29        NaT
1        NaT 2004-09-25
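
As a self-contained variation on the same idea, the sketch below rebuilds the sample frame with float64 columns and converts each one with to_datetime. It assumes that, depending on the pandas version, a plain astype(str) on a float column may leave a trailing '.0' that a strict format string can reject, so it casts the non-missing values to int64 first. The helper name yyyymmdd_to_datetime is made up for illustration and is not part of pandas.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the question: integer-coded dates stored as float64.
df = pd.DataFrame({"col1": [20040929, np.nan], "col2": [np.nan, 20040925]})


def yyyymmdd_to_datetime(col):
    # Hypothetical helper, not part of pandas.
    # Drop NaN, cast to int64 to lose the ".0", then render the numbers as text.
    text = col.dropna().astype("int64").astype(str)
    parsed = pd.to_datetime(text, format="%Y%m%d")
    # Reindexing against the original index restores the dropped rows as NaT.
    return parsed.reindex(col.index)


result = df.apply(yyyymmdd_to_datetime)
print(result)
```

Dropping the missing values before the cast keeps the parsing vectorised while avoiding any attempt to parse the text 'nan'.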

EDIT

Using strptime will be slower and less friendly. Firstly, converting to str introduces .0 because the dtype is float, so we have to split on this. Additionally, strptime doesn't understand Series, so we have to call applymap. On top of this, NaN will cause strptime to bork, so we have to do the following:

In [203]:
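import datetime as dt

def func(x):
    try:
        # drop the trailing '.0' left by the float -> str conversion
        return dt.datetime.strptime(x.split('.')[0], '%Y%m%d')
    except ValueError:
        # the string 'nan' does not match the format, so map it to NaT
        return pd.NaT
df.astype(str).applymap(func)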

Out[203]:
        col1       col2
0 2004-09-29        NaT
1        NaT 2004-09-25
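
To make the scalar behaviour explicit, here is a minimal standard-library sketch of what strptime needs for a single value: a string without the '.0' suffix, with missing values handled before parsing. The function name parse_yyyymmdd is an illustrative assumption, and returning None for NaN is one possible choice rather than what the answer above does (it returns pd.NaT).

```python
import datetime as dt
import math


def parse_yyyymmdd(value):
    # Hypothetical helper: parse one integer-coded date such as 20040929.0.
    # strptime only accepts str, so NaN has to be filtered out first.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    # int() removes the fractional ".0" before the value is rendered as text.
    return dt.datetime.strptime(str(int(value)), "%Y%m%d")


parsed = parse_yyyymmdd(20040929.0)
missing = parse_yyyymmdd(float("nan"))
print(parsed, missing)
```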

Timings

If we compare the 2 methods on a 2K row df:

In [212]:
%timeit df.astype(str).apply(lambda x: pd.to_datetime(x, format='%Y%m%d'))
100 loops, best of 3: 8.11 ms per loop

In [213]:    
%%timeit 
def func(x):
    try:
        return dt.datetime.strptime(x.split('.')[0], '%Y%m%d')
    except:
        return pd.NaT
df.astype(str).applymap(func)

10 loops, best of 3: 86.3 ms per loop

We observe that the pandas method is over 10X faster; it's likely that it scales much better as it's vectorised.
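
For readers without IPython, a rough way to repeat this kind of comparison is the standard timeit module. The sketch below builds its own 2K-row frame of integer-coded dates (made-up data with no NaN, to keep it short) and uses Series.map per column in place of applymap. Those choices are assumptions of this sketch, and the timings it prints will differ from the figures quoted above.

```python
import datetime as dt
import timeit

import pandas as pd

# Made-up 2K-row frame of integer-coded dates, e.g. 20040101.
dates = pd.date_range("2004-01-01", periods=2000, freq="D")
codes = [int(d.strftime("%Y%m%d")) for d in dates]
df = pd.DataFrame({"col1": codes, "col2": codes})
text = df.astype(str)


def vectorised():
    # One to_datetime call per column.
    return text.apply(lambda col: pd.to_datetime(col, format="%Y%m%d"))


def elementwise():
    # One strptime call per cell.
    return text.apply(
        lambda col: col.map(lambda s: dt.datetime.strptime(s, "%Y%m%d"))
    )


t_vec = timeit.timeit(vectorised, number=10)
t_elem = timeit.timeit(elementwise, number=10)
print(f"to_datetime: {t_vec:.4f} s for 10 runs")
print(f"strptime:    {t_elem:.4f} s for 10 runs")
```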
