Original question: http://stackoverflow.com/questions/34767817/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
pandas: How to convert a float/integer into a date with strptime?
Asked by ??????
I have a pandas dataframe that contains the following columns:
col1 col2
20040929 NaN
NaN 20040925
That is, both col1 and col2 are float64 (or int64) numbers. I am trying to convert these using datetime.strptime(), but I get the error
"cannot convert the series to type 'float'"
If I convert them to float, I get something like 20040929.0, which strptime does not understand.
How can I transform these columns into dates, then? Many thanks.
Answered by EdChum
You can convert the df to str using astype and then apply to_datetime with a format string:
In [190]:
df.astype(str).apply(lambda x: pd.to_datetime(x, format='%Y%m%d'))
Out[190]:
        col1       col2
0 2004-09-29        NaT
1        NaT 2004-09-25
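A hedged variation on the above: when the columns are float64, astype(str) yields strings such as '20040929.0', and whether to_datetime tolerates that suffix may depend on the pandas version. The sketch below sidesteps the question by casting only the non-missing values to integers before building the strings. The helper name to_dates and the sample frame are made up for illustration; they are not from the original answer.

```python
import numpy as np
import pandas as pd

# Sample frame shaped like the one in the question (float64 columns with NaN).
df = pd.DataFrame({"col1": [20040929, np.nan], "col2": [np.nan, 20040925]})


def to_dates(col):
    # Drop missing values first so the remaining floats can be cast to int,
    # which gives '20040929' rather than '20040929.0' when converted to str.
    present = col.dropna().astype("int64").astype(str)
    parsed = pd.to_datetime(present, format="%Y%m%d")
    # Reindex back onto the original rows; the dropped rows come back as NaT.
    return parsed.reindex(col.index)


result = df.apply(to_dates)
print(result)
```

The reindex step restores the rows that dropna removed, so the result has the same shape as the input.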
EDIT
Using strptime will be slower and less friendly. Firstly, converting to str introduces a trailing .0 because the dtype is float, so we have to split on the decimal point. Additionally, strptime doesn't understand Series, so we have to call applymap. On top of this, NaN will cause strptime to fail, so we have to do the following:
In [203]:
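import datetime as dt

def func(x):
    try:
        # drop the trailing '.0' left by the float -> str conversion
        return dt.datetime.strptime(x.split('.')[0], '%Y%m%d')
    except ValueError:
        # 'nan' (or any other unparseable string) becomes NaT
        return pd.NaT

df.astype(str).applymap(func)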
Out[203]:
        col1       col2
0 2004-09-29        NaT
1        NaT 2004-09-25
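Newer pandas releases deprecate applymap in favour of DataFrame.map, so here is a rough sketch of the same element-wise idea written with Series.map instead. It checks for NaN explicitly rather than relying on an exception, and it works on the numeric values directly so no '.0' has to be split off. The name parse_yyyymmdd and the sample frame are hypothetical, not part of the original answer.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Sample frame shaped like the one in the question.
df = pd.DataFrame({"col1": [20040929, np.nan], "col2": [np.nan, 20040925]})


def parse_yyyymmdd(value):
    # Missing values map straight to NaT.
    if pd.isna(value):
        return pd.NaT
    try:
        # int() removes the fractional part, so 20040929.0 becomes '20040929'.
        return dt.datetime.strptime(str(int(value)), "%Y%m%d")
    except (ValueError, OverflowError):
        # Numbers that are not valid YYYYMMDD dates also become NaT.
        return pd.NaT


# Series.map calls the function once per element, column by column.
result = df.apply(lambda col: col.map(parse_yyyymmdd))
print(result)
```

This is still a Python-level loop over every cell, so the performance caveat about strptime applies here too.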
Timings
If we compare the 2 methods on a 2K row df:
In [212]:
%timeit df.astype(str).apply(lambda x: pd.to_datetime(x, format='%Y%m%d'))
100 loops, best of 3: 8.11 ms per loop
In [213]:
%%timeit
def func(x):
    try:
        return dt.datetime.strptime(x.split('.')[0], '%Y%m%d')
    except ValueError:
        return pd.NaT

df.astype(str).applymap(func)
10 loops, best of 3: 86.3 ms per loop
We observe that the pandas method is over 10X faster (8.11 ms vs. 86.3 ms per loop), and it is likely to scale much better because it is vectorised.
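To repeat a comparison like this outside IPython, something along these lines should work with the standard-library timeit module. It is only a sketch: the 2,000-row frame is synthetic, the function names vectorised and elementwise are made up, and both functions are variations on the two approaches above rather than the exact cells that were timed. Absolute numbers will differ by machine and pandas version.

```python
import datetime as dt
import timeit

import numpy as np
import pandas as pd

# Synthetic 2,000-row frame: each row has a date in one column and NaN in the other.
n_rows = 2000
even = np.arange(n_rows) % 2 == 0
df = pd.DataFrame({
    "col1": np.where(even, 20040929.0, np.nan),
    "col2": np.where(even, np.nan, 20040925.0),
})


def vectorised(frame):
    # One to_datetime call per column.
    def convert(col):
        present = col.dropna().astype("int64").astype(str)
        return pd.to_datetime(present, format="%Y%m%d").reindex(col.index)
    return frame.apply(convert)


def elementwise(frame):
    # One strptime call per cell.
    def parse(value):
        if pd.isna(value):
            return pd.NaT
        return dt.datetime.strptime(str(int(value)), "%Y%m%d")
    return frame.apply(lambda col: col.map(parse))


# Best of 3 repeats, 10 loops each, reported per loop.
loops = 10
t_vectorised = min(timeit.repeat(lambda: vectorised(df), number=loops, repeat=3)) / loops
t_elementwise = min(timeit.repeat(lambda: elementwise(df), number=loops, repeat=3)) / loops
print(f"to_datetime: {t_vectorised * 1e3:.2f} ms per loop")
print(f"strptime:    {t_elementwise * 1e3:.2f} ms per loop")
```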