Original URL: http://stackoverflow.com/questions/27506367/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
Python pandas integer YYYYMMDD to datetime
Asked by Rookie
Apologies in advance for this, but after two hours of searching and trying I cannot get the right answer here. I have a data frame, populated via pandas.io.sql.read_frame().
The column that is proving to be too much for me is of dtype int64. The integers are of the format YYYYMMDD, for example 20070530 for the 30th of May 2007. I have tried a range of approaches, the most obvious being:
pd.to_datetime(dt['Date']) and pd.to_datetime(str(dt['Date']))
with multiple variations on the function's different parameters.
The result has been, at best, that the date is interpreted as being the time. The date is set to 1970-01-01; for the example above, the outcome is 1970-01-01 00:00:00.020070530
I also tried various .map() functions found in similar posts.
I have noticed that np.date_range() can interpret string values of the format YYYYMMDD, but that is the closest I have come to seeing a solution.
If anyone has an answer, I would be very grateful!
EDIT: In view of the answer from Ed Chum, the problem is most likely related to encoding. repr() on a subset of the DataFrame yields:
OrdNo LstInvDt\n0 9 20070620\n1 11 20070830\n2 19 20070719\n3 21 20070719\n4 23 20070719\n5 26 20070911\n7 29 20070918\n8 31 0070816\n9 34 20070925\n10
This is when LstInvDt is dtype int64.
Accepted answer by EdChum
to_datetime accepts a format string:
In [92]:
t = 20070530
pd.to_datetime(str(t), format='%Y%m%d')
Out[92]:
Timestamp('2007-05-30 00:00:00')
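To see why the format string matters, the following sketch contrasts the two calls. It assumes pandas' default behaviour of treating a bare integer as an offset from the Unix epoch in nanoseconds, which matches the 1970-01-01 00:00:00.020070530 result described in the question. The variable names are illustrative only.

```python
import pandas as pd

t = 20070530

# Without a format, the integer is read as a count of nanoseconds
# since 1970-01-01, so the "date" ends up in the fractional seconds.
as_epoch_offset = pd.to_datetime(t)

# With an explicit format, the digits of the string are parsed
# as year, month and day.
as_calendar_date = pd.to_datetime(str(t), format='%Y%m%d')

print(as_epoch_offset)
print(as_calendar_date)
```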
Example:
In [94]:
t = 20070530
df = pd.DataFrame({'date':[t]*10})
df
Out[94]:
date
0 20070530
1 20070530
2 20070530
3 20070530
4 20070530
5 20070530
6 20070530
7 20070530
8 20070530
9 20070530
In [98]:
df['DateTime'] = df['date'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))
df
Out[98]:
date DateTime
0 20070530 2007-05-30
1 20070530 2007-05-30
2 20070530 2007-05-30
3 20070530 2007-05-30
4 20070530 2007-05-30
5 20070530 2007-05-30
6 20070530 2007-05-30
7 20070530 2007-05-30
8 20070530 2007-05-30
9 20070530 2007-05-30
In [99]:
df.dtypes
Out[99]:
date int64
DateTime datetime64[ns]
dtype: object
EDIT
Actually it's quicker to convert the type to string and then convert the entire series to a datetime rather than calling apply on every value:
In [102]:
df['DateTime'] = pd.to_datetime(df['date'].astype(str), format='%Y%m%d')
df
Out[102]:
date DateTime
0 20070530 2007-05-30
1 20070530 2007-05-30
2 20070530 2007-05-30
3 20070530 2007-05-30
4 20070530 2007-05-30
5 20070530 2007-05-30
6 20070530 2007-05-30
7 20070530 2007-05-30
8 20070530 2007-05-30
9 20070530 2007-05-30
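If the real column contains values that are not valid YYYYMMDD dates, the whole-series conversion above will raise an error. A minimal sketch of one way to cope, assuming it is acceptable to turn such values into NaT via the errors='coerce' option of to_datetime; the sample values and the names raw, parsed and bad_rows are hypothetical and not taken from the question's data:

```python
import pandas as pd

# Hypothetical sample: two valid dates and two values that are not
# valid YYYYMMDD dates (30 February, and a bare zero).
raw = pd.Series([20070620, 20070230, 0, 20070925], name='LstInvDt')

# errors='coerce' replaces anything that cannot be parsed with NaT
# instead of raising an exception.
parsed = pd.to_datetime(raw.astype(str), format='%Y%m%d', errors='coerce')

# The original integers that failed to parse, for inspection.
bad_rows = raw[parsed.isna()]

print(parsed)
print(bad_rows)
```

Inspecting bad_rows before discarding anything makes it easier to tell whether the source data is damaged or merely formatted differently.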
timings
In [104]:
%timeit df['date'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))
100 loops, best of 3: 2.55 ms per loop
In [105]:
%timeit pd.to_datetime(df['date'].astype(str), format='%Y%m%d')
1000 loops, best of 3: 396 μs per loop
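The %timeit lines above only work inside IPython. As a rough equivalent for a plain script, the comparison can be sketched with the standard-library timeit module. The function names per_value and whole_series are made up for this illustration, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'date': [20070530] * 10})


def per_value():
    # Calls to_datetime once for every element of the column.
    return df['date'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))


def whole_series():
    # Converts the column to strings once, then parses it in a single call.
    return pd.to_datetime(df['date'].astype(str), format='%Y%m%d')


runs = 100
per_value_seconds = timeit.timeit(per_value, number=runs)
whole_series_seconds = timeit.timeit(whole_series, number=runs)

print(f"apply per value: {per_value_seconds / runs * 1e6:.0f} us per call")
print(f"whole series:    {whole_series_seconds / runs * 1e6:.0f} us per call")
```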

