Python pandas: integer YYYYMMDD to datetime

Question

Asked by Rookie

Apologies in advance for this, but after two hours of searching and trying I cannot get the right answer here. I have a data frame, populated via pandas io sql.read_frame(). The column that is proving to be too much for me is of dtype int64. The integers are of the format YYYYMMDD, for example 20070530 (30th of May 2007). I have tried a range of approaches, the most obvious being:

pd.to_datetime(dt['Date']) and pd.to_datetime(str(dt['Date']))

with multiple variations of the functions' different parameters.

The result has been, at best, that the date is interpreted as a time, with the date itself set to 1970-01-01. For the example above, the outcome is 1970-01-01 00:00:00.020070530.

I also tried various .map() functions found in similar posts.

I have noticed that pd.date_range() can interpret string values of the format YYYYMMDD, but that is the closest I have come to seeing a solution.

If anyone has an answer, I would be very grateful!

EDIT: In view of the answer from EdChum, the problem is most likely related to encoding. repr() on a subset of the DataFrame yields:

    OrdNo  LstInvDt
0       9  20070620
1      11  20070830
2      19  20070719
3      21  20070719
4      23  20070719
5      26  20070911
7      29  20070918
8      31   0070816
9      34  20070925
10

This is when LstInvDt is dtype int64.

Answer 1

Accepted answer by EdChum

to_datetime accepts a format string:

In [92]:

t = 20070530
pd.to_datetime(str(t), format='%Y%m%d')
Out[92]:
Timestamp('2007-05-30 00:00:00')
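
As background on why the format string matters, here is a minimal sketch contrasting the two calls. It assumes a reasonably recent pandas, where a bare integer passed to to_datetime is read as a count of nanoseconds since the Unix epoch; that matches the 1970-01-01 result described in the question. The variable names naive and parsed are only illustrative.

```python
import pandas as pd

t = 20070530

# Without a format, a bare integer is treated as a number of
# nanoseconds since the Unix epoch (the default unit is "ns"),
# so the result lands a few milliseconds after 1970-01-01.
naive = pd.to_datetime(t)

# With an explicit format, the digits of the string are parsed
# as a four-digit year, a two-digit month and a two-digit day.
parsed = pd.to_datetime(str(t), format="%Y%m%d")

print(naive)
print(parsed)
```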

Example:

In [94]:

t = 20070530
df = pd.DataFrame({'date':[t]*10})
df
Out[94]:
       date
0  20070530
1  20070530
2  20070530
3  20070530
4  20070530
5  20070530
6  20070530
7  20070530
8  20070530
9  20070530
In [98]:

df['DateTime'] = df['date'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))
df
Out[98]:
       date   DateTime
0  20070530 2007-05-30
1  20070530 2007-05-30
2  20070530 2007-05-30
3  20070530 2007-05-30
4  20070530 2007-05-30
5  20070530 2007-05-30
6  20070530 2007-05-30
7  20070530 2007-05-30
8  20070530 2007-05-30
9  20070530 2007-05-30
In [99]:

df.dtypes
Out[99]:
date                 int64
DateTime    datetime64[ns]
dtype: object

EDIT

Actually it's quicker to convert the type to string and then convert the entire series to a datetime rather than calling apply on every value:

In [102]:

df['DateTime'] = pd.to_datetime(df['date'].astype(str), format='%Y%m%d')
df
Out[102]:
       date   DateTime
0  20070530 2007-05-30
1  20070530 2007-05-30
2  20070530 2007-05-30
3  20070530 2007-05-30
4  20070530 2007-05-30
5  20070530 2007-05-30
6  20070530 2007-05-30
7  20070530 2007-05-30
8  20070530 2007-05-30
9  20070530 2007-05-30
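
If the column may contain values that are not valid YYYYMMDD dates, the same whole-series conversion can be combined with the errors='coerce' option of to_datetime, which turns unparseable entries into NaT instead of raising. The following is only a sketch on made-up data: the value 0 is a hypothetical placeholder for a missing date and does not come from the question, and bad_rows is an illustrative name.

```python
import pandas as pd

# Hypothetical data: 0 stands in for a missing or invalid date.
df = pd.DataFrame({"date": [20070530, 0, 20070620]})

# Values that do not match the format become NaT rather than
# causing the whole conversion to fail.
df["DateTime"] = pd.to_datetime(
    df["date"].astype(str), format="%Y%m%d", errors="coerce"
)

# Rows that failed to parse can then be inspected separately.
bad_rows = df[df["DateTime"].isna()]

print(df)
print(bad_rows)
```

Inspecting bad_rows before discarding anything is a reasonable way to check whether the source data, rather than the conversion, is at fault.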

Timings:

In [104]:

%timeit df['date'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))

100 loops, best of 3: 2.55 ms per loop
In [105]:

%timeit pd.to_datetime(df['date'].astype(str), format='%Y%m%d')
1000 loops, best of 3: 396 μs per loop
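
Outside IPython, a comparison along the same lines can be sketched with the standard timeit module. This is an illustration only: the frame size of 1,000 rows, the repeat count, and the helper names per_value and whole_series are assumptions, and the absolute numbers will vary with the machine and the pandas version.

```python
import timeit

import pandas as pd

# Hypothetical frame of 1,000 identical YYYYMMDD integers.
df = pd.DataFrame({"date": [20070530] * 1000})


def per_value():
    # Convert each value separately through apply.
    return df["date"].apply(lambda x: pd.to_datetime(str(x), format="%Y%m%d"))


def whole_series():
    # Cast the column to strings once, then convert it in one call.
    return pd.to_datetime(df["date"].astype(str), format="%Y%m%d")


# Check that both approaches agree before comparing their speed.
same = bool((per_value() == whole_series()).all())

per_value_s = timeit.timeit(per_value, number=5)
whole_series_s = timeit.timeit(whole_series, number=5)

print(f"results identical: {same}")
print(f"apply per value:   {per_value_s:.4f} s for 5 runs")
print(f"whole series:      {whole_series_s:.4f} s for 5 runs")
```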
