Original question: http://stackoverflow.com/questions/35630098/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Convert a column of datetimes to epoch in Python
Asked by marcsarfa
I'm currently having an issue with Python. I have a pandas DataFrame, and one of its columns holds dates as strings. The format is:
"%Y-%m-%d %H:%M:00.000". For example: "2011-04-24 01:30:00.000"
I need to convert the entire column to integers. I tried running the code below, but it is extremely slow, and I have a few million rows.
for i in range(calls.shape[0]):
    calls['dateint'][i] = int(time.mktime(time.strptime(calls.DATE[i], "%Y-%m-%d %H:%M:00.000")))
Do you guys know how to convert the whole column to epoch time?
Thanks in advance!
Accepted answer by EdChum
Convert the strings to datetimes using to_datetime, then subtract the datetime 1970-1-1 and call dt.total_seconds():
In [2]:
import pandas as pd
import datetime as dt
df = pd.DataFrame({'date':['2011-04-24 01:30:00.000']})
df
Out[2]:
date
0 2011-04-24 01:30:00.000
In [3]:
df['date'] = pd.to_datetime(df['date'])
df
Out[3]:
date
0 2011-04-24 01:30:00
In [6]:
(df['date'] - dt.datetime(1970,1,1)).dt.total_seconds()
Out[6]:
0 1303608600
Name: date, dtype: float64
You can see that converting this value back yields the same time:
In [8]:
pd.to_datetime(1303608600, unit='s')
Out[8]:
Timestamp('2011-04-24 01:30:00')
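One caveat worth spelling out: the round trip above treats the naive strings as UTC, whereas time.mktime in the question interprets them as local time. The following is only a sketch of one way to handle local wall-clock times; the "Europe/Paris" zone is an assumption chosen for illustration and does not come from the original post.

```python
import pandas as pd

# Hypothetical sample: wall-clock strings assumed to be recorded in Paris local time.
df = pd.DataFrame({"date": ["2011-04-24 01:30:00.000"]})

# Attach the assumed local zone, then convert to UTC.
local = pd.to_datetime(df["date"]).dt.tz_localize("Europe/Paris")
utc = local.dt.tz_convert("UTC")

# Subtract a timezone-aware epoch and floor-divide by one second.
df["epoch"] = (utc - pd.Timestamp("1970-01-01", tz="UTC")) // pd.Timedelta("1s")
print(df["epoch"].tolist())
```

If the strings are already in UTC, this step is unnecessary and the simpler subtraction shown in the answer applies.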
So you can either add a new column or overwrite:
In [9]:
df['epoch'] = (df['date'] - dt.datetime(1970,1,1)).dt.total_seconds()
df
Out[9]:
date epoch
0 2011-04-24 01:30:00 1303608600
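Put together as a standalone script, the steps above might look like the sketch below. The frame name calls and the columns DATE and dateint follow the question; the sample rows and the explicit format string are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical sample data in the format described in the question.
calls = pd.DataFrame(
    {"DATE": ["2011-04-24 01:30:00.000", "2011-04-24 01:31:00.000"]}
)

# Parse the whole column at once; an explicit format avoids format inference.
parsed = pd.to_datetime(calls["DATE"], format="%Y-%m-%d %H:%M:%S.%f")

# Subtract the epoch and convert the resulting timedeltas to whole seconds.
# The naive timestamps are treated as UTC here.
seconds = (parsed - pd.Timestamp("1970-01-01")).dt.total_seconds()
calls["dateint"] = seconds.astype("int64")
print(calls)
```

The cast to int64 is optional; total_seconds() returns floats, and the question asked for integers.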
EDIT
A better method, as suggested by @Jeff:
In [3]:
df['date'].astype('int64')//1e9
Out[3]:
0 1303608600
Name: date, dtype: float64
In [4]:
%timeit (df['date'] - dt.datetime(1970,1,1)).dt.total_seconds()
%timeit df['date'].astype('int64')//1e9
100 loops, best of 3: 1.72 ms per loop
1000 loops, best of 3: 275 μs per loop
The timings show that it is significantly faster (275 μs versus 1.72 ms per loop).
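Note that floor-dividing by the float 1e9 produces a float64 column, as the output above shows. A small variation, sketched here, divides by the integer 10**9 so the result stays int64. It assumes the column is stored with nanosecond resolution, so the sketch casts to datetime64[ns] explicitly before taking the integer view.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2011-04-24 01:30:00.000"])})

# Make the nanosecond unit explicit so the integer view is nanoseconds since the epoch.
nanos = df["date"].astype("datetime64[ns]").astype("int64")

# Integer floor division keeps the int64 dtype (1e9 would yield float64).
df["epoch"] = nanos // 10**9
print(df["epoch"])
```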
Answer by ares
From the pandas documentation on working with time series data:
We subtract the epoch (midnight at January 1, 1970 UTC) and then floor divide by the “unit” (1 ms).
# generate some timestamps
stamps = pd.date_range('2012-10-08 18:15:05', periods=4, freq='D')
# convert it to milliseconds from epoch
(stamps - pd.Timestamp("1970-01-01")) // pd.Timedelta('1ms')
This gives the epoch time in milliseconds.
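The same pattern can be applied to a DataFrame column and to other units. As a sketch, swapping the divisor for pd.Timedelta('1s') gives whole seconds; the Series wrapper below is an assumption standing in for a column such as df['date'].

```python
import pandas as pd

# Same timestamps as above, wrapped in a Series to mimic a DataFrame column.
stamps = pd.Series(pd.date_range("2012-10-08 18:15:05", periods=4, freq="D"))

# Subtract the epoch, then floor-divide by one second instead of one millisecond.
epoch_seconds = (stamps - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")
print(epoch_seconds.tolist())
```

Because the division is between timedeltas, the result is an integer column regardless of the unit chosen.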