Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/35630098/

Time: 2020-08-19 16:42:16  Source: igfitidea

Convert a column of datetimes to epoch in Python

python datetime pandas epoch

Asked by marcsarfa

I'm currently having an issue with Python. I have a Pandas DataFrame, and one of its columns holds dates as strings in the following format:

"%Y-%m-%d %H:%M:00.000". For example: "2011-04-24 01:30:00.000"

I need to convert the entire column to integers. I tried to run this code, but it is extremely slow and I have a few million rows.

    for i in range(calls.shape[0]):
        calls['dateint'][i] = int(time.mktime(time.strptime(calls.DATE[i], "%Y-%m-%d %H:%M:00.000")))

Do you know how to convert the whole column to epoch time?

Thanks in advance!

Accepted answer by EdChum

Convert the string to a datetime using to_datetime, then subtract the datetime 1970-1-1 and call dt.total_seconds():

In [2]:
import pandas as pd
import datetime as dt
df = pd.DataFrame({'date':['2011-04-24 01:30:00.000']})
df

Out[2]:
                      date
0  2011-04-24 01:30:00.000

In [3]:
df['date'] = pd.to_datetime(df['date'])
df

Out[3]:
                 date
0 2011-04-24 01:30:00
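
With a few million rows, the parsing step itself can dominate the runtime. One option that may help is to pass the format explicitly so pandas does not have to infer it. The following is a minimal sketch, assuming the column is named DATE as in the question and that every row follows the stated format; the sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data in the format described in the question.
calls = pd.DataFrame(
    {"DATE": ["2011-04-24 01:30:00.000", "2011-04-24 01:45:00.000"]}
)

# Parse the whole column at once; %S.%f covers the trailing "00.000".
parsed = pd.to_datetime(calls["DATE"], format="%Y-%m-%d %H:%M:%S.%f")
print(parsed)
```

Whether this is faster than letting pandas infer the format depends on the pandas version and the data, so it is worth timing on a sample first.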

In [6]:    
(df['date'] - dt.datetime(1970,1,1)).dt.total_seconds()

Out[6]:
0    1303608600
Name: date, dtype: float64

You can see that converting this value back yields the same time:

In [8]:
pd.to_datetime(1303608600, unit='s')

Out[8]:
Timestamp('2011-04-24 01:30:00')

So you can either add a new column or overwrite:

In [9]:
df['epoch'] = (df['date'] - dt.datetime(1970,1,1)).dt.total_seconds()
df

Out[9]:
                 date       epoch
0 2011-04-24 01:30:00  1303608600
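
One caveat: subtracting 1970-1-1 treats the naive timestamps as UTC, whereas the time.mktime call in the question interprets them in the machine's local time zone, so the two results can differ by the UTC offset. If the strings represent local wall-clock times, a sketch along these lines may be closer to the original intent. The zone "Europe/Paris" is purely an assumption for illustration; the question does not say which zone the data is in.

```python
import pandas as pd

naive = pd.to_datetime(pd.Series(["2011-04-24 01:30:00.000"]))

# Assumption: the strings are wall-clock times in Europe/Paris.
local = naive.dt.tz_localize("Europe/Paris")

# Convert to UTC, then count whole seconds since the (UTC) epoch.
utc = local.dt.tz_convert("UTC")
epoch = (utc - pd.Timestamp("1970-01-01", tz="UTC")) // pd.Timedelta("1s")
print(epoch)
```

Paris was two hours ahead of UTC on that date, so the value is 7200 seconds smaller than the one obtained by treating the string as UTC.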

EDIT

A better method, as suggested by @Jeff:

In [3]:
df['date'].astype('int64')//1e9

Out[3]:
0    1303608600
Name: date, dtype: float64

In [4]:
%timeit (df['date'] - dt.datetime(1970,1,1)).dt.total_seconds()
%timeit df['date'].astype('int64')//1e9

100 loops, best of 3: 1.72 ms per loop
1000 loops, best of 3: 275 μs per loop

The timings show that it is significantly faster: 275 μs versus 1.72 ms, roughly six times faster on this one-row example.
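
Note that floor-dividing by the float 1e9 yields a float64 column, as the output above shows. If an integer column is wanted, dividing by the integer 10**9 keeps the int64 dtype. The sketch below assumes the timestamps can be cast to nanosecond resolution, which is why it does so explicitly before reading the raw integers; the column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2011-04-24 01:30:00.000"])})

# Force nanosecond resolution so the raw integers are nanoseconds since the epoch.
ns = df["date"].astype("datetime64[ns]").astype("int64")

# Integer floor division keeps the result as int64 (whole seconds).
df["epoch"] = ns // 10**9
print(df.dtypes)
```

As with the other approaches, this treats naive timestamps as UTC.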

Answer by ares

From the Pandas documentation on working with time series data:

We subtract the epoch (midnight at January 1, 1970 UTC) and then floor divide by the “unit” (1 ms).

import pandas as pd

# generate some timestamps
stamps = pd.date_range('2012-10-08 18:15:05', periods=4, freq='D')

# convert them to milliseconds since the epoch
(stamps - pd.Timestamp("1970-01-01")) // pd.Timedelta('1ms')

This will give the epoch time in milliseconds.
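
The same idea can be applied to the column from the question to get whole seconds instead of milliseconds by changing the unit to 1s. This is a minimal sketch, assuming the DataFrame is called calls with a string column DATE and a target column dateint, as in the question; the sample rows are invented.

```python
import pandas as pd

# Hypothetical sample data in the question's format.
calls = pd.DataFrame(
    {"DATE": ["2011-04-24 01:30:00.000", "2011-04-25 01:30:00.000"]}
)

# Parse once, subtract the epoch, then floor divide by one second.
dates = pd.to_datetime(calls["DATE"])
calls["dateint"] = (dates - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")
print(calls)
```

Floor division by a Timedelta returns integers directly, so no separate cast is needed.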
