Convert a column of datetimes to epoch in Python

Question

Asked by marcsarfa

I'm currently having an issue with Python. I have a Pandas DataFrame, and one of the columns holds dates as strings. The format is:

"%Y-%m-%d %H:%M:00.000". For example: "2011-04-24 01:30:00.000"

I need to convert the entire column to integers. I tried to run this code, but it is extremely slow and I have a few million rows.

    for i in range(calls.shape[0]):
        calls['dateint'][i] = int(time.mktime(time.strptime(calls.DATE[i], "%Y-%m-%d %H:%M:00.000")))

Do you guys know how to convert the whole column to epoch time?

Thanks in advance!

Answer 1

Accepted answer by EdChum

Convert the string to a datetime using to_datetime, then subtract the datetime 1970-1-1 and call dt.total_seconds():

In [2]:
import pandas as pd
import datetime as dt
df = pd.DataFrame({'date':['2011-04-24 01:30:00.000']})
df

Out[2]:
                      date
0  2011-04-24 01:30:00.000

In [3]:
df['date'] = pd.to_datetime(df['date'])
df

Out[3]:
                 date
0 2011-04-24 01:30:00

In [6]:    
(df['date'] - dt.datetime(1970,1,1)).dt.total_seconds()

Out[6]:
0    1303608600
Name: date, dtype: float64

You can see that converting this value back yields the same time:

In [8]:
pd.to_datetime(1303608600, unit='s')

Out[8]:
Timestamp('2011-04-24 01:30:00')

So you can either add a new column or overwrite:

In [9]:
df['epoch'] = (df['date'] - dt.datetime(1970,1,1)).dt.total_seconds()
df

Out[9]:
                 date       epoch
0 2011-04-24 01:30:00  1303608600
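
For a column with millions of rows, it may also help to pass an explicit format to to_datetime so that pandas does not have to infer it. The following is a minimal sketch under that assumption. The frame name calls and the column names DATE and dateint come from the question; the sample rows are made up for illustration. Note that this treats the naive timestamps as UTC, whereas time.mktime in the question interprets them in the machine's local time zone, so the two can differ by the local UTC offset.

```python
import pandas as pd

# Sample data standing in for the question's frame (values are illustrative).
calls = pd.DataFrame({"DATE": ["2011-04-24 01:30:00.000",
                               "1970-01-01 00:01:00.000"]})

# Parse the whole column at once; the explicit format avoids format inference.
parsed = pd.to_datetime(calls["DATE"], format="%Y-%m-%d %H:%M:%S.%f")

# Subtract the epoch, take total seconds, and cast the floats to integers.
# The cast assumes there are no missing dates (NaT) in the column.
elapsed = parsed - pd.Timestamp("1970-01-01")
calls["dateint"] = elapsed.dt.total_seconds().astype("int64")
```

This replaces the row-by-row loop with three vectorized operations on the whole column.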

EDIT

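A better method, as suggested by @Jeff, is to cast the datetime column to int64, which gives nanoseconds since the epoch, and floor-divide by 1e9 to get seconds: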

In [3]:
df['date'].astype('int64')//1e9

Out[3]:
0    1303608600
Name: date, dtype: float64

In [4]:
%timeit (df['date'] - dt.datetime(1970,1,1)).dt.total_seconds()
%timeit df['date'].astype('int64')//1e9

100 loops, best of 3: 1.72 ms per loop
1000 loops, best of 3: 275 μs per loop

As the timings show, it is also significantly faster.
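
Dividing by the float 1e9 produces a float64 column, as the output above shows. If integer seconds are wanted, floor-dividing by the integer 10**9 keeps the int64 dtype. The sketch below first casts the column to nanosecond resolution, on the assumption that the datetimes might be stored in a different unit, so that the int64 values are nanoseconds in every case. The sample values are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2011-04-24 01:30:00.000",
                                           "2011-04-24 01:31:00.000"])})

# Normalize to nanosecond resolution so the integers are nanoseconds since the epoch.
nanos = df["date"].astype("datetime64[ns]").astype("int64")

# Integer floor division keeps the result as int64 rather than float64.
df["epoch"] = nanos // 10**9
```

Missing dates (NaT) do not survive the integer cast in a meaningful way, so they would need to be handled before this step.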

Answer 2

Answer by ares

From the Pandas documentation on working with time series data:

We subtract the epoch (midnight at January 1, 1970 UTC) and then floor divide by the “unit” (1 ms).

import pandas as pd

# generate some timestamps
stamps = pd.date_range('2012-10-08 18:15:05', periods=4, freq='D')

# convert them to milliseconds since the epoch
(stamps - pd.Timestamp("1970-01-01")) // pd.Timedelta('1ms')

This will give the epoch time in milliseconds.
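
The same pattern should work for other units by changing the Timedelta. Here is a small sketch that reuses the timestamps from the snippet above, produces seconds instead of milliseconds, and converts the result back with pd.to_datetime as a sanity check. The variable names seconds and restored are illustrative.

```python
import pandas as pd

stamps = pd.date_range("2012-10-08 18:15:05", periods=4, freq="D")

# Seconds since the epoch: same subtraction, different unit.
seconds = (stamps - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")

# Converting back with unit="s" should reproduce the original timestamps.
restored = pd.to_datetime(seconds, unit="s")
```

Because the unit is spelled out in the Timedelta, this form does not depend on how the timestamps are stored internally.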
