Python: Convert pandas DateTimeIndex to Unix Time?
Original source: http://stackoverflow.com/questions/15203623/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Convert pandas DateTimeIndex to Unix Time?
Asked by Christian Geier
What is the idiomatic way of converting a pandas DateTimeIndex to (an iterable of) Unix Time? This is probably not the way to go:
[time.mktime(t.timetuple()) for t in my_data_frame.index.to_pydatetime()]
Accepted answer by root
As DatetimeIndex is an ndarray under the hood, you can do the conversion without a list comprehension (much faster).
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: from datetime import datetime
In [4]: dates = [datetime(2012, 5, 1), datetime(2012, 5, 2), datetime(2012, 5, 3)]
...: index = pd.DatetimeIndex(dates)
...:
In [5]: index.astype(np.int64)
Out[5]: array([1335830400000000000, 1335916800000000000, 1336003200000000000],
dtype=int64)
In [6]: index.astype(np.int64) // 10**9
Out[6]: array([1335830400, 1335916800, 1336003200], dtype=int64)
%timeit [t.value // 10 ** 9 for t in index]
10000 loops, best of 3: 119 us per loop
%timeit index.astype(np.int64) // 10**9
100000 loops, best of 3: 18.4 us per loop
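For readers on newer pandas versions, here is a minimal sketch of an alternative that does not depend on the index storing nanoseconds: subtract the epoch and floor-divide by a one-second Timedelta. The variable names are illustrative, and this is one possible option rather than the answer's own method.

```python
from datetime import datetime

import pandas as pd

# Same sample dates as in the session above (naive, treated as UTC).
dates = [datetime(2012, 5, 1), datetime(2012, 5, 2), datetime(2012, 5, 3)]
index = pd.DatetimeIndex(dates)

# Subtracting the epoch gives a TimedeltaIndex; floor-dividing by one
# second yields whole Unix seconds regardless of the stored resolution.
epoch = pd.Timestamp("1970-01-01")
unix_seconds = (index - epoch) // pd.Timedelta("1s")

print(list(unix_seconds))
```

The astype(np.int64) // 10**9 form is shorter, but it assumes the underlying integers are nanoseconds.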
Answer by Andy Hayden
Note: a Timestamp's value is just Unix time in nanoseconds (so floor-divide it by 10**9 to get seconds):
[t.value // 10 ** 9 for t in tsframe.index]
For example:
In [1]: t = pd.Timestamp('2000-02-11 00:00:00')
In [2]: t
Out[2]: <Timestamp: 2000-02-11 00:00:00>
In [3]: t.value
Out[3]: 950227200000000000L
In [4]: time.mktime(t.timetuple())
Out[4]: 950227200.0
As @root points out, it's faster to extract the array of values directly:
tsframe.index.astype(np.int64) // 10 ** 9
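One caveat about the session above: time.mktime interprets its argument as local time, so it agrees with t.value // 10**9 only when the machine's time zone is UTC. A small sketch of a time-zone-independent cross-check, using calendar.timegm from the standard library (the variable names are illustrative):

```python
import calendar

import pandas as pd

t = pd.Timestamp("2000-02-11 00:00:00")

# Timestamp.value is nanoseconds since the epoch; floor-divide for seconds.
seconds_from_value = t.value // 10**9

# calendar.timegm interprets the time tuple as UTC, unlike time.mktime,
# which interprets it in the machine's local time zone.
seconds_from_timegm = calendar.timegm(t.timetuple())

print(seconds_from_value, seconds_from_timegm)
```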
Answer by Rani
A summary of other answers:
df['<time_col>'].astype(np.int64) // 10**9
If you want to keep the milliseconds, divide by 10**6 instead.
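As a quick illustration of the two divisors, here is a sketch on a single Timestamp with sub-second precision (the sample value is made up for the example):

```python
import pandas as pd

# Hypothetical timestamp with nanosecond precision.
t = pd.Timestamp("2012-05-01 00:00:00.123456789")

seconds = t.value // 10**9       # drops everything below one second
milliseconds = t.value // 10**6  # keeps millisecond precision

print(seconds, milliseconds)
```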
Answer by Elias Hasle
Complementing the other answers: // 10**9 does a floor division, which gives the number of full elapsed seconds rather than the nearest value in seconds. A simple way to get more reasonable rounding, if that is desired, is to add 5*10**8 - 1 before doing the floor division.
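A minimal sketch of the difference between flooring and rounding, using made-up nanosecond values around the 1.5 second mark:

```python
import numpy as np

# Hypothetical nanosecond values just below, at, and just above 1.5 s.
ns = np.array([1_499_999_999, 1_500_000_000, 1_500_000_001], dtype=np.int64)

floored = ns // 10**9                    # whole elapsed seconds
rounded = (ns + 5 * 10**8 - 1) // 10**9  # nearest second, exact halves round down

print(floored, rounded)
```

With the offset 5*10**8 - 1, a value exactly halfway between two seconds rounds down; adding 5*10**8 instead would round it up.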
Answer by thomas
To handle NaT, which the solutions above convert to a large negative integer, a possible solution in pandas>=0.24 would be:
def datetime_to_epoch(ser):
    """Don't convert NaT to large negative values."""
    if ser.hasnans:
        res = ser.dropna().astype('int64').astype('Int64').reindex(index=ser.index)
    else:
        res = ser.astype('int64')
    return res // 10**9
In the case of missing values this will return the nullable int type 'Int64' (ExtensionType pd.Int64Dtype):
In [5]: dt = pd.to_datetime(pd.Series(["2019-08-21", "2018-07-28", np.nan]))
In [6]: datetime_to_epoch(dt)
Out[6]:
0 1566345600
1 1532736000
2 NaN
dtype: Int64
Otherwise a regular int64:
In [7]: datetime_to_epoch(dt[:2])
Out[7]:
0 1566345600
1 1532736000
dtype: int64
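To see why NaT needs special handling in the first place, here is a small NumPy-only sketch (the sample array is made up) showing what a plain integer cast does to a missing value:

```python
import numpy as np

# One real date and one missing value, stored as datetime64[ns].
values = np.array(["2019-08-21", "NaT"], dtype="datetime64[ns]")

# NaT is stored as the smallest int64, so a plain integer cast exposes it.
as_ns = values.astype("int64")
as_seconds = as_ns // 10**9

print(as_seconds)
```

The second element comes out as a large negative number instead of a missing value, which is the behavior datetime_to_epoch is written to avoid.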
Answer by Nour
If you have tried this on the datetime column of your dataframe:
dframe['datetime'].astype(np.int64) // 10**9
and you are struggling with the following error: TypeError: int() argument must be a string, a bytes-like object or a number, not 'Timestamp', you can just use these two lines:
dframe.index = pd.DatetimeIndex(dframe['datetime'])
dframe['datetime'] = dframe.index.astype(np.int64) // 10**9
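One plausible cause of that error (an assumption, since the answer does not say) is a column of object dtype that holds Timestamp objects rather than a true datetime64 column. Under that assumption, a sketch that converts the column with pd.to_datetime and leaves the DataFrame's index untouched might look like this; the sample frame is hypothetical:

```python
import pandas as pd

# Hypothetical frame whose 'datetime' column is object dtype, not datetime64.
stamps = [pd.Timestamp("2012-05-01"), pd.Timestamp("2012-05-02")]
dframe = pd.DataFrame({"datetime": pd.Series(stamps, dtype=object)})

# Convert to a real datetime64 column first, then compute epoch seconds.
converted = pd.to_datetime(dframe["datetime"])
secs = (converted - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")

print(secs.tolist())
```

Unlike the two lines above, this does not replace the index as a side effect.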

