Python pandas convert datetime to timestamp effectively through dt accessor
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/40881876/
Asked by ragesz
I have a dataframe with hundreds of millions of rows, and I want to convert datetime to timestamp efficiently. How can I do it?
My sample df:
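import datetime as dt
import pandas as pd
# pd.date_range is used here because newer pandas no longer accepts start/end/freq in pd.DatetimeIndex
df = pd.DataFrame(index=pd.date_range(start=dt.datetime(2016,1,1,0,0,1),
                                      end=dt.datetime(2016,1,2,0,0,1), freq='H'))\
    .reset_index().rename(columns={'index':'datetime'})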
df.head()
datetime
0 2016-01-01 00:00:01
1 2016-01-01 01:00:01
2 2016-01-01 02:00:01
3 2016-01-01 03:00:01
4 2016-01-01 04:00:01
Now I convert datetime to timestamp value by value with .apply(), but it takes a very long time (several hours) with hundreds of millions of rows:
df['ts'] = df[['datetime']].apply(lambda x: x[0].timestamp(), axis=1).astype(int)
df.head()
datetime ts
0 2016-01-01 00:00:01 1451602801
1 2016-01-01 01:00:01 1451606401
2 2016-01-01 02:00:01 1451610001
3 2016-01-01 03:00:01 1451613601
4 2016-01-01 04:00:01 1451617201
The above result is what I want.
If I try to use the .dt accessor of pandas.Series, I get an error message:
df['ts'] = df['datetime'].dt.timestamp
AttributeError: 'DatetimeProperties' object has no attribute 'timestamp'
If I create, for example, the date parts of the datetimes with the .dt accessor, it is much faster than using .apply():
df['date'] = df['datetime'].dt.date
df.head()
datetime ts date
0 2016-01-01 00:00:01 1451602801 2016-01-01
1 2016-01-01 01:00:01 1451606401 2016-01-01
2 2016-01-01 02:00:01 1451610001 2016-01-01
3 2016-01-01 03:00:01 1451613601 2016-01-01
4 2016-01-01 04:00:01 1451617201 2016-01-01
I want something similar with timestamps...
But I don't really understand the official documentation: it talks about "Converting to Timestamps", but I don't see any timestamps there; it only covers converting to datetime with pd.to_datetime(), not converting to a timestamp.
The pandas.Timestamp constructor also doesn't work (it fails with the error below):
df['ts2'] = pd.Timestamp(df['datetime'])
TypeError: Cannot convert input to Timestamp
pandas.Series.to_timestamp also does something totally different from what I want:
df['ts3'] = df['datetime'].to_timestamp
df.head()
datetime ts ts3
0 2016-01-01 00:00:01 1451602801 <bound method Series.to_timestamp of 0 2016...
1 2016-01-01 01:00:01 1451606401 <bound method Series.to_timestamp of 0 2016...
2 2016-01-01 02:00:01 1451610001 <bound method Series.to_timestamp of 0 2016...
3 2016-01-01 03:00:01 1451613601 <bound method Series.to_timestamp of 0 2016...
4 2016-01-01 04:00:01 1451617201 <bound method Series.to_timestamp of 0 2016...
Thank you!!
Answered by jezrael
I think you need to convert first to a numpy array with values and cast to int64. The output is in ns, so you need to divide by 10 ** 9:
import numpy as np
df['ts'] = df.datetime.values.astype(np.int64) // 10 ** 9
print (df)
datetime ts
0 2016-01-01 00:00:01 1451606401
1 2016-01-01 01:00:01 1451610001
2 2016-01-01 02:00:01 1451613601
3 2016-01-01 03:00:01 1451617201
4 2016-01-01 04:00:01 1451620801
5 2016-01-01 05:00:01 1451624401
6 2016-01-01 06:00:01 1451628001
7 2016-01-01 07:00:01 1451631601
8 2016-01-01 08:00:01 1451635201
9 2016-01-01 09:00:01 1451638801
10 2016-01-01 10:00:01 1451642401
11 2016-01-01 11:00:01 1451646001
12 2016-01-01 12:00:01 1451649601
13 2016-01-01 13:00:01 1451653201
14 2016-01-01 14:00:01 1451656801
15 2016-01-01 15:00:01 1451660401
16 2016-01-01 16:00:01 1451664001
17 2016-01-01 17:00:01 1451667601
18 2016-01-01 18:00:01 1451671201
19 2016-01-01 19:00:01 1451674801
20 2016-01-01 20:00:01 1451678401
21 2016-01-01 21:00:01 1451682001
22 2016-01-01 22:00:01 1451685601
23 2016-01-01 23:00:01 1451689201
24 2016-01-02 00:00:01 1451692801
to_timestamp is used for converting from a period index to a datetime index.
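The conversion above can be sketched as a self-contained script. This is a minimal illustration, not the answer's exact code: the three-row frame is a stand-in for the question's sample data, and the explicit cast to datetime64[ns] is an added precaution so that the division by 10 ** 9 stays valid whatever resolution pandas chose for the column. Naive datetimes are treated as UTC here, which may explain the one-hour difference from the .apply() output in the question if that was computed in a UTC+1 local time zone.

```python
import numpy as np
import pandas as pd

# Small stand-in for the sample frame in the question (three of its rows).
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2016-01-01 00:00:01",
        "2016-01-01 01:00:01",
        "2016-01-02 00:00:01",
    ])
})

# Pin the resolution to nanoseconds (an assumption added for safety,
# not part of the original answer), then view the values as int64.
ns = df["datetime"].to_numpy().astype("datetime64[ns]").astype(np.int64)

# Integer nanoseconds since the epoch -> whole seconds.
df["ts"] = ns // 10**9
print(df)
```

Because the whole column is converted in one array operation, no Python-level function is called per row, which is where .apply() spends its time.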
Answered by Mithril
I think you should not use apply; simply using astype would be fine:
df['ts'] = df.datetime.astype('int64') // 10**9
Answered by BCR
There's also another method to do this, using the "hidden" attribute of DatetimeIndex called asi8, which creates an integer timestamp.
pd.DatetimeIndex(df.datetime).asi8
Wes McKinney suggested it in a tangentially related StackOverflow question.
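As a rough sketch of how asi8 could be turned into the second-based timestamps the question asks for: the two-row Series is illustrative, and the cast to datetime64[ns] is an added assumption so that the integers are known to be nanoseconds before dividing.

```python
import pandas as pd

# Illustrative data: two of the datetimes from the question's sample.
s = pd.Series(pd.to_datetime(["2016-01-01 00:00:01", "2016-01-01 01:00:01"]))

# Force nanosecond resolution so asi8 is known to be in ns.
idx = pd.DatetimeIndex(s.astype("datetime64[ns]"))

# asi8 exposes the underlying int64 values; floor-divide to get seconds.
ts = idx.asi8 // 10**9
print(ts)
```

The result is a plain NumPy int64 array, so it can be assigned straight back to a column such as df['ts'].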
Answered by Jozef Cechovsky
If you don't want to use numpy, you can use pure pandas conversions:
df['ts'] = pd.to_timedelta(df['datetime'], unit='ns').dt.total_seconds().astype(int)
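A different pure-pandas route, shown here as an alternative rather than as this answer's method, is to subtract the Unix epoch and floor-divide by one second. The sketch below uses an illustrative two-row Series and assumes the datetimes are timezone-naive.

```python
import pandas as pd

# Illustrative data: two of the datetimes from the question's sample.
s = pd.Series(pd.to_datetime(["2016-01-01 00:00:01", "2016-01-01 01:00:01"]))

# Datetime minus epoch gives a timedelta Series; floor division by a
# one-second Timedelta yields integer seconds since 1970-01-01.
ts = (s - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")
print(ts)
```

This form does not depend on the column's internal resolution, since the arithmetic is done on timedeltas rather than on raw integers.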