Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/40881876/

Date: 2020-08-20 00:08:50  Source: igfitidea

Python pandas convert datetime to timestamp effectively through dt accessor

Tags: python, datetime, pandas, timestamp

Asked by ragesz

I have a dataframe with hundreds of millions of rows, and I want to convert its datetime column to timestamps efficiently. How can I do it?

My sample df:

import datetime as dt
import pandas as pd

# pd.date_range replaces the removed DatetimeIndex(start=..., end=..., freq=...) constructor
df = pd.DataFrame({'datetime': pd.date_range(start=dt.datetime(2016,1,1,0,0,1),
    end=dt.datetime(2016,1,2,0,0,1), freq='h')})

df.head()

             datetime
0 2016-01-01 00:00:01
1 2016-01-01 01:00:01
2 2016-01-01 02:00:01
3 2016-01-01 03:00:01
4 2016-01-01 04:00:01

Right now I convert datetime to timestamp value by value with .apply(), but that takes a very long time (several hours) with hundreds of millions of rows:

df['ts'] = df[['datetime']].apply(lambda x: x.iloc[0].timestamp(), axis=1).astype(int)

df.head()

             datetime          ts
0 2016-01-01 00:00:01  1451602801
1 2016-01-01 01:00:01  1451606401
2 2016-01-01 02:00:01  1451610001
3 2016-01-01 03:00:01  1451613601
4 2016-01-01 04:00:01  1451617201

The above result is what I want.

If I try to use the .dt accessor of pandas.Series, I get an error message:

df['ts'] = df['datetime'].dt.timestamp

AttributeError: 'DatetimeProperties' object has no attribute 'timestamp'

If I use the .dt accessor to create, for example, the date part of the datetimes, it is much faster than using .apply():

df['date'] = df['datetime'].dt.date

df.head()

             datetime          ts        date
0 2016-01-01 00:00:01  1451602801  2016-01-01
1 2016-01-01 01:00:01  1451606401  2016-01-01
2 2016-01-01 02:00:01  1451610001  2016-01-01
3 2016-01-01 03:00:01  1451613601  2016-01-01
4 2016-01-01 04:00:01  1451617201  2016-01-01

I want something similar with timestamps...

But I don't really understand the official documentation: it talks about "Converting to Timestamps", but I don't see any timestamps there; it only talks about converting to datetime with pd.to_datetime(), not to a timestamp...

The pandas.Timestamp constructor doesn't work either (it fails with the error below):

df['ts2'] = pd.Timestamp(df['datetime'])

TypeError: Cannot convert input to Timestamp

pandas.Series.to_timestamp also does something totally different from what I want:

df['ts3'] = df['datetime'].to_timestamp

df.head()

             datetime          ts                                                ts3
0 2016-01-01 00:00:01  1451602801  <bound method Series.to_timestamp of 0    2016...
1 2016-01-01 01:00:01  1451606401  <bound method Series.to_timestamp of 0    2016...
2 2016-01-01 02:00:01  1451610001  <bound method Series.to_timestamp of 0    2016...
3 2016-01-01 03:00:01  1451613601  <bound method Series.to_timestamp of 0    2016...
4 2016-01-01 04:00:01  1451617201  <bound method Series.to_timestamp of 0    2016...

Thank you!!

Answer by jezrael

I think you need to first convert to a numpy array with values and cast it to int64. The output is in ns, so you need to divide by 10 ** 9:

import numpy as np

df['ts'] = df.datetime.values.astype(np.int64) // 10 ** 9
print(df)
              datetime          ts
0  2016-01-01 00:00:01  1451606401
1  2016-01-01 01:00:01  1451610001
2  2016-01-01 02:00:01  1451613601
3  2016-01-01 03:00:01  1451617201
4  2016-01-01 04:00:01  1451620801
5  2016-01-01 05:00:01  1451624401
6  2016-01-01 06:00:01  1451628001
7  2016-01-01 07:00:01  1451631601
8  2016-01-01 08:00:01  1451635201
9  2016-01-01 09:00:01  1451638801
10 2016-01-01 10:00:01  1451642401
11 2016-01-01 11:00:01  1451646001
12 2016-01-01 12:00:01  1451649601
13 2016-01-01 13:00:01  1451653201
14 2016-01-01 14:00:01  1451656801
15 2016-01-01 15:00:01  1451660401
16 2016-01-01 16:00:01  1451664001
17 2016-01-01 17:00:01  1451667601
18 2016-01-01 18:00:01  1451671201
19 2016-01-01 19:00:01  1451674801
20 2016-01-01 20:00:01  1451678401
21 2016-01-01 21:00:01  1451682001
22 2016-01-01 22:00:01  1451685601
23 2016-01-01 23:00:01  1451689201
24 2016-01-02 00:00:01  1451692801
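
Note that these values are 3600 larger than the ts column in the question. A likely explanation, offered here only as an assumption, is that the question's values were computed in the asker's local time zone (UTC+1), whereas the integer cast always reads naive datetimes as UTC. If the datetimes are wall-clock times in a known zone, a minimal sketch is to localize them first. The helper name to_epoch_seconds and the zone 'Europe/Berlin' are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

def to_epoch_seconds(series):
    """Cast naive datetimes to whole seconds since 1970-01-01, read as UTC."""
    # Force nanosecond resolution so that dividing by 10**9 is valid.
    return series.values.astype('datetime64[ns]').astype(np.int64) // 10**9

s = pd.Series(pd.to_datetime(['2016-01-01 00:00:01', '2016-01-01 01:00:01']))
print(to_epoch_seconds(s))  # [1451606401 1451610001]

# Hypothetical zone: read the naive values as local wall-clock time,
# convert to UTC, then drop the zone again before casting.
as_utc = s.dt.tz_localize('Europe/Berlin').dt.tz_convert('UTC').dt.tz_localize(None)
print(to_epoch_seconds(as_utc))  # [1451602801 1451606401]
```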

to_timestamp is used for converting from a period index to a datetime index.
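
To illustrate what to_timestamp is actually for, here is a small sketch with made-up monthly data. It also shows why the ts3 column in the question held "bound method" text: the method was assigned without being called.

```python
import pandas as pd

# A Series indexed by monthly periods (made-up data).
periods = pd.period_range('2016-01', periods=3, freq='M')
s = pd.Series([10, 20, 30], index=periods)

# to_timestamp() must be called with parentheses; by default it replaces
# the PeriodIndex with a DatetimeIndex at the start of each period.
converted = s.to_timestamp()

print(type(s.index).__name__)          # PeriodIndex
print(type(converted.index).__name__)  # DatetimeIndex
print(converted.index[0])              # 2016-01-01 00:00:00
```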

Answer by Mithril

I think you should not use apply; a simple astype is enough:

df['ts'] = df.datetime.astype('int64') // 10**9
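
A rough way to see the difference is to time both approaches on a smaller frame. The sketch below assumes 100,000 rows of made-up data and a recent pandas in which Timestamp.timestamp() reads naive values as UTC. The extra cast to datetime64[ns] is a precaution so that dividing by 10**9 is valid whatever resolution the column started with. Actual timings depend on the machine, so none are quoted here.

```python
import time
import pandas as pd

df = pd.DataFrame({'datetime': pd.date_range('2016-01-01 00:00:01',
                                             periods=100_000, freq='s')})

# Element-wise conversion: one Python-level call per row.
t0 = time.perf_counter()
slow = df['datetime'].apply(lambda x: x.timestamp()).astype('int64')
t1 = time.perf_counter()

# Vectorized conversion: a single cast over the whole column.
fast = df['datetime'].astype('datetime64[ns]').astype('int64') // 10**9
t2 = time.perf_counter()

print(f'apply:  {t1 - t0:.4f} s')
print(f'astype: {t2 - t1:.4f} s')
print((slow == fast).all())
```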

Answer by BCR

There is another way to do this, using the "hidden" asi8 attribute of DatetimeIndex, which returns an integer timestamp.

pd.DatetimeIndex(df.datetime).asi8

Wes McKinney suggested it in a tangentially related Stack Overflow question.
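
One thing to watch with the raw integers is missing values. The sketch below uses made-up data containing one NaT and assumes nanosecond resolution. It shows that asi8 stores NaT as the smallest int64, so dividing it blindly gives a meaningless number. Masking into the nullable Int64 dtype is one possible way to keep such rows missing.

```python
import numpy as np
import pandas as pd

s = pd.Series(pd.to_datetime(['2016-01-01 00:00:01', None])).astype('datetime64[ns]')

raw = pd.DatetimeIndex(s).asi8            # int64 nanoseconds since the epoch
print(raw[0] // 10**9)                    # 1451606401
print(raw[1] == np.iinfo(np.int64).min)   # True: NaT is the minimum int64

# Keep missing values missing by masking into the nullable Int64 dtype.
seconds = pd.Series(raw // 10**9, index=s.index).astype('Int64').mask(s.isna())
print(seconds.isna().tolist())            # [False, True]
```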

Answer by Jozef Cechovsky

If you don't want to use numpy, you can use pure pandas conversions:

df['ts'] = pd.to_timedelta(df['datetime'], unit='ns').dt.total_seconds().astype(int)
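
Depending on the pandas version, pd.to_timedelta may refuse datetime input with a TypeError. A pure-pandas alternative, sketched here as a different technique from the one in this answer, is to subtract the Unix epoch and floor-divide by one second. It does not depend on the column's internal resolution. The variable name epoch is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'datetime': pd.to_datetime(['2016-01-01 00:00:01',
                                               '2016-01-01 01:00:01'])})

# Subtracting the epoch yields a timedelta column; floor-dividing by a
# one-second Timedelta turns it into whole seconds as integers.
epoch = pd.Timestamp('1970-01-01')
df['ts'] = (df['datetime'] - epoch) // pd.Timedelta('1s')

print(df['ts'].tolist())  # [1451606401, 1451610001]
```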