In Pandas how do I convert a string of date strings to datetime objects and put them in a DataFrame?
Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors on StackOverflow.
Original question: http://stackoverflow.com/questions/17690738/
Asked by Dick Eshelman
import pandas as pd
date_stngs = ('2008-12-20','2008-12-21','2008-12-22','2008-12-23')
a = pd.Series(range(4),index = (range(4)))
for idx, date in enumerate(date_stngs):
    a[idx] = pd.to_datetime(date)
This bit of code produces an error:
TypeError: 'int' object is not iterable
Can anyone tell me how to get this series of date-time strings into a DataFrame as datetime objects?
Accepted answer by falsetru
>>> import pandas as pd
>>> date_stngs = ('2008-12-20','2008-12-21','2008-12-22','2008-12-23')
>>> a = pd.Series([pd.to_datetime(date) for date in date_stngs])
>>> a
0 2008-12-20 00:00:00
1 2008-12-21 00:00:00
2 2008-12-22 00:00:00
3 2008-12-23 00:00:00
UPDATE
Use pandas.to_datetime(pd.Series(..)). It is concise and much faster than the code above.
>>> pd.to_datetime(pd.Series(date_stngs))
0 2008-12-20 00:00:00
1 2008-12-21 00:00:00
2 2008-12-22 00:00:00
3 2008-12-23 00:00:00
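The question asks for a DataFrame, while the snippets above stop at a Series. A minimal sketch of that last step follows; the column name `date` is an assumption made for illustration and does not come from the original post.

```python
import pandas as pd

date_stngs = ('2008-12-20', '2008-12-21', '2008-12-22', '2008-12-23')

# Convert all strings in one vectorized call, then wrap the result
# in a DataFrame. The column name 'date' is hypothetical.
df = pd.DataFrame({'date': pd.to_datetime(pd.Series(date_stngs))})

# The column holds datetime values, so the .dt accessor is available.
days = df['date'].dt.day.tolist()
print(df.dtypes)
print(days)
```

An existing DataFrame can be handled the same way by assigning the converted Series to a column.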
Answer by waitingkuo
In [46]: pd.to_datetime(pd.Series(date_stngs))
Out[46]:
0 2008-12-20 00:00:00
1 2008-12-21 00:00:00
2 2008-12-22 00:00:00
3 2008-12-23 00:00:00
dtype: datetime64[ns]
Update: benchmark
In [43]: dates = [(dt.datetime(1960, 1, 1)+dt.timedelta(days=i)).date().isoformat() for i in range(20000)]
In [44]: timeit pd.Series([pd.to_datetime(date) for date in dates])
1 loops, best of 3: 1.71 s per loop
In [45]: timeit pd.to_datetime(pd.Series(dates))
100 loops, best of 3: 5.71 ms per loop
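The comparison above relies on IPython's timeit magic. A rough sketch of the same comparison as a plain script, using the standard-library `timeit` module, might look like this. The function names `elementwise` and `vectorized` are made up for illustration, and the sample is smaller than the 20000 dates above to keep the run short. Actual timings depend on the machine and the pandas version.

```python
import datetime as dt
import timeit

import pandas as pd

# ISO date strings starting at 1960-01-01 (2000 instead of 20000).
dates = [(dt.date(1960, 1, 1) + dt.timedelta(days=i)).isoformat()
         for i in range(2000)]


def elementwise():
    # One to_datetime call per string, collected into a Series.
    return pd.Series([pd.to_datetime(d) for d in dates])


def vectorized():
    # A single to_datetime call on the whole Series.
    return pd.to_datetime(pd.Series(dates))


t_loop = timeit.timeit(elementwise, number=3)
t_vec = timeit.timeit(vectorized, number=3)
print(f"element-wise: {t_loop:.3f} s, vectorized: {t_vec:.3f} s")

# Both approaches should yield the same dates.
loop_dates = elementwise().dt.strftime('%Y-%m-%d').tolist()
vec_dates = vectorized().dt.strftime('%Y-%m-%d').tolist()
same = loop_dates == vec_dates == dates
print(same)
```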
Answer by Ted Petrou
A simple solution involves the Series constructor: you can pass the data type to the dtype parameter. Also, the to_datetime function can now take a sequence of strings directly.
Create data
date_strings = ('2008-12-20','2008-12-21','2008-12-22','2008-12-23')
All three produce the same result
pd.Series(date_strings, dtype='datetime64[ns]')
pd.Series(pd.to_datetime(date_strings))
pd.to_datetime(pd.Series(date_strings))
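One way to check that the three calls agree is sketched below. It compares the formatted dates rather than the dtypes. This is a deliberate assumption, since the inferred datetime resolution may differ between pandas versions. A list is used here instead of the tuple above purely for illustration.

```python
import pandas as pd

date_strings = ['2008-12-20', '2008-12-21', '2008-12-22', '2008-12-23']

results = [
    pd.Series(date_strings, dtype='datetime64[ns]'),
    pd.Series(pd.to_datetime(date_strings)),
    pd.to_datetime(pd.Series(date_strings)),
]

# Format each result back to ISO strings and compare with the input.
formatted = [s.dt.strftime('%Y-%m-%d').tolist() for s in results]
all_equal = all(f == date_strings for f in formatted)
print(all_equal)
```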
Benchmarks
The benchmarks provided by @waitingkuo are wrong. The first method (passing dtype to the Series constructor) is a bit slower than the other two, which perform about the same.
import datetime as dt
dates = [(dt.datetime(1960, 1, 1)+dt.timedelta(days=i)).date().isoformat()
         for i in range(20000)] * 100
%timeit pd.Series(dates, dtype='datetime64[ns]')
730 ms ± 9.06 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit pd.Series(pd.to_datetime(dates))
426 ms ± 3.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit pd.to_datetime(pd.Series(dates))
430 ms ± 5.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)