Original URL: http://stackoverflow.com/questions/29298577/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to convert string to datetime with nulls - python, pandas?
Asked by Colin O'Brien
I have a series with some datetimes (as strings) and some nulls as 'nan':
import pandas as pd, numpy as np, datetime as dt
df = pd.DataFrame({'Date':['2014-10-20 10:44:31', '2014-10-23 09:33:46', 'nan', '2014-10-01 09:38:45']})
I'm trying to convert these to datetime:
df['Date'] = df['Date'].apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
but I get the error:
time data 'nan' does not match format '%Y-%m-%d %H:%M:%S'
So I try to turn these into actual nulls:
df.loc[df['Date'] == 'nan', 'Date'] = np.nan
and repeat:
df['Date'] = df['Date'].apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
but then I get the error:
must be string, not float
What is the quickest way to solve this problem?
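The two errors above can be reproduced without pandas. The following is a minimal sketch; the helper name `try_parse` is hypothetical and only reports which exception `strptime` raises for each kind of input.

```python
import datetime as dt

FMT = '%Y-%m-%d %H:%M:%S'

def try_parse(value):
    """Return the parsed datetime, or the name of the exception raised."""
    try:
        return dt.datetime.strptime(value, FMT)
    except (ValueError, TypeError) as exc:
        return type(exc).__name__

parsed = try_parse('2014-10-20 10:44:31')   # a well-formed string parses
bad_string = try_parse('nan')               # a string that does not match the format
bad_float = try_parse(float('nan'))         # a float is not a string at all
print(parsed, bad_string, bad_float)
```

The string 'nan' fails because it does not match the format, while a real NaN fails because it is a float, so replacing one with the other only changes which exception is raised.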
Accepted answer by EdChum
Just use to_datetime and set errors='coerce' to handle duff data:
In [321]:
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df
Out[321]:
                 Date
0 2014-10-20 10:44:31
1 2014-10-23 09:33:46
2                 NaT
3 2014-10-01 09:38:45
In [322]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 1 columns):
Date 3 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 64.0 bytes
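As a self-contained variation on the call above, here is a sketch that also passes an explicit `format` and keeps the raw text so the rows that failed to parse can be inspected. The `Parsed` column name is an assumption made for illustration; the original answer overwrites `Date` in place.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2014-10-20 10:44:31', '2014-10-23 09:33:46',
                            'nan', '2014-10-01 09:38:45']})

# Parse into a separate (hypothetical) column so the original text stays available.
df['Parsed'] = pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S',
                              errors='coerce')

# Rows where parsing produced NaT, shown with their original text.
failed = df.loc[df['Parsed'].isna(), 'Date']
print(df)
print(failed.tolist())
```

Checking `isna()` after coercion is a quick way to confirm that only the expected placeholder values were turned into NaT, rather than genuinely malformed dates.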
The problem with calling strptime is that it raises an error if the string does not match the format or the value is not a string at all.
If you did this then it would work:
In [324]:
def func(x):
    try:
        return dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
    except (ValueError, TypeError):
        # ValueError: text such as 'nan'; TypeError: non-string such as float NaN
        return pd.NaT

df['Date'].apply(func)
Out[324]:
0   2014-10-20 10:44:31
1   2014-10-23 09:33:46
2                   NaT
3   2014-10-01 09:38:45
Name: Date, dtype: datetime64[ns]
But it will be faster to use the built-in to_datetime rather than call apply, which essentially just loops over your series.
Timings:
In [326]:
%timeit pd.to_datetime(df['Date'], errors='coerce')
%timeit df['Date'].apply(func)
10000 loops, best of 3: 65.8 μs per loop
10000 loops, best of 3: 186 μs per loop
We see here that using to_datetime is about 3x faster (65.8 μs versus 186 μs per loop).
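Outside IPython, the same comparison can be sketched with the standard `timeit` module. The 10,000-row series and the helper name `parse_or_nat` are assumptions for illustration, and the printed timings will vary with machine, pandas version, and data size.

```python
import datetime as dt
import timeit

import pandas as pd

FMT = '%Y-%m-%d %H:%M:%S'

# Hypothetical test data: the four values from the question, repeated to 10,000 rows.
raw = pd.Series(['2014-10-20 10:44:31', '2014-10-23 09:33:46', 'nan',
                 '2014-10-01 09:38:45'] * 2500)

def parse_or_nat(x):
    try:
        return dt.datetime.strptime(x, FMT)
    except (ValueError, TypeError):
        return pd.NaT

# Both approaches should agree on the parsed values.
vectorized = pd.to_datetime(raw, format=FMT, errors='coerce')
looped = pd.to_datetime(raw.apply(parse_or_nat))

t_vec = timeit.timeit(
    lambda: pd.to_datetime(raw, format=FMT, errors='coerce'), number=5)
t_loop = timeit.timeit(lambda: raw.apply(parse_or_nat), number=5)
print(f'to_datetime: {t_vec:.4f}s  apply: {t_loop:.4f}s')
```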
Answer by jdmarino
I find letting pandas do the work to be too slow on large dataframes. In another post I learned of a technique that speeds this up dramatically when the number of unique values is much smaller than the number of rows. (My data is usually stock price or trade blotter data.) It first builds a dict that maps the text dates to their datetime objects, then applies the dict to convert the column of text dates.
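def str2time(val):
    try:
        return dt.datetime.strptime(val, '%H:%M:%S.%f')
    except (ValueError, TypeError):
        # unparseable text or a non-string value (e.g. float NaN)
        return pd.NaT

def TextTime2Time(s):
    # parse each distinct value once, then look every row up in the dict
    times = {t: str2time(t) for t in s.unique()}
    return s.apply(lambda v: times[v])

df.date = TextTime2Time(df.date)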
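The lookup technique can be sketched end to end on data shaped like the question's. This is a variant under stated assumptions, not the answerer's exact code: it uses the question's date format instead of '%H:%M:%S.%f', swaps `apply` for `Series.map`, and the names `parse_or_nat` and `parse_via_lookup` are hypothetical.

```python
import datetime as dt

import pandas as pd

FMT = '%Y-%m-%d %H:%M:%S'  # assumed format, taken from the question

def parse_or_nat(text):
    try:
        return dt.datetime.strptime(text, FMT)
    except (ValueError, TypeError):
        return pd.NaT

def parse_via_lookup(s):
    # Parse each distinct non-null value exactly once.
    lookup = {t: parse_or_nat(t) for t in s.dropna().unique()}
    # Map every row through the dict; values absent from it become missing.
    return pd.to_datetime(s.map(lookup))

# Hypothetical data: few unique values, many rows.
raw = pd.Series(['2014-10-20 10:44:31', 'nan', None,
                 '2014-10-20 10:44:31'] * 1000)
result = parse_via_lookup(raw)
print(result.head())
```

The saving comes from calling `strptime` once per unique value instead of once per row, so it only helps when values repeat heavily. Recent pandas versions also expose a `cache` argument on `to_datetime` that targets the same situation, which may make the manual dict unnecessary.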