Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/29298577/

Time: 2020-08-19 04:21:32  Source: igfitidea

How to convert string to datetime with nulls - python, pandas?

Tags: python, string, datetime, pandas, type-conversion

Asked by Colin O'Brien

I have a series with some datetimes (as strings) and some nulls as 'nan':

import pandas as pd, numpy as np, datetime as dt
df = pd.DataFrame({'Date':['2014-10-20 10:44:31', '2014-10-23 09:33:46', 'nan', '2014-10-01 09:38:45']})

I'm trying to convert these to datetime:

df['Date'] = df['Date'].apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))

but I get the error:

time data 'nan' does not match format '%Y-%m-%d %H:%M:%S'

So I try to turn these into actual nulls:

df.loc[df['Date'] == 'nan', 'Date'] = np.nan

and repeat:

df['Date'] = df['Date'].apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))

but then I get the error:

must be string, not float

What is the quickest way to solve this problem?

Accepted answer by EdChum

Just use to_datetime and set errors='coerce' to handle malformed data:

In [321]:

df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df
Out[321]:
                 Date
0 2014-10-20 10:44:31
1 2014-10-23 09:33:46
2                 NaT
3 2014-10-01 09:38:45

In [322]:

df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 1 columns):
Date    3 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 64.0 bytes
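
For reference, the conversion above can be written as a self-contained script. This is only a minimal sketch: the explicit format argument is an addition that is not in the original answer, and it assumes every valid entry follows '%Y-%m-%d %H:%M:%S'. The variable n_missing is a name introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2014-10-20 10:44:31', '2014-10-23 09:33:46',
                            'nan', '2014-10-01 09:38:45']})

# errors='coerce' turns anything that cannot be parsed into NaT instead of raising.
# The explicit format is an assumption about the data, not part of the original answer.
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S', errors='coerce')

# Count the entries that could not be parsed (hypothetical helper variable).
n_missing = int(df['Date'].isna().sum())
```

Checking the number of NaT values after the conversion is a cheap way to notice when more rows failed to parse than expected.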

The problem with calling strptime is that it raises an error if the string does not match the format, or if the value is not a string at all.
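
The two failure modes from the question can be told apart by exception type. The snippet below is a small sketch; describe_failure and FMT are hypothetical names introduced here, not part of the original answer.

```python
import datetime as dt

FMT = '%Y-%m-%d %H:%M:%S'

def describe_failure(value):
    """Return the name of the exception strptime raises for value, or None."""
    try:
        dt.datetime.strptime(value, FMT)
    except (ValueError, TypeError) as exc:
        return type(exc).__name__
    return None

bad_string = describe_failure('nan')            # a string that does not match the format
not_a_string = describe_failure(float('nan'))   # a float, which is what np.nan is
ok = describe_failure('2014-10-20 10:44:31')    # a well-formed value
```

A string such as 'nan' leads to a ValueError, while a float NaN leads to a TypeError, which is why replacing the strings with real nulls only changed the error message.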

If you did this then it would work:

In [324]:

def func(x):
    try:
        return dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
    except (ValueError, TypeError):
        return pd.NaT

df['Date'].apply(func)
Out[324]:
0   2014-10-20 10:44:31
1   2014-10-23 09:33:46
2                   NaT
3   2014-10-01 09:38:45
Name: Date, dtype: datetime64[ns]

But it will be faster to use the built-in to_datetime than to call apply, which essentially just loops over your series.

Timings:

In [326]:

%timeit pd.to_datetime(df['Date'], errors='coerce')
%timeit df['Date'].apply(func)
10000 loops, best of 3: 65.8 μs per loop
10000 loops, best of 3: 186 μs per loop

We see here that using to_datetime is roughly 3x faster (65.8 μs versus 186 μs per loop).

Answer by jdmarino

I find letting pandas do the work to be too slow on large dataframes. In another post I learned of a technique that speeds this up dramatically when the number of unique values is much smaller than the number of rows. (My data is usually stock price or trade blotter data.) It first builds a dict that maps the text dates to their datetime objects, then applies the dict to convert the column of text dates.

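def str2time(val):
    # Parse one time string; anything unparseable or not a string becomes NaT.
    try:
        return dt.datetime.strptime(val, '%H:%M:%S.%f')
    except (ValueError, TypeError):
        return pd.NaT

def TextTime2Time(s):
    # Parse each distinct value once, then look up every row in the resulting dict.
    times = {t: str2time(t) for t in s.unique()}
    return s.apply(lambda v: times[v])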

df.date = TextTime2Time(df.date)
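
The same parse-once-per-unique-value idea can also be combined with to_datetime. The following is a rough sketch under stated assumptions, not the answerer's code: to_datetime_via_unique is a hypothetical helper, the sample data is made up, and the format is the one from the question rather than the time-only format used above.

```python
import pandas as pd

def to_datetime_via_unique(s, fmt):
    """Parse each distinct string once, then map the results back onto s."""
    uniques = pd.Series(s.unique())
    parsed = pd.to_datetime(uniques, format=fmt, errors='coerce')
    lookup = dict(zip(uniques, parsed))   # text -> Timestamp (or NaT)
    return s.map(lookup)

# Hypothetical data: three distinct values repeated many times.
s = pd.Series(['2014-10-20 10:44:31', 'nan', '2014-10-01 09:38:45'] * 1000)
result = to_datetime_via_unique(s, '%Y-%m-%d %H:%M:%S')
```

Recent pandas versions also document a cache argument on to_datetime that reuses conversions of duplicate date strings, so it may be worth timing the plain call before adding a helper like this.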