Python Pandas to_datetime ValueError:未知的字符串格式

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/34505579/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-19 15:05:45  来源:igfitidea点击:

Pandas to_datetime ValueError: Unknown string format

pythonpandas

提问by pheno

I have a column in my (pandas) dataframe:

我的(熊猫)数据框中有一列:

data['Start Date'].head()
type(data['Start Date'])
Output:
1/7/13
1/7/13
1/7/13
16/7/13
16/7/13
<class 'pandas.core.series.Series'>

When I convert it into a date format (as follows) I am getting the error ValueError: Unknown string format

当我将其转换为日期格式(如下)时,出现错误ValueError: Unknown string format

data['Start Date']= pd.to_datetime(data['Start Date'],dayfirst=True)
...
...
/Library/Python/2.7/site-packages/pandas/tseries/tools.pyc in _convert_listlike(arg, box, format, name)
    381                 return DatetimeIndex._simple_new(values, name=name, tz=tz)
    382             except (ValueError, TypeError):
--> 383                 raise e
    384 
    385     if arg is None:

ValueError: Unknown string format

What am I missing here?

我在这里缺少什么?

采纳答案by jezrael

I think the problem is in data - a problematic string exists. So you can try check length of the string in column Start Date:

我认为问题出在数据中 - 存在有问题的字符串。所以你可以尝试检查列中字符串的长度Start Date

import pandas as pd
import io

temp=u"""Start Date
1/7/13
1/7/1
1/7/13 12 17
16/7/13
16/7/13"""

data = pd.read_csv(io.StringIO(temp), sep=";", parse_dates=False)

#data['Start Date']= pd.to_datetime(data['Start Date'],dayfirst=True)
print data

     Start Date
0        1/7/13
1         1/7/1
2  1/7/13 12 17
3       16/7/13
4       16/7/13

#check, if length is more as 7
print data[data['Start Date'].str.len() > 7]

     Start Date
2  1/7/13 12 17

Or you can try to find these problematic row different way e.g. read only part of the datetime and check parsing datetime:

或者您可以尝试以不同的方式找到这些有问题的行,例如仅读取日期时间的一部分并检查解析日期时间:

#read first 3 rows
data= data.iloc[:3]

data['Start Date']= pd.to_datetime(data['Start Date'],dayfirst=True)

But this is only tips.

但这只是提示。

EDIT:

编辑:

Thanks joris for suggestion add parameter errors ='coerce'to to_datetime:

感谢 joris 的建议将参数添加errors ='coerce'to_datetime

temp=u"""Start Date
1/7/13
1/7/1
1/7/13 12 17
16/7/13
16/7/13 12 04"""

data = pd.read_csv(io.StringIO(temp), sep=";")
#add parameter errors coerce
data['Start Date']= pd.to_datetime(data['Start Date'], dayfirst=True, errors='coerce')
print data

  Start Date
0 2013-07-01
1 2001-07-01
2        NaT
3 2013-07-16
4        NaT

#index of data with null - NaT to variable idx
idx = data[data['Start Date'].isnull()].index
print idx

Int64Index([2, 4], dtype='int64')

#read csv again
data = pd.read_csv(io.StringIO(temp), sep=";")

#find problematic rows, where datetime is not parsed
print data.iloc[idx]

      Start Date
2   1/7/13 12 17
4  16/7/13 12 04