Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/22491298/
ValueError: unconverted data remains on Pandas DataFrame
Asked by Will
Data: here is my data in a Pandas DataFrame
CallDateAndTimeStart
01/01/2010 00:26:28.003613 MST
01/01/2010 00:28:54.230713 MST
01/02/2008 14:12:11 MST
05/19/2010 09:12:32.080728 MST
My attempt to change the column dtype to datetime64[ns]:
df['CallDateAndTimeStart'] = pandas.to_datetime(df['CallDateAndTimeStart'],
    format='%m/%d/%Y %H:%M:%S')
Error message: without cleaning the data, I get the following error:
File "C:\Python27\lib\site-packages\pandas\tseries\tools.py", line 308, in _convert_listlike raise e
ValueError: unconverted data remains: .003613 MST
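The error can be reproduced without pandas, because the format string follows the same strptime directives as the standard library's datetime.strptime. A minimal sketch (try_parse is a hypothetical helper name, not something from the question):

```python
from datetime import datetime

FORMAT = "%m/%d/%Y %H:%M:%S"  # the format used in the question


def try_parse(text, fmt=FORMAT):
    """Return the parsed datetime, or the ValueError message if parsing fails."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError as exc:
        return str(exc)


# A value that matches the format exactly parses fine.
plain = try_parse("01/02/2008 14:12:11")
# A value with fractional seconds and a time zone leaves text unconsumed.
with_extras = try_parse("01/01/2010 00:26:28.003613 MST")

print(plain)
print(with_extras)
```

The format has to consume the whole string, so anything after the seconds field triggers the error.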
Question
How can I correct my DataFrame column so that it can be converted to a datetime type? I posted my own answer, but is there a better one? Thanks.
Accepted answer by Will
Code
I apply a custom function (convert_time) to the DataFrame column:
def convert_time(mytime):
    """Fix a timestamp string by removing the fractional seconds and the time zone."""
    # Remove the period and everything after it
    mytime = str(mytime).split(".")[0]
    # Remove the time zone (e.g. MST) from rows that have no fractional seconds
    parts = mytime.split(" ")
    return parts[0] + " " + parts[1]

df['CallDateAndTimeStart'] = df['CallDateAndTimeStart'].apply(convert_time)
df['CallDateAndTimeStart'] = pandas.to_datetime(df['CallDateAndTimeStart'],
    format='%m/%d/%Y %H:%M:%S')
Output
CallDateAndTimeStart
2010-01-01 00:26:28
2010-01-01 00:28:54
2010-05-19 09:12:32
2008-01-02 14:12:11
2010-01-01 00:39:41
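The same cleanup can probably be done without a Python-level function by using pandas string methods. This is only a sketch: the regular expression assumes every value starts with a MM/DD/YYYY HH:MM:SS prefix, as in the sample data, and the small DataFrame is rebuilt here just to make the example self-contained.

```python
import pandas as pd

df = pd.DataFrame({
    "CallDateAndTimeStart": [
        "01/01/2010 00:26:28.003613 MST",
        "01/01/2010 00:28:54.230713 MST",
        "01/02/2008 14:12:11 MST",
        "05/19/2010 09:12:32.080728 MST",
    ]
})

# Keep only the leading "MM/DD/YYYY HH:MM:SS" part of each string;
# fractional seconds and the time zone abbreviation are discarded.
cleaned = df["CallDateAndTimeStart"].str.extract(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})", expand=False
)

# Every value now matches a single explicit format.
df["CallDateAndTimeStart"] = pd.to_datetime(cleaned, format="%m/%d/%Y %H:%M:%S")
print(df)
```

Rows that do not match the pattern would come back as missing values rather than raising, so it is worth checking for NaT afterwards.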
Answer by Rogim
I encountered the same problem and solved it the same way you did (by applying a function to remove the unnecessary data).
I guess you could use the standard interface to avoid this issue:
>>> import time
>>> now = time.time() # get the current time in seconds
>>> now_format = time.ctime(now) # get the formatted time, like 'Thu May 21 17:43:46 2015'
Then use time.strptime() to get a standard time struct (note %b, which matches the abbreviated month name that ctime() produces):
>>> standard_time_struct = time.strptime(now_format,"%a %b %d %X %Y")
You can get the final result like this:
>>> standard_time_struct
time.struct_time(tm_year=2015, tm_mon=5, tm_mday=21, tm_hour=17, tm_min=49, tm_sec=10, tm_wday=3, tm_yday=141, tm_isdst=-1)
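As a self-contained illustration of that round trip, here is a small sketch. The names CTIME_FORMAT and ctime_round_trip are made up for this example, the timestamp is arbitrary, and the sketch assumes the default (English) locale so that the day and month abbreviations written by ctime() can be parsed back.

```python
import time

# Layout produced by time.ctime(), e.g. 'Thu May 21 17:43:46 2015'.
CTIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def ctime_round_trip(timestamp):
    """Format a timestamp with ctime() and parse the text back into a struct_time."""
    text = time.ctime(timestamp)
    return text, time.strptime(text, CTIME_FORMAT)


text, parsed = ctime_round_trip(1432230000)  # an arbitrary moment in May 2015
print(text)
print(parsed.tm_year, parsed.tm_mon, parsed.tm_mday)
```

The parsed struct carries the same calendar fields as time.localtime() for that timestamp, except that tm_isdst is left at -1 because the text does not say whether daylight saving time applies.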
Answer by Morit
You received the error because your format string did not account for the microseconds and the time zone.
If all the rows were in the same format, the correct format would be:
df['CallDateAndTimeStart'] = pandas.to_datetime(df['CallDateAndTimeStart'],
    format='%m/%d/%Y %H:%M:%S.%f %Z')
Since not all the rows are in the same format, the best way is to let pandas infer the format without declaring it:
df['CallDateAndTimeStart'] = pandas.to_datetime(df['CallDateAndTimeStart'])
The output:
        CallDateAndTimeStart
0 2010-01-01 00:26:28.003613
1 2010-01-01 00:28:54.230713
2 2008-01-02 14:12:11.000000
3 2010-05-19 09:12:32.080728
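If you would rather not depend on inference, one explicit alternative is to try each known format in turn. The sketch below uses only the standard library. parse_call_time and FORMATS are hypothetical names, and it assumes every value ends with a time zone abbreviation separated by a space, as in the sample data.

```python
from datetime import datetime

# Candidate layouts, tried in order: with and without fractional seconds.
FORMATS = ("%m/%d/%Y %H:%M:%S.%f", "%m/%d/%Y %H:%M:%S")


def parse_call_time(text):
    """Parse a 'MM/DD/YYYY HH:MM:SS[.ffffff] TZ' string into a naive datetime."""
    # Assumption: the last space-separated token is the time zone (e.g. MST).
    stamp = text.rsplit(" ", 1)[0]
    for fmt in FORMATS:
        try:
            return datetime.strptime(stamp, fmt)
        except ValueError:
            continue  # try the next layout
    raise ValueError("no known format matches: %r" % text)


with_micro = parse_call_time("01/01/2010 00:26:28.003613 MST")
without_micro = parse_call_time("01/02/2008 14:12:11 MST")
print(with_micro)
print(without_micro)
```

A function like this could be passed to the column's apply method, at the cost of being slower than a vectorized conversion on large frames.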
Note that in this solution the time zone is ignored because MST is not recognized, so the result is time zone naive. You can attach the correct time zone with tz_localize (and then convert between zones with tz_convert). Also, if you are not interested in the microseconds, you can easily truncate them once the column is a datetime type:
df['CallDateAndTimeStartRounded'] = df['CallDateAndTimeStart'].dt.floor('s')
The output:
        CallDateAndTimeStart CallDateAndTimeStartRounded
0 2010-01-01 00:26:28.003613         2010-01-01 00:26:28
1 2010-01-01 00:28:54.230713         2010-01-01 00:28:54
2 2008-01-02 14:12:11.000000         2008-01-02 14:12:11
3 2010-05-19 09:12:32.080728         2010-05-19 09:12:32
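Attaching a time zone to naive values and truncating the microseconds might look like the sketch below. Using "America/Phoenix" to stand in for MST is an assumption made for this example (it is a zone that stays on Mountain Standard Time all year); pick whichever zone your data really refers to.

```python
import pandas as pd

# Naive timestamps, as produced when the time zone text is ignored.
naive = pd.Series([
    pd.Timestamp("2010-01-01 00:26:28.003613"),
    pd.Timestamp("2008-01-02 14:12:11"),
])

# tz_localize attaches a zone to naive values without shifting the clock time.
# Assumption: "America/Phoenix" represents MST here.
localized = naive.dt.tz_localize("America/Phoenix")

# tz_convert only works on zone-aware values; here it shifts them to UTC.
in_utc = localized.dt.tz_convert("UTC")

# floor('s') drops the sub-second part (it truncates rather than rounds).
floored = naive.dt.floor("s")

print(in_utc)
print(floored)
```

Calling tz_convert directly on the naive column raises a TypeError, which is why tz_localize comes first.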

