Python: How to define the format when using pandas to_datetime?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original address and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/36848514/

Time: 2020-08-19 18:26:28  Source: igfitidea

How to define the format when using pandas to_datetime?

python pandas datetime

Asked by ju.

I want to plot RESULT vs TIME based on a testresult.csv file that has the following format, and I am having trouble getting the TIME column's datatype defined properly.

TIME,RESULT  
03/24/2016 12:27:11 AM,2  
03/24/2016 12:28:41 AM,76  
03/24/2016 12:37:23 AM,19  
03/24/2016 12:38:44 AM,68  
03/24/2016 12:42:02 AM,44  
...

To read the csv file, this is the code I wrote: raw_df = pd.read_csv('testresult.csv', index_col=None, parse_dates=['TIME'], infer_datetime_format=True)
This code works, but it is extremely slow, and I assume that infer_datetime_format is what takes the time. So I tried to read in the csv with the defaults first, and then convert the object dtype 'TIME' column to datetime dtype by using to_datetime(), hoping that defining the format explicitly would speed things up.

raw_df =  pd.read_csv('testresult.csv')
raw_df['NEWTIME'] = pd.to_datetime(raw_df['TIME'], format='%m/%d%Y %-I%M%S %p')

This code raised an error:

"ValueError: '-' is a bad directive in format '%m/%d%Y %-I%M%S %p'"

Any suggestion or hint would be helpful.

Thanks

Answered by Andy

The format you are passing is invalid. The dash between the % and the I is not supposed to be there.

df['TIME'] = pd.to_datetime(df['TIME'], format="%m/%d/%Y %I:%M:%S %p")

This will convert your TIME column to a datetime.
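
As a rough illustration of why the original format fails, here is a minimal sketch using only the standard library. It assumes that the format string passed to to_datetime follows the same strptime-style directives as Python's datetime.strptime, and it uses the first TIME value from the question as sample input. The variable names (sample, bad_format, good_format, bad_error, parsed) are made up for this example.

```python
from datetime import datetime

sample = "03/24/2016 12:27:11 AM"  # first TIME value from the question

# The question's format: '%-I' is not a directive strptime accepts here,
# and the '/' before %Y and the ':' separators are missing as well.
bad_format = "%m/%d%Y %-I%M%S %p"
try:
    datetime.strptime(sample, bad_format)
    bad_error = None
except ValueError as exc:
    bad_error = str(exc)

# The corrected format mirrors the data character for character:
# %I is the 12-hour clock hour and %p is the AM/PM marker.
good_format = "%m/%d/%Y %I:%M:%S %p"
parsed = datetime.strptime(sample, good_format)

print(bad_error)
print(parsed)
```

Note that the literal separators matter as much as the directives: every '/', ':' and space in the data needs a matching character in the format.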

Alternatively, you can adjust your read_csv call to do this:

pd.read_csv('testresult.csv', parse_dates=['TIME'], 
    date_parser=lambda x: pd.to_datetime(x, format='%m/%d/%Y %I:%M:%S %p'))

Again, this uses the appropriate format without the extra -, but it also passes the format to the date_parser parameter instead of having pandas attempt to guess it with the infer_datetime_format parameter.
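
One caveat worth hedging: the date_parser argument was deprecated in pandas 2.0 in favour of date_format, so the call above may warn or fail on newer releases. A version-independent sketch is to read the file with default parsing and convert the column afterwards, which is what the question originally attempted. The helper name load_results, the constant TIME_FORMAT, and the in-memory CSV below are assumptions made for illustration; they stand in for the real testresult.csv.

```python
import io

import pandas as pd

# Explicit format matching values such as "03/24/2016 12:27:11 AM".
TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def load_results(source):
    """Read the CSV with default parsing, then convert TIME explicitly."""
    df = pd.read_csv(source)
    df["TIME"] = pd.to_datetime(df["TIME"], format=TIME_FORMAT)
    return df


# In-memory stand-in for testresult.csv, using rows from the question.
csv_text = (
    "TIME,RESULT\n"
    "03/24/2016 12:27:11 AM,2\n"
    "03/24/2016 12:28:41 AM,76\n"
    "03/24/2016 12:37:23 AM,19\n"
)

df = load_results(io.StringIO(csv_text))
print(df)
```

For the real file, load_results('testresult.csv') would be the equivalent call. Passing the format explicitly avoids the per-file guessing that infer_datetime_format performs.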

Answered by MaxU

You can try this:

In [69]: df = pd.read_csv(fn, parse_dates=[0],
                          date_parser=lambda x: pd.to_datetime(x, format='%m/%d/%Y %I:%M:%S %p'))

In [70]: df
Out[70]:
                 TIME  RESULT
0 2016-03-24 00:27:11       2
1 2016-03-24 00:28:41      76
2 2016-03-24 00:37:23      19
3 2016-03-24 00:38:44      68
4 2016-03-24 00:42:02      44