pandas python dataframe converting multiple datetime formats

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/31147500/

Date: 2020-09-13 23:33:19  Source: igfitidea


python datetime pandas dataframe

Asked by datadatadata

I have a pandas DataFrame like this (the 'col' column has two formats):


    col                            val
'12/1/2013'                       value1
'1/22/2014 12:00:01 AM'           value2
'12/10/2013'                      value3
'12/31/2013'                      value4 

I want to convert them into datetime, and I am considering using:


from datetime import datetime
test_df['col'] = test_df['col'].map(lambda x: datetime.strptime(x, '%m/%d/%Y'))  # date-only rows
test_df['col'] = test_df['col'].map(lambda x: datetime.strptime(x, '%m/%d/%Y %I:%M:%S %p'))  # rows with a time

Obviously, neither of them works for the whole df. I've been thinking about using try and except but didn't have any luck. Any suggestions?

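The try/except idea can work if each value is tried against each known format in turn. A minimal sketch, assuming the two layouts shown in the sample are the only ones present; `parse_any` and `FORMATS` are hypothetical names introduced here. Note that `%I` (12-hour clock) is the directive that pairs with `%p`, and the time part includes seconds.

```python
from datetime import datetime

import pandas as pd

# The two layouts seen in the question; assumed to be the only ones present.
FORMATS = ('%m/%d/%Y', '%m/%d/%Y %I:%M:%S %p')


def parse_any(text, formats=FORMATS):
    """Return a datetime for the first format in `formats` that matches `text`."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this layout did not match; try the next one
    raise ValueError(f'no known format matches {text!r}')


test_df = pd.DataFrame({
    'col': ['12/1/2013', '1/22/2014 12:00:01 AM', '12/10/2013', '12/31/2013'],
    'val': ['value1', 'value2', 'value3', 'value4'],
})
# map() calls parse_any once per value; to_datetime turns the result into datetime64.
test_df['col'] = pd.to_datetime(test_df['col'].map(parse_any))
print(test_df)
```

This runs in pure Python per row, so it is easy to reason about but slower than the vectorised approaches in the answers on large frames.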

Accepted answer by EdChum

Just use to_datetime; it's smart enough to handle both of those formats:


In [4]:
df['col'] = pd.to_datetime(df['col'])
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 2 columns):
col    4 non-null datetime64[ns]
val    4 non-null object
dtypes: datetime64[ns](1), object(1)
memory usage: 96.0+ bytes

The df now looks like this:


In [5]:
df

Out[5]:
                  col     val
0 2013-12-01 00:00:00  value1
1 2014-01-22 00:00:01  value2
2 2013-12-10 00:00:00  value3
3 2013-12-31 00:00:00  value4
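One caveat for newer pandas: starting with pandas 2.0, to_datetime without a format infers a single format from the first non-missing value and raises a ValueError for rows that do not match it, so the plain call above may fail on this exact data. Passing format='mixed' (added in 2.0) makes pandas infer the format of each element separately. A sketch that chooses the argument by version, assuming the question's four sample rows:

```python
import pandas as pd

df = pd.DataFrame({
    'col': ['12/1/2013', '1/22/2014 12:00:01 AM', '12/10/2013', '12/31/2013'],
    'val': ['value1', 'value2', 'value3', 'value4'],
})

# pandas >= 2.0 infers one format from the first value unless told otherwise;
# format='mixed' asks it to infer the format of each element separately.
major_version = int(pd.__version__.split('.')[0])
extra_args = {'format': 'mixed'} if major_version >= 2 else {}
df['col'] = pd.to_datetime(df['col'], **extra_args)
print(df)
```

Like the no-format call on older versions, 'mixed' parses element by element, so it shares the speed cost discussed in the next answer.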

Answer by morganics

I had two different date formats in the same column, Temps, similar to the OP, which looked like this:


01.03.2017 00:00:00.000
01/03/2017 00:13

Here are the timings for the two code snippets:


v['Timestamp1'] = pd.to_datetime(v.Temps)

Took 25.5408718585968 seconds


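# First pass: rows like '01/03/2017 00:13' parse; all other rows become NaT
v['Timestamp'] = pd.to_datetime(v.Temps, format='%d/%m/%Y %H:%M', errors='coerce')
mask = v.Timestamp.isnull()
# Second pass: retry only the failed rows with the '01.03.2017 00:00:00.000' layout
v.loc[mask, 'Timestamp'] = pd.to_datetime(v.loc[mask, 'Temps'], format='%d.%m.%Y %H:%M:%S.%f',
                                          errors='coerce')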

Took 0.2923243045806885 seconds


In other words, if you have a small number of known formats for your datetimes, don't use to_datetime without a format!

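The two-pass mask approach generalises to any number of known formats. A minimal sketch, where `parse_with_formats` is a hypothetical helper (not part of pandas): each format is tried only on the rows that are still unparsed, and rows that no format matches stay NaT.

```python
import pandas as pd


def parse_with_formats(series, formats):
    """Parse `series` with each explicit format in turn.

    Rows that a format cannot parse become NaT and are retried with the
    next format; rows that no format matches stay NaT.
    """
    result = pd.to_datetime(series, format=formats[0], errors='coerce')
    for fmt in formats[1:]:
        missing = result.isna()
        if not missing.any():
            break  # every row is already parsed
        retry = pd.to_datetime(series[missing], format=fmt, errors='coerce')
        result = result.fillna(retry)  # aligns on index, fills only the gaps
    return result


temps = pd.Series(['01.03.2017 00:00:00.000', '01/03/2017 00:13'])
print(parse_with_formats(temps, ['%d/%m/%Y %H:%M', '%d.%m.%Y %H:%M:%S.%f']))
```

Putting the most common format first keeps the later passes small, since each retry only sees the rows that are still NaT.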

Answer by Alex

You can create a new column:


test_df['col1'] = test_df['col'].map(pd.Timestamp)  # convert each string separately

and then drop col and rename col1 to col.

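The create-then-drop-and-rename workflow can be sketched as follows, using the question's four sample rows. Converting with `.map(pd.Timestamp)` is an assumption made here: the pd.Timestamp constructor parses one string at a time, so rows with different formats are handled on any pandas version, at the cost of speed on large columns.

```python
import pandas as pd

test_df = pd.DataFrame({
    'col': ['12/1/2013', '1/22/2014 12:00:01 AM', '12/10/2013', '12/31/2013'],
    'val': ['value1', 'value2', 'value3', 'value4'],
})

# Build the converted values in a separate column, one string at a time.
test_df['col1'] = pd.to_datetime(test_df['col'].map(pd.Timestamp))

# Drop the original string column and give the new column its name.
test_df = test_df.drop(columns='col').rename(columns={'col1': 'col'})
print(test_df)
```

The converted column ends up last; reorder with `test_df[['col', 'val']]` if the original column order matters.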

Answer by Joselin Ceron

This approach works for me. I had two formats in my column 'fecha_hechos'. The formats were:


  • 2015/03/02
  • 10/02/2010

What I did was:


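# First pass: year-first values; rows that don't match become NaT
carpetas_cdmx['Timestamp'] = pd.to_datetime(carpetas_cdmx.fecha_hechos, format='%Y/%m/%d %H:%M:%S', errors='coerce')
mask = carpetas_cdmx.Timestamp.isnull()
# Second pass: day-first values, only for the rows the first pass missed
carpetas_cdmx.loc[mask, 'Timestamp'] = pd.to_datetime(carpetas_cdmx.loc[mask, 'fecha_hechos'], format='%d/%m/%Y %H:%M', errors='coerce')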

where carpetas_cdmx is my DataFrame and fecha_hechos is the column with my formats.

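As listed, the values carry no time of day, while the format strings in the code include one. Because to_datetime requires an exact format match by default (exact=True), date-only strings would all become NaT under those formats. If the column really is date-only, a sketch of the same two-pass approach with the time parts removed, assuming '10/02/2010' is day-first as in the original second format; the sample frame below is hypothetical.

```python
import pandas as pd

# Hypothetical sample frame holding the two date-only layouts listed above.
carpetas_cdmx = pd.DataFrame({'fecha_hechos': ['2015/03/02', '10/02/2010']})

# First pass: year-first values; everything else becomes NaT.
carpetas_cdmx['Timestamp'] = pd.to_datetime(
    carpetas_cdmx['fecha_hechos'], format='%Y/%m/%d', errors='coerce')
mask = carpetas_cdmx['Timestamp'].isna()
# Second pass: day-first values, only for the rows the first pass missed.
carpetas_cdmx.loc[mask, 'Timestamp'] = pd.to_datetime(
    carpetas_cdmx.loc[mask, 'fecha_hechos'], format='%d/%m/%Y', errors='coerce')
print(carpetas_cdmx)
```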