pandas: Convert date from Excel file to pandas

Notice: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/43023226/


Convert date from excel file to pandas

Tags: excel, pandas, datetime

Asked by Arnold Klein

I'm importing an Excel file in which the 'Date' column is written in several different ways:

      Date
13/03/2017
13/03/2017
13/03/2017
13/03/2017
   10/3/17
   10/3/17
    9/3/17
    9/3/17
    9/3/17
    9/3/17

Importing into pandas:

import pandas as pd

df = pd.read_excel('data_excel.xls')
df.Date = pd.to_datetime(df.Date)

This results in:

                     Date
               13/03/2017
64             13/03/2017
65             13/03/2017
66             13/03/2017
67    2017-10-03 00:00:00
68    2017-10-03 00:00:00
69    2017-09-03 00:00:00
70    2017-09-03 00:00:00
71    2017-09-03 00:00:00
72    2017-09-03 00:00:00

This means pandas did not parse the dates correctly:

10/3/17 -> 2017-10-03

When I tried to specify the format:

df.Date = pd.to_datetime(df.Date, format='%d%m%Y')

I got the error:

ValueError: time data u'13/03/2017' does not match format '%d%m%Y' (match)
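
The format string above has no `/` separators, so it cannot match `13/03/2017`. Even with the separators added, `%Y` expects a four-digit year and would reject `10/3/17`. A minimal sketch using only the standard library, assuming the column holds day-first strings in these two shapes; the helper name `parse_day_first` and the list of candidate formats are illustrative and not part of the original post:

```python
from datetime import datetime

# Candidate formats, tried in order (an assumption based on the sample data):
# four-digit year first, then two-digit year.
CANDIDATE_FORMATS = ("%d/%m/%Y", "%d/%m/%y")


def parse_day_first(text):
    """Parse a day-first date string by trying each candidate format."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError(f"no candidate format matches {text!r}")


for raw in ("13/03/2017", "10/3/17", "9/3/17"):
    print(raw, "->", parse_day_first(raw).date())
```

Trying the four-digit-year format first matters here: `%y` would consume only the `20` of `2017` and then fail on the leftover digits.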

Question:

How do I properly import dates and times from the Excel file into pandas?

Answered by mechanical_meat

New answer:

Actually, pd.to_datetime has a dayfirst keyword argument that is useful here:

df.Date = pd.to_datetime(df.Date, dayfirst=True)

Result:

>>> df.Date
0   2017-03-13
1   2017-03-13
2   2017-03-13
3   2017-03-13
4   2017-03-10
5   2017-03-10
6   2017-03-09
7   2017-03-09
8   2017-03-09
9   2017-03-09
Name: Date, dtype: datetime64[ns]
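
To try this without the Excel file, the sketch below builds a small in-memory stand-in for the 'Date' column (the sample values are copied from the question; the variable names are illustrative). It converts each value on its own, on the assumption that some pandas versions expect one consistent layout when a whole column is parsed in a single call; with a version that accepts mixed layouts, the one-line call above should be enough.

```python
import pandas as pd

# In-memory stand-in for the 'Date' column from the question (no Excel file needed).
raw = pd.Series(["13/03/2017", "13/03/2017", "10/3/17", "9/3/17"], name="Date")

# Convert each value on its own so that one value's layout does not
# constrain how the others are parsed; dayfirst=True reads 10/3/17 as 10 March.
parsed = raw.map(lambda value: pd.to_datetime(value, dayfirst=True))

print(parsed)
```

Per-value conversion is slower than a single vectorized call, so it is best suited to columns of modest size.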


Old answer:

Use the third-party module dateutil, which can handle these kinds of variations. It has a dayfirst keyword argument that is useful here:

import dateutil.parser  # import the submodule explicitly
import pandas as pd

df = pd.read_excel('data_excel.xls')
df.Date = df.Date.apply(lambda x: dateutil.parser.parse(x, dayfirst=True))

Result:

>>> df.Date
0   2017-03-13
1   2017-03-13
2   2017-03-13
3   2017-03-13
4   2017-03-10
5   2017-03-10
6   2017-03-09
7   2017-03-09
8   2017-03-09
9   2017-03-09
Name: Date, dtype: datetime64[ns]
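
The effect of dayfirst can be seen on a single ambiguous string. This is a small sketch, assuming dateutil is installed (it ships as a pandas dependency); the variable names are illustrative:

```python
from dateutil import parser

ambiguous = "10/3/17"

# Default behaviour reads the first number as the month.
month_first = parser.parse(ambiguous)
# dayfirst=True reads the first number as the day.
day_first = parser.parse(ambiguous, dayfirst=True)

print(month_first.date())
print(day_first.date())

# A first number above 12 cannot be a month, so this string is read
# as 13 March even without dayfirst.
unambiguous = parser.parse("13/03/2017")
print(unambiguous.date())
```

This is consistent with the question's symptom: only the values whose day is 12 or lower came out with day and month swapped.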