Python: Specifying date format when converting with pandas.to_datetime

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/16672237/

Time: 2020-08-18 23:19:52  Source: igfitidea

Specifying date format when converting with pandas.to_datetime

python datetime pandas

Asked by cms_mgr

I have data in a csv file with dates stored as strings in a standard UK format, %d/%m/%Y, meaning they look like:

12/01/2012
30/01/2012

The examples above represent 12 January 2012 and 30 January 2012.

When I import this data with pandas version 0.11.0 I applied the following transformation:

import pandas as pd
...
cpts.Date = cpts.Date.apply(pd.to_datetime)

but it converted dates inconsistently. To use my existing example, 12/01/2012 would convert as a datetime object representing 1 December 2012 but 30/01/2012 converts as 30 January 2012, which is what I want.

After looking at this question I tried:

cpts.Date = cpts.Date.apply(pd.to_datetime, format='%d/%m/%Y')

but the results are exactly the same. The source code suggests I'm doing things right, so I'm at a loss. Does anyone know what I'm doing wrong?

Accepted answer by joris

You can use the parse_dates option of read_csv to do the conversion directly while reading your data.
The trick here is to use dayfirst=True to indicate that your dates start with the day and not with the month. See here for more information: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.io.parsers.read_csv.html

When your dates have to be the index:

>>> import pandas as pd
>>> from io import StringIO
>>> s = StringIO("""date,value
... 12/01/2012,1
... 12/01/2012,2
... 30/01/2012,3""")
>>> 
>>> pd.read_csv(s, index_col=0, parse_dates=True, dayfirst=True)
            value
date             
2012-01-12      1
2012-01-12      2
2012-01-30      3

Or when your dates are just in a certain column:

>>> s = StringIO("""date
... 12/01/2012
... 12/01/2012
... 30/01/2012""")
>>> 
>>> pd.read_csv(s, parse_dates=[0], dayfirst=True)
                 date
0 2012-01-12 00:00:00
1 2012-01-12 00:00:00
2 2012-01-30 00:00:00
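
If you would rather not rely on dayfirst, one alternative is to read the column as plain strings and convert it afterwards with an explicit format. This is only a minimal sketch: the CSV text and the names csv_text and df are made up for illustration, and it assumes Python 3 with a reasonably recent pandas.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV content, reusing the UK-style dates from the question.
csv_text = "date,value\n12/01/2012,1\n30/01/2012,3\n"

# Read without parse_dates, so the date column stays as strings.
df = pd.read_csv(StringIO(csv_text))

# Convert the whole column in one call; the format fixes the field order
# as day/month/year, so nothing is guessed.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

print(df)
```

Unlike dayfirst=True, the explicit format should raise an error if a value cannot be read as day/month/year, which may or may not be what you want.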

Answer by Andy Hayden

I think you are calling it correctly, and I posted this as an issue on github.

You can just pass the format to to_datetime directly, for example:

In [1]: s = pd.Series(['12/1/2012', '30/01/2012'])

In [2]: pd.to_datetime(s, format='%d/%m/%Y')
Out[2]:
0   2012-01-12 00:00:00
1   2012-01-30 00:00:00
dtype: datetime64[ns]

Update: As the OP correctly points out, this doesn't work with NaN. If you are happy with dayfirst=True (which works with NaN too), you can use:

s.apply(pd.to_datetime, dayfirst=True)
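
For what it's worth, the NaN limitation seems to be specific to the pandas version discussed here. As far as I know, on recent releases an explicit format can be combined with errors='coerce', so missing or malformed entries come back as NaT. A minimal sketch under that assumption (the sample values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one missing value and one malformed entry.
s = pd.Series(['12/01/2012', np.nan, '30/01/2012', 'not a date'])

# Assumption: a recent pandas release. errors='coerce' turns anything that
# does not match the format into NaT instead of raising an error.
parsed = pd.to_datetime(s, format='%d/%m/%Y', errors='coerce')

print(parsed)
```

Note that 'coerce' also hides genuinely bad data, so it is worth counting the NaT values afterwards.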

Worth noting that you have to be careful when using dayfirst (which is easier than specifying the exact format), since dayfirst isn't strict: a date that cannot be day-first is parsed month-first instead of raising an error.
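
To make the "isn't strict" point concrete, here is a small sketch comparing the two approaches on a string that cannot be a day-first date. The input value is made up, and the warning filter is only there because newer pandas versions may emit a warning in this situation.

```python
import warnings

import pandas as pd

# Hypothetical input that is really month-first: there is no month 13.
raw = '01/13/2012'

# dayfirst=True is only a preference, so the value is quietly read
# month-first instead of being rejected.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # newer pandas may warn here
    lenient = pd.to_datetime(raw, dayfirst=True)

# An explicit format is strict: the same string is rejected.
try:
    pd.to_datetime(raw, format='%d/%m/%Y')
    strict_failed = False
except ValueError:
    strict_failed = True

print(lenient, strict_failed)
```

So if mixed-up rows should be caught rather than silently reinterpreted, the explicit format is probably the safer choice.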