Python - Pandas - Convert YYYYMM to datetime

Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original address, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/45215525/

Time: 2020-09-14 04:03:12  Source: igfitidea

Tags: python, pandas

Asked by Mtd240

Beginner python (and therefore pandas) user. I am trying to import some data into a pandas dataframe. One of the columns is the date, but in the format "YYYYMM". I have attempted to do what most forum responses suggest:

df_cons['YYYYMM'] = pd.to_datetime(df_cons['YYYYMM'], format='%Y%m')

This doesn't work, though (ValueError: unconverted data remains: 3). The column actually includes an additional value for each year, with MM=13; the source used this row for the average of the past year. I am guessing to_datetime is having an issue with that.

Could anyone offer a quick solution, either to strip out all of the annual averages (those whose last two digits are "13") or to have to_datetime ignore them?
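
The `3` in the error message above is the leftover character: with the format `%Y%m`, `%Y` consumes `2016`, `%m` can only accept `1` out of `13`, and the trailing `3` remains unconverted. The sketch below reproduces this with the standard library's `datetime.strptime`, on the assumption that pandas applies the same strptime-style directives; the helper name `try_parse` is hypothetical and exists only for this illustration.

```python
from datetime import datetime


def try_parse(value, fmt="%Y%m"):
    """Return (datetime, None) on success or (None, error message) on failure."""
    try:
        return datetime.strptime(value, fmt), None
    except ValueError as exc:
        return None, str(exc)


# A valid month parses cleanly.
ok, ok_err = try_parse("201612")

# Month 13 is not valid: %m matches only the "1", so "3" is left over.
bad, bad_err = try_parse("201613")

print(ok, ok_err)
print(bad, bad_err)
```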

Accepted answer by EdChum

Pass errors='coerce' and then dropna the NaT rows:

df_cons['YYYYMM'] = pd.to_datetime(df_cons['YYYYMM'], format='%Y%m', errors='coerce')
df_cons = df_cons.dropna(subset=['YYYYMM'])  # dropna() before assigning would be undone by index alignment

The duff (invalid) month values will get converted to NaT values:

In [36]: pd.to_datetime('201613', format='%Y%m', errors='coerce')
Out[36]: NaT
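
As an end-to-end illustration of the coerce-then-drop approach, here is a minimal sketch. The sample values and the `value` column are made up for the example, and it assumes `YYYYMM` is stored as strings.

```python
import pandas as pd

# Hypothetical sample data: YYYYMM as strings, including a month-13 annual average row.
df_cons = pd.DataFrame({
    "YYYYMM": ["201611", "201612", "201613", "201701"],
    "value": [1.0, 2.0, 1.5, 3.0],
})

# Invalid months (such as 13) become NaT instead of raising ValueError.
df_cons["YYYYMM"] = pd.to_datetime(df_cons["YYYYMM"], format="%Y%m", errors="coerce")

# Drop the whole rows whose date could not be parsed, then renumber the index.
df_cons = df_cons.dropna(subset=["YYYYMM"]).reset_index(drop=True)

print(df_cons)
```

Dropping on the DataFrame (rather than on the converted Series) is what actually removes the annual-average rows.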

Alternatively, you could filter them out before the conversion:

df_cons['YYYYMM'] = pd.to_datetime(df_cons.loc[df_cons['YYYYMM'].str[-2:] != '13','YYYYMM'], format='%Y%m', errors='coerce')

This could lead to alignment issues, though, because the filtered Series is shorter than the column it is assigned back to, so just passing errors='coerce' is a simpler solution.
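
To make the alignment point concrete, here is a small sketch with made-up data. It writes the result to a new, hypothetical column named `date` so the original strings stay visible next to it.

```python
import pandas as pd

# Hypothetical sample data with one month-13 row at index label 1.
df_cons = pd.DataFrame({"YYYYMM": ["201612", "201613", "201701"]})

# Convert only the rows whose last two characters are not "13".
keep = df_cons["YYYYMM"].str[-2:] != "13"
converted = pd.to_datetime(df_cons.loc[keep, "YYYYMM"], format="%Y%m")

# converted has two rows; the assignment aligns on index labels,
# so the filtered-out row (label 1) receives NaT rather than being removed.
df_cons["date"] = converted

print(df_cons)
```

The month-13 row is still present after the assignment, so a later drop of the NaT rows would be needed either way.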

Answer by frogcoder

Clean up the dataframe first.

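# .copy() avoids a SettingWithCopyWarning on the assignment below
df_cons = df_cons[~df_cons['YYYYMM'].str.endswith('13')].copy()
# an explicit format avoids ambiguous parsing of six-digit strings
df_cons['YYYYMM'] = pd.to_datetime(df_cons['YYYYMM'], format='%Y%m')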

May I suggest turning the column into a period index, if the YYYYMM column is unique in your dataset?

First set YYYYMM as the index, then convert it to a monthly period.

df_cons = df_cons.reset_index().set_index('YYYYMM').to_period('M')
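
Putting the clean-up and the period conversion together, a minimal sketch might look like this. The sample data and the `value` column are made up, and the result is stored in a hypothetical variable `monthly` rather than overwriting `df_cons`.

```python
import pandas as pd

# Hypothetical sample data: YYYYMM as strings, including a month-13 row.
df_cons = pd.DataFrame({
    "YYYYMM": ["201611", "201612", "201613", "201701"],
    "value": [1.0, 2.0, 1.5, 3.0],
})

# Drop the month-13 rows, then parse the remaining values.
df_cons = df_cons[~df_cons["YYYYMM"].str.endswith("13")].copy()
df_cons["YYYYMM"] = pd.to_datetime(df_cons["YYYYMM"], format="%Y%m")

# Setting the datetime column as the index gives a DatetimeIndex,
# which to_period("M") converts to a monthly PeriodIndex.
monthly = df_cons.set_index("YYYYMM").to_period("M")

print(monthly)

# Look up a single month by its Period label.
print(monthly.loc[pd.Period("2016-12", freq="M"), "value"])
```

A period index labels each row with the month itself rather than a specific day, which matches what YYYYMM data represents.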