Python - Pandas - Convert YYYYMM to datetime
Original question: http://stackoverflow.com/questions/45215525/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Mtd240
Beginner python (and therefore pandas) user. I am trying to import some data into a pandas dataframe. One of the columns is the date, but in the format "YYYYMM". I have attempted to do what most forum responses suggest:
df_cons['YYYYMM'] = pd.to_datetime(df_cons['YYYYMM'], format='%Y%m')
This doesn't work, though (ValueError: unconverted data remains: 3). The column actually includes an additional value for each year, with MM=13. The source used this row as an average of the past year. I am guessing to_datetime is having an issue with that.
Could anyone offer a quick solution, either to strip out all of the annual averages (those with the last two digits "13"), or to have to_datetime ignore them?
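For what it's worth, the exact error text in the question can be reproduced with the standard library alone. This is only a sketch of why a month of 13 trips the parser: it assumes pandas applies the %Y%m directives in much the same way as datetime.strptime, which the matching message suggests but which is not confirmed here.

```python
from datetime import datetime

# A valid YYYYMM string parses to the first day of that month.
parsed = datetime.strptime("201612", "%Y%m")
print(parsed)  # 2016-12-01 00:00:00

# With month "13", %m can only consume the "1", so the "3" is left over.
try:
    datetime.strptime("201613", "%Y%m")
except ValueError as exc:
    message = str(exc)
    print(message)  # unconverted data remains: 3
```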
Accepted answer by EdChum
Pass errors='coerce' and then drop the NaT rows with dropna:
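df_cons['YYYYMM'] = pd.to_datetime(df_cons['YYYYMM'], format='%Y%m', errors='coerce')  # month 13 becomes NaT
df_cons = df_cons.dropna(subset=['YYYYMM'])  # drop the NaT rows from the DataFrame itself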
The duff month values will be converted to NaT:
In[36]:
pd.to_datetime('201613', format='%Y%m', errors='coerce')
Out[36]: NaT
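To see the same idea on a whole DataFrame, here is a minimal sketch. The sample frame and its "value" column are made up for illustration; only the YYYYMM column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data; "201613" stands in for the annual-average row.
df_cons = pd.DataFrame({
    "YYYYMM": ["201611", "201612", "201613", "201701"],
    "value": [1.0, 2.0, 1.5, 3.0],
})

# Invalid months are coerced to NaT instead of raising ValueError.
df_cons["YYYYMM"] = pd.to_datetime(df_cons["YYYYMM"], format="%Y%m", errors="coerce")

# Drop the rows whose date could not be parsed.
df_cons = df_cons.dropna(subset=["YYYYMM"])
print(df_cons)
```

Dropping on the DataFrame (rather than on the converted Series) is what actually removes the annual-average rows.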
Alternatively, you could filter them out before the conversion:
df_cons['YYYYMM'] = pd.to_datetime(df_cons.loc[df_cons['YYYYMM'].str[-2:] != '13','YYYYMM'], format='%Y%m', errors='coerce')
although this could lead to alignment issues: the filtered Series is shorter than the DataFrame, so on assignment the rows that were filtered out are filled with NaT anyway. Just passing errors='coerce' is the simpler solution.
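The alignment caveat can be checked with a small sketch. The three-row frame below is invented for illustration, and the names mask and converted are just local helpers, not part of the original answer.

```python
import pandas as pd

# Hypothetical data: the middle row is the annual average (MM=13).
df_cons = pd.DataFrame({"YYYYMM": ["201612", "201613", "201701"]})

# Convert only the rows whose month is not 13.
mask = df_cons["YYYYMM"].str[-2:] != "13"
converted = pd.to_datetime(df_cons.loc[mask, "YYYYMM"], format="%Y%m")

# Assignment aligns on the index, so the filtered-out row is kept
# in the DataFrame and simply receives NaT.
df_cons["YYYYMM"] = converted
print(df_cons)
```

So the filter alone does not shrink the DataFrame; a dropna (or filtering the DataFrame itself) is still needed afterwards.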
Answer by frogcoder
Clean up the dataframe first.
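df_cons = df_cons[~df_cons['YYYYMM'].str.endswith('13')].copy()  # drop the annual-average rows (MM=13)
df_cons['YYYYMM'] = pd.to_datetime(df_cons['YYYYMM'], format='%Y%m')  # explicit format avoids ambiguous parsing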
May I suggest turning the column into a period index, if the YYYYMM column is unique in your dataset.
First turn YYYYMM into the index, then convert it to a monthly period:
df_cons = df_cons.reset_index().set_index('YYYYMM').to_period('M')
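Putting this answer's steps together, a rough end-to-end sketch might look like the following. The sample data is hypothetical, the column is assumed to hold strings, and reset_index() is skipped here because the default integer index of the sample frame carries no information.

```python
import pandas as pd

# Hypothetical sample data with string dates; "201613" is the annual average.
df_cons = pd.DataFrame({
    "YYYYMM": ["201611", "201612", "201613", "201701"],
    "value": [1.0, 2.0, 1.5, 3.0],
})

# 1. Remove the annual-average rows before parsing.
df_cons = df_cons[~df_cons["YYYYMM"].str.endswith("13")].copy()

# 2. Parse the remaining strings with an explicit format.
df_cons["YYYYMM"] = pd.to_datetime(df_cons["YYYYMM"], format="%Y%m")

# 3. Use the dates as the index and convert it to monthly periods.
df_cons = df_cons.set_index("YYYYMM").to_period("M")
print(df_cons.index)
```

A monthly PeriodIndex can be handy when each row represents a whole month rather than a specific day.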