pandas: avoiding errors from pd.to_datetime

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/36692861/

Time: 2020-09-14 01:03:56  Source: igfitidea

avoiding error from pd.to_datetime in pandas

python datetime pandas dataframe

Asked by Satya

I have a huge dataframe with more than 100 million rows. It has a date column that unfortunately contains improperly formatted (mixed) date strings.

I converted it to datetime with:

df['TRX_DATE'] = pd.to_datetime(df['TRX_DATE'], coerce=True)
# runs without any error
# Now I want to calculate the weekday from that date column
df['day_type'] = [x.strftime('%A') for x in df['TRX_DATE']]
# ValueError: month out of range

If it were a single field, I could manage with the dateutil parser. But in this case I am out of ideas on how to handle it.

I am just interested in whether the weekday conversion line can do something like: if any value is out of range, put in a default...

I have the idea, but as a newbie I don't have enough experience to implement it.

It would be a great help if someone could give a line of code to handle that.

Answered by jezrael

I think you can parse with to_datetime using the parameter errors='coerce', so that unparseable strings become NaT, and then use strftime to convert to the weekday as the locale's full name:

print(df)
              TRX_DATE  some value
0  2010-08-15 13:00:00      27.065
1  2010-08-16 13:10:00      25.610
2  2010-08-17 02:30:00      17.000
3  2010-06-18 02:40:00      17.015
4  2010-18-19 02:50:00      16.910

df['TRX_DATE'] = pd.to_datetime(df['TRX_DATE'],errors='coerce')

df['day_type'] = df['TRX_DATE'].dt.strftime('%A')
print(df)
             TRX_DATE  some value day_type
0 2010-08-15 13:00:00      27.065   Sunday
1 2010-08-16 13:10:00      25.610   Monday
2 2010-08-17 02:30:00      17.000  Tuesday
3 2010-06-18 02:40:00      17.015   Friday
4                 NaT      16.910      NaT
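
The question also asked for a default value when a date is out of range. A minimal sketch of one way to do that, assuming the same errors='coerce' approach as above; the sample data and the "Unknown" label are hypothetical and not from the original answer:

```python
import pandas as pd

# Hypothetical sample data; the last string has month 18, which is invalid.
df = pd.DataFrame({
    "TRX_DATE": [
        "2010-08-15 13:00:00",
        "2010-06-18 02:40:00",
        "2010-18-19 02:50:00",
    ],
})

# Unparseable strings become NaT instead of raising an error.
df["TRX_DATE"] = pd.to_datetime(df["TRX_DATE"], errors="coerce")

# Format the weekday name, then keep it only where the date is valid;
# rows whose date is NaT get the default label (an assumed placeholder).
day = df["TRX_DATE"].dt.strftime("%A")
df["day_type"] = day.where(df["TRX_DATE"].notna(), "Unknown")

print(df)
```

Because this stays vectorized rather than looping in Python, it should scale better to a very large frame, though that has not been measured here.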

Answered by PhilChang

# skip NaT (missing) entries; pd.notnull also covers NaN and None
[x.strftime('%A') for x in df['TRX_DATE'] if pd.notnull(x)]
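
One caveat with the list comprehension above: it drops the missing dates, so the resulting list can be shorter than the column and cannot be assigned back to the dataframe directly. A small sketch of a variant that keeps one entry per row; the sample values and the "Unknown" default are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical data: one valid date and one string that cannot be parsed.
dates = pd.Series(pd.to_datetime(["2010-08-15", "not a date"], errors="coerce"))

# Filtering drops NaT entries, so the result is shorter than the input.
filtered = [x.strftime("%A") for x in dates if pd.notnull(x)]

# A conditional expression keeps one output per row, with a default for NaT.
aligned = [x.strftime("%A") if pd.notnull(x) else "Unknown" for x in dates]

print(len(filtered), len(aligned))
```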