to_datetime Value Error: at least that [year, month, day] must be specified Pandas
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/39992411/
Asked by Jed
I am reading from two different CSVs, each with date values in its columns. After read_csv, I want to convert the data to datetime with the to_datetime method. The date formats in the two CSVs differ slightly, and although I pass the matching format to the to_datetime format argument for each, one converts fine while the other raises the following ValueError.
ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing
The first, dte.head():
0 10/14/2016 10/17/2016 10/19/2016 8/9/2016 10/17/2016 7/20/2016
1 7/15/2016 7/18/2016 7/20/2016 6/7/2016 7/18/2016 4/19/2016
2 4/15/2016 4/14/2016 4/18/2016 3/15/2016 4/18/2016 1/14/2016
3 1/15/2016 1/19/2016 1/19/2016 10/19/2015 1/19/2016 10/13/2015
4 10/15/2015 10/14/2015 10/19/2015 7/23/2015 10/14/2015 7/15/2015
This DataFrame converts fine using the following code:
dte = pd.to_datetime(dte, infer_datetime_format=True)
or
dte = pd.to_datetime(dte[x], format='%m/%d/%Y')
The second, dtd.head():
0 2004-01-02 2004-01-02 2004-01-09 2004-01-16 2004-01-23 2004-01-30
1 2004-01-05 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
2 2004-01-06 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
3 2004-01-07 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
4 2004-01-08 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
This CSV doesn't convert using either:
dtd = pd.to_datetime(dtd, infer_datetime_format=True)
or
dtd = pd.to_datetime(dtd, format='%Y-%m-%d')
It raises the ValueError above. Interestingly, however, passing parse_dates and infer_datetime_format as arguments to read_csv works fine. What is going on here?
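To make the read_csv observation concrete, here is a minimal sketch. The CSV text and the choice of two columns are made up for illustration; they are not the asker's actual files. infer_datetime_format is left out because it is deprecated in pandas 2.0 and later, where parse_dates alone is enough for ISO-style dates.

```python
import io

import pandas as pd

# Hypothetical CSV content with two date columns and no header row.
csv_text = "2004-01-02,2004-01-09\n2004-01-05,2004-01-16\n"

# parse_dates=[0, 1] asks read_csv to parse columns 0 and 1 as dates,
# each column on its own.
parsed = pd.read_csv(io.StringIO(csv_text), header=None, parse_dates=[0, 1])

print(parsed.dtypes)
```

read_csv parses each listed column separately, which is presumably why it succeeds where passing the whole DataFrame to pd.to_datetime does not.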
Accepted answer by piRSquared
You can stack / pd.to_datetime / unstack:
pd.to_datetime(dte.stack()).unstack()
Explanation: pd.to_datetime works on a string, list, or pd.Series. dte is a pd.DataFrame, which is why you are having issues: when given a DataFrame, pd.to_datetime tries to assemble dates from columns named year, month, and day, which is what the error message complains about. dte.stack() produces a pd.Series where all rows are stacked on top of each other. In this stacked form, because it is a pd.Series, a vectorized pd.to_datetime can work on it. The subsequent unstack simply reverses the initial stack to get back the original form of dte.
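The behavior described above can be reproduced on a small frame. This is a sketch under assumptions: the column names c1 and c2 and the sample values are hypothetical, chosen only to resemble dtd.

```python
import pandas as pd

# Hypothetical sample shaped like a corner of dtd; column names are made up.
dtd = pd.DataFrame({
    "c1": ["2004-01-02", "2004-01-05"],
    "c2": ["2004-01-02", "2004-01-09"],
})

# Passing the whole DataFrame makes pandas look for year/month/day columns.
message = None
try:
    pd.to_datetime(dtd)
except ValueError as exc:
    message = str(exc)
print(message)

# Stacking yields a single Series, which to_datetime converts in one call;
# unstack then restores the original row/column layout.
converted = pd.to_datetime(dtd.stack()).unstack()
print(converted.dtypes)
```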
Answer by jezrael
For me, applying the to_datetime function with apply works:
print (dtd)
1 2 3 4 5 6
0
0 2004-01-02 2004-01-02 2004-01-09 2004-01-16 2004-01-23 2004-01-30
1 2004-01-05 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
2 2004-01-06 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
3 2004-01-07 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
4 2004-01-08 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
dtd = dtd.apply(pd.to_datetime)
print (dtd)
1 2 3 4 5 6
0
0 2004-01-02 2004-01-02 2004-01-09 2004-01-16 2004-01-23 2004-01-30
1 2004-01-05 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
2 2004-01-06 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
3 2004-01-07 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
4 2004-01-08 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
Answer by Guilherme Fernandes Lopes
This works for me:
dtd.apply(lambda x: pd.to_datetime(x,errors = 'coerce', format = '%Y-%m-%d'))
This way you can pass function arguments such as errors and format, as above. See more: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html
Answer by rishi jain
I would just like to add errors='coerce', so that any malformed or NULL values you might have do not raise errors:
dtd = dtd.apply(pd.to_datetime, errors='coerce')
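Because errors='coerce' silently turns unparseable values into NaT, it can help to count how many cells were coerced. The following is a minimal sketch, assuming a made-up frame with one malformed cell; the column names and the names newly_missing and bad_count are illustrative only.

```python
import pandas as pd

# Hypothetical frame with one malformed cell.
dtd = pd.DataFrame({
    "c1": ["2004-01-02", "not a date"],
    "c2": ["2004-01-09", "2004-01-16"],
})

# Extra keyword arguments given to DataFrame.apply are forwarded to the
# applied function, so each column is passed to pd.to_datetime with them.
converted = dtd.apply(pd.to_datetime, errors="coerce", format="%Y-%m-%d")

# Cells that were present in the input but failed to parse.
newly_missing = converted.isna() & dtd.notna()
bad_count = int(newly_missing.sum().sum())
print(bad_count)
```

A non-zero count suggests checking the input before relying on the converted dates.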