to_datetime ValueError: at least [year, month, day] must be specified (Pandas)

Question

Asked by Jed

I am reading from two different CSVs, each with date values in its columns. After read_csv I want to convert the data to datetime with the to_datetime method. The date formats in the two CSVs differ slightly, and although I account for the difference in the format argument of to_datetime, one converts fine while the other raises the following ValueError.

ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing

The first, dte.head():

0  10/14/2016  10/17/2016  10/19/2016    8/9/2016  10/17/2016   7/20/2016
1   7/15/2016   7/18/2016   7/20/2016    6/7/2016   7/18/2016   4/19/2016
2   4/15/2016   4/14/2016   4/18/2016   3/15/2016   4/18/2016   1/14/2016
3   1/15/2016   1/19/2016   1/19/2016  10/19/2015   1/19/2016  10/13/2015
4  10/15/2015  10/14/2015  10/19/2015   7/23/2015  10/14/2015   7/15/2015

This DataFrame converts fine using the following code:

dte = pd.to_datetime(dte, infer_datetime_format=True)

or

dte = pd.to_datetime(dte[x], format='%m/%d/%Y')

The second, dtd.head():

0   2004-01-02 2004-01-02  2004-01-09 2004-01-16  2004-01-23  2004-01-30
1   2004-01-05 2004-01-09  2004-01-16 2004-01-23  2004-01-30  2004-02-06
2   2004-01-06 2004-01-09  2004-01-16 2004-01-23  2004-01-30  2004-02-06
3   2004-01-07 2004-01-09  2004-01-16 2004-01-23  2004-01-30  2004-02-06
4   2004-01-08 2004-01-09  2004-01-16 2004-01-23  2004-01-30  2004-02-06

This CSV does not convert using either:

dtd = pd.to_datetime(dtd, infer_datetime_format=True)

or

dtd = pd.to_datetime(dtd, format='%Y-%m-%d')

It raises the ValueError above. Interestingly, however, passing parse_dates and infer_datetime_format as arguments to read_csv works fine. What is going on here?

Answer 1

Accepted answer by piRSquared

You can stack / pd.to_datetime / unstack:

pd.to_datetime(dte.stack()).unstack()

Explanation
pd.to_datetime works on a string, list, or pd.Series. dte is a pd.DataFrame, which is why you are having issues: given a DataFrame, pd.to_datetime tries to assemble dates from columns named year, month, and day, hence the error message. dte.stack() produces a pd.Series in which all rows are stacked on top of each other. Because this stacked form is a pd.Series, the vectorized pd.to_datetime works on it. The subsequent unstack simply reverses the initial stack to restore the original shape of dte.
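
To make the difference concrete, here is a minimal sketch of the failing call next to the stack/unstack approach. The small DataFrame and its column names "a" and "b" are made up for illustration and only stand in for the second CSV.

```python
import pandas as pd

# Hypothetical sample data standing in for the second CSV.
dtd = pd.DataFrame({
    "a": ["2004-01-02", "2004-01-05"],
    "b": ["2004-01-09", "2004-01-16"],
})

# Passing the whole DataFrame asks pandas to assemble dates from
# columns named year/month/day, which do not exist here.
try:
    pd.to_datetime(dtd)
    raised = False
except ValueError:
    raised = True

# Stacking yields a single Series of strings that to_datetime can parse;
# unstacking restores the original rows and columns.
converted = pd.to_datetime(dtd.stack()).unstack()
print(converted.dtypes)
```

The stacked Series carries a (row, column) MultiIndex, which is what lets unstack put every value back in its original cell.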

Answer 2

Answer by jezrael

For me, using apply with the to_datetime function works:

print (dtd)
            1           2           3           4           5           6
0                                                                        
0  2004-01-02  2004-01-02  2004-01-09  2004-01-16  2004-01-23  2004-01-30
1  2004-01-05  2004-01-09  2004-01-16  2004-01-23  2004-01-30  2004-02-06
2  2004-01-06  2004-01-09  2004-01-16  2004-01-23  2004-01-30  2004-02-06
3  2004-01-07  2004-01-09  2004-01-16  2004-01-23  2004-01-30  2004-02-06
4  2004-01-08  2004-01-09  2004-01-16  2004-01-23  2004-01-30  2004-02-06


dtd = dtd.apply(pd.to_datetime)

print (dtd)
           1          2          3          4          5          6
0                                                                  
0 2004-01-02 2004-01-02 2004-01-09 2004-01-16 2004-01-23 2004-01-30
1 2004-01-05 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
2 2004-01-06 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
3 2004-01-07 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
4 2004-01-08 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06

Answer 3

Answer by Guilherme Fernandes Lopes

This works for me:

dtd.apply(lambda x: pd.to_datetime(x,errors = 'coerce', format = '%Y-%m-%d'))

This way you can pass keyword arguments such as errors and format to to_datetime, as above. See more: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html
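
As a rough illustration of what errors='coerce' does in this column-wise apply, the sketch below uses a made-up DataFrame with one deliberately invalid entry; the column names and values are assumptions, not data from the question.

```python
import pandas as pd

# Hypothetical data: one cell is not a valid date.
dtd = pd.DataFrame({
    "a": ["2004-01-02", "not a date"],
    "b": ["2004-01-09", "2004-01-16"],
})

# apply calls the lambda once per column, so to_datetime receives a Series.
# errors="coerce" turns values that cannot be parsed into NaT instead of raising.
converted = dtd.apply(
    lambda col: pd.to_datetime(col, errors="coerce", format="%Y-%m-%d")
)
print(converted)
```

Without errors='coerce', the invalid entry would make to_datetime raise instead of returning NaT for that cell.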

Answer 4

Answer by rishi jain

I would just like to add errors='coerce', so that any invalid or NULL values you might have become NaT instead of raising an error:

dtd = dtd.apply(pd.to_datetime, errors='coerce')
