Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/31917964/

Date: 2020-08-19 10:44:20  Source: igfitidea

Python numpy: cannot convert datetime64[ns] to datetime64[D] (to use with Numba)

Tags: python, numpy, pandas, numba

Asked by Pythonista anonymous

I want to pass a datetime array to a Numba function (the function cannot be vectorised, so it would otherwise be very slow). I understand Numba supports numpy.datetime64. However, it seems to support datetime64[D] (day precision) but not datetime64[ns] (nanosecond precision). I learnt this the hard way: is it documented?

I tried to convert from datetime64[ns] to datetime64[D], but can't seem to find a way! Any ideas?

I have summarised my problem with the minimal code below. If you run testf(mydates), which is datetime64[D], it works fine. If you run testf(dates_input), which is datetime64[ns], it doesn't. Note that this example simply passes the dates to the Numba function, which doesn't (yet) do anything with them. I try to convert dates_input to datetime64[D], but the conversion doesn't work. In my original code I read from a SQL table into a pandas DataFrame, and need a column that changes the day of each date to the 15th.

import numba
import numpy as np
import pandas as pd
import datetime

mydates = np.array(['2010-01-01', '2011-01-02']).astype('datetime64[D]')
df = pd.DataFrame()
df["rawdate"] = mydates
# Same dates, but with the day set to the 15th
df["month_15"] = df["rawdate"].apply(lambda r: datetime.date(r.year, r.month, 15))

dates_input = df["month_15"].astype('datetime64[D]')
print(dates_input.dtype)  # Why datetime64[ns] and not datetime64[D]??


@numba.jit(nopython=True)
def testf(dates):
    return 1

print(testf(mydates))        # works: mydates is a datetime64[D] NumPy array
# print(testf(dates_input))  # fails with the TypingError below

The error I get if I run testf(dates_input) is:

numba.typeinfer.TypingError: Failed at nopython (nopython frontend)
Var 'dates' unified to object: dates := {pyobject}
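
The question's underlying goal (a column with each date moved to the 15th) can also be met without apply and datetime.date objects. One possible approach, not taken from the question or the answers, is to stay in NumPy datetime64 arithmetic so the result is already datetime64[D] and never becomes a pandas column. A minimal sketch, reusing the question's mydates:

```python
import numpy as np

mydates = np.array(["2010-01-01", "2011-01-02"]).astype("datetime64[D]")

# Truncate each date to its month, cast back to day precision (which gives
# the 1st of that month), then add 14 days to land on the 15th.
month_15 = mydates.astype("datetime64[M]").astype("datetime64[D]") + np.timedelta64(14, "D")

print(month_15)
print(month_15.dtype)
```

Because month_15 is a plain NumPy array of dtype datetime64[D], it is the same kind of object as mydates, which the question reports Numba accepting.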

Accepted answer by unutbu

Series.astype converts all date-like objects to datetime64[ns]. To convert to datetime64[D], use values to obtain a NumPy array before calling astype:

dates_input = df["month_15"].values.astype('datetime64[D]')
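
A slightly fuller sketch of the same fix, with a made-up two-row DataFrame standing in for the question's data, checks that .values really yields a NumPy array at day precision:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the question's df
df = pd.DataFrame({"month_15": pd.to_datetime(["2010-01-15", "2011-01-15"])})

# .values returns a plain NumPy datetime64 array, which NumPy (unlike a
# pandas Series) can cast to day precision.
dates_input = df["month_15"].values.astype("datetime64[D]")

print(type(dates_input), dates_input.dtype)
```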


Note that NDFrames (such as Series and DataFrames) can only hold datetime-like objects as objects of dtype datetime64[ns]. The automatic conversion of all datetime-likes to a common dtype simplifies subsequent date computations, but it makes it impossible to store, say, datetime64[s] objects in a DataFrame column. (This describes pandas at the time of the answer; pandas 2.0 added support for s, ms and us resolutions, though day resolution D is still not available.) Pandas core developer Jeff Reback explains:

"We don't allow direct conversions because its simply too complicated to keep anything other than datetime64[ns] internally (nor necessary at all)."

Also note that even though df['month_15'].astype('datetime64[D]') has dtype datetime64[ns]:

In [29]: df['month_15'].astype('datetime64[D]').dtype
Out[29]: dtype('<M8[ns]')

when you iterate through the items in the Series, you get pandas Timestamps, not datetime64[ns]s.

In [28]: df['month_15'].astype('datetime64[D]').tolist()
Out[28]: [Timestamp('2010-01-15 00:00:00'), Timestamp('2011-01-15 00:00:00')]

Therefore, it is not clear that Numba actually has a problem with datetime64[ns]; it might just have a problem with Timestamps. Sorry, I can't check this -- I don't have Numba installed.

However, it might be useful for you to try

testf(df['month_15'].astype('datetime64[D]').values)

since df['month_15'].astype('datetime64[D]').values is truly a NumPy array of dtype datetime64[ns]:

In [31]: df['month_15'].astype('datetime64[D]').values.dtype
Out[31]: dtype('<M8[ns]')

If that works, then you don't have to convert everything to datetime64[D]; you just have to pass NumPy arrays -- not Pandas Series -- to testf.
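
A minimal sketch of passing a NumPy array rather than a Series into a jitted function. It adds one assumption not taken from the answer: the datetime64[D] array is reinterpreted with view("int64") as day counts since 1970-01-01, so the jitted code only ever sees plain integers. days_since_first is a hypothetical example function, and if Numba is not installed the sketch falls back to running it as plain Python:

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:
    # Assumption: without Numba, use a no-op decorator so the sketch still runs.
    def njit(func):
        return func


@njit
def days_since_first(day_numbers):
    # day_numbers is a plain int64 array (days since 1970-01-01),
    # so nopython mode never has to deal with pandas objects.
    out = np.empty(day_numbers.shape[0], dtype=np.int64)
    for i in range(day_numbers.shape[0]):
        out[i] = day_numbers[i] - day_numbers[0]
    return out


s = pd.Series(pd.to_datetime(["2010-01-15", "2011-01-15"]))

# .values gives an ndarray, astype truncates to whole days,
# and view("int64") exposes the underlying day counts.
day_numbers = s.values.astype("datetime64[D]").view("int64")
result = days_since_first(day_numbers)
print(result)
```

Converting to integers before the call sidesteps any question of which datetime64 units Numba accepts; the result can be turned back into dates with day_numbers.view("datetime64[D]") if needed.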

Answer by Arthur D. Howland

Ran into the same error when calculating number of business days between two dates:

import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthBegin

# df is an existing DataFrame with MyDateColumn and TS_Period_End_Date columns

# Calculate the beginning of the month from a given date
df['Month_Begin'] = pd.to_datetime(df['MyDateColumn']) + MonthBegin(-1)

# Calculate # of business days
# Convert dates to strings to prevent the datetime64[D] type error
df['TS_Period_End_Date'] = df['TS_Period_End_Date'].dt.strftime('%Y-%m-%d')
df['Month_Begin'] = df['Month_Begin'].dt.strftime('%Y-%m-%d')

df['Biz_Days'] = np.busday_count(df['Month_Begin'], df['MyDateColumn'])  # <-- error if not converted to strings

My workaround was to convert the dates to strings using .dt.strftime('%Y-%m-%d'). It worked in my particular case.
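
An alternative to the string round-trip, borrowing the accepted answer's .values.astype(...) idea, is to give np.busday_count day-precision datetime64 arrays directly, since the error appears to come from it refusing to cast datetime64[ns] values to datetime64[D]. A sketch with hypothetical sample data, assuming MyDateColumn holds date strings as in the answer. Note one difference: truncating to datetime64[M] gives the first day of the same month, whereas MonthBegin(-1) moves a date that is already the 1st back to the previous month.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; MyDateColumn holds date strings as in the answer.
df = pd.DataFrame({"MyDateColumn": ["2020-08-19", "2020-09-30"]})

dates = pd.to_datetime(df["MyDateColumn"]).values  # NumPy datetime64 array
end = dates.astype("datetime64[D]")
# First day of each date's own month (not MonthBegin(-1) semantics).
month_begin = dates.astype("datetime64[M]").astype("datetime64[D]")

# Counts weekdays in the half-open range [month_begin, end); no holidays given.
df["Biz_Days"] = np.busday_count(month_begin, end)
print(df)
```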
