Pandas NaT's to -1?

Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/14105774/

Time: 2020-09-13 20:33:38  Source: igfitidea

python timestamp pandas

Asked by ChrisArmstrong

In [22]: ts
Out[22]:
<class 'pandas.tseries.index.DatetimeIndex'>
[NaT, ..., 2012-12-31 00:00:00]
Length: 11, Freq: None, Timezone: None

In [23]: ts.year
Out[23]: array([  -1, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012])

This happens when using apply as well

ts.apply(lambda x: pd.Timestamp(x).year)

0       -1
1     2012
2     2012
3     2012
4     2012
5     2012
6     2012
7     2012
8     2012
9     2012
10    2012
Name: Dates

Is it a bug that NaT.year == -1?

Answered by abarnert

What makes you think this is a bug, rather than defined behavior?

First:

In [16]: pandas.NaT.year
Out[16]: -1

So, there's nothing odd about it being in a DatetimeIndex; that's how NaT always works.

And it's entirely internally consistent, as well as consistent with lots of other stuff in numpy and elsewhere that uses -1 as a special value for (hopefully unsigned) integral types.
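
Because the year field is a plain integer, it cannot be relied on to flag a missing timestamp. One way to stay independent of whatever sentinel a given pandas version returns is to test each element for NaT explicitly. A minimal sketch, assuming a pandas version that provides pd.isna (older releases spell it pd.isnull); the sample dates are made up for illustration:

```python
import pandas as pd

# Made-up sample data: one missing entry followed by two real dates.
ts = pd.to_datetime([None, "2012-12-30", "2012-12-31"])

# Check each element for NaT explicitly instead of trusting its .year value.
years = [None if pd.isna(t) else t.year for t in ts]
print(years)
```

This trades the fast vectorized ts.year for a Python-level loop, so it is best suited to small indexes or to spot checks.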

Yes, -1 doesn't really work as a NaN, since you can do arithmetic with it and get non-NaN (and incorrect) results, and it does odd things in some other cases (try pandas.NaT.isoformat()), but what other option is there? As long as year is defined to be some kind of numpy integral type, it has to return an integral value. So, what are the options?

  • Return either an int or None. Then calling year returns an array(dtype=object).
  • Return a float, so NaT.year can be NaN.
  • Raise an exception for NaT.year itself, or when trying to do it within an array.
  • Return some special integer value. If not -1, what value would you use?

They all suck in different ways, but the last seems to suck least, and be the most consistent with everything else in the universe. The ideal solution might be to have integer-with-NaN types in numpy, but that's a much larger issue than designing a wrapper around numpy datetimes…
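
To make the trade-offs concrete, here is a rough sketch of three of the options in plain numpy. It is only an illustration, not how pandas implements year: the years list is made up, and None stands in for the year of a NaT entry.

```python
import numpy as np

# Made-up data: None stands in for the year of a missing (NaT) timestamp.
years = [None, 2012, 2012]

# Option 1: int or None, which forces a generic object array.
as_object = np.array(years, dtype=object)

# Option 2: floats, so the missing year can be NaN.
as_float = np.array([np.nan if y is None else y for y in years], dtype=np.float64)

# Option 4: an integer array with -1 as the special value.
as_sentinel = np.array([-1 if y is None else y for y in years], dtype=np.int64)

# Arithmetic shows the difference: NaN propagates, the sentinel does not.
next_year_float = as_float + 1
next_year_sentinel = as_sentinel + 1

print(as_object.dtype, next_year_float[0], next_year_sentinel[0])
```

The float version keeps the missing entry missing after arithmetic, while the sentinel version quietly turns -1 into 0, which is the "non-NaN (and incorrect)" result mentioned earlier.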

By the way, it's worth noting that numpy 1.6 doesn't have a NaT value for datetime64, so a pandas.NaT actually maps to datetime64(-1), for exactly the same reasons. Now that numpy 1.7 has np.datetime64('NaT'), that could change. But that still doesn't change the fact that integers don't have a NaN.
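
A small sketch of both halves of that point, assuming a numpy recent enough to provide np.isnat (it is not part of the 1.6/1.7 releases discussed above); the array contents are made up:

```python
import numpy as np

# Made-up sample: one missing date and one real one.
dates = np.array(["NaT", "2012-12-31"], dtype="datetime64[D]")

# datetime64 has a dedicated missing value that can be detected directly.
mask = np.isnat(dates)

# Underneath, NaT is still an ordinary integer reserved as a sentinel.
raw = dates.astype("int64")

# A plain integer array, by contrast, has no slot for NaN at all.
ints = np.zeros(2, dtype=np.int64)
error = None
try:
    ints[0] = np.nan
except ValueError as exc:
    error = str(exc)

print(mask, raw[0] == np.iinfo(np.int64).min, error)
```

So even with a proper NaT in datetime64, any field extracted as an integer (such as a year) still needs some convention for "missing".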
