Original URL: http://stackoverflow.com/questions/14105774/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Pandas NaT's to -1?
Asked by ChrisArmstrong
In [22]: ts
Out[22]:
<class 'pandas.tseries.index.DatetimeIndex'>
[NaT, ..., 2012-12-31 00:00:00]
Length: 11, Freq: None, Timezone: None
In [23]: ts.year
Out[23]: array([ -1, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012])
This happens when using apply as well:
ts.apply(lambda x: pd.Timestamp(x).year)
0 -1
1 2012
2 2012
3 2012
4 2012
5 2012
6 2012
7 2012
8 2012
9 2012
10 2012
Name: Dates
Is it a bug that NaT.year == -1?
Answered by abarnert
What makes you think this is a bug, rather than defined behavior?
First:
In [16]: pandas.NaT.year
Out[16]: -1
So, there's nothing odd about it being in a DatetimeIndex; that's how NaT always works.
And it's entirely internally consistent, as well as consistent with lots of other stuff in numpy and elsewhere that uses -1 as a special value for (hopefully unsigned) integral types.
Yes, -1 doesn't really work as a NaN, since you can do arithmetic with it and get non-NaN (and incorrect) results, and it does odd things in some other cases (try pandas.NaT.isoformat()), but what other option is there? As long as year is defined to be some kind of numpy integral type, it has to return an integral value. So, what are the options?
- Return either an int or None. Then calling year returns an array(dtype=object).
- Return a float, so NaT.year can be NaN.
- Raise an exception for NaT.year itself, or when trying to do it within an array.
- Return some special integer value. If not -1, what value would you use?
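A minimal sketch of how the first, second, and last options differ in practice, using plain numpy arrays rather than pandas itself. The array names and the sample years are made up for illustration; only the dtypes and the arithmetic matter here.

```python
import numpy as np

# Hypothetical year values; the first entry stands for a missing date (NaT).

# Option: int-or-None forces an object array, losing the fast integer dtype.
years_object = np.array([None, 2012, 2012], dtype=object)

# Option: a float array lets the missing entry be NaN.
years_float = np.array([np.nan, 2012.0, 2012.0])

# Option: a special integer value (-1, as in the question) keeps an integer dtype.
years_sentinel = np.array([-1, 2012, 2012], dtype=np.int64)

print(years_object.dtype)    # object
print(years_float.dtype)     # float64
print(years_sentinel.dtype)  # int64

# The sentinel silently takes part in arithmetic and gives a wrong total...
print(years_sentinel.sum())         # 4023
# ...whereas NaN propagates, so the missing value stays visible.
print(np.isnan(years_float.sum()))  # True
```

The sentinel keeps the integer dtype at the cost of the arithmetic problem described above, which is the trade-off the list is weighing.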
They all suck in different ways, but the last seems to suck least, and be the most consistent with everything else in the universe. The ideal solution might be to have integer-with-NaN types in numpy, but that's a much larger issue than designing a wrapper around numpy datetimes.
By the way, it's worth noting that numpy 1.6 doesn't have a NaT value for datetime64, so a pandas.NaT actually maps to datetime64(-1), for exactly the same reasons. Now that numpy 1.7 has np.datetime64('NaT'), that could change. But that still doesn't change the fact that integers don't have a NaN.
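As a rough illustration of that last point, here is a sketch using np.datetime64('NaT') directly. It assumes a numpy recent enough to provide np.isnat (added well after 1.7), and the variable names and sample dates are invented for the example. Even with a real NaT in the datetime array, the extracted years need a float dtype before the missing entry can become NaN.

```python
import numpy as np

# Hypothetical data: one missing date and one real date.
dates = np.array(["NaT", "2012-12-31"], dtype="datetime64[D]")

# np.isnat flags the missing entries (assumes a numpy version that has it).
missing = np.isnat(dates)
print(missing)  # [ True False]

# Truncating to year precision and casting to int64 gives years since 1970;
# the NaT slot holds a meaningless integer, because integers have no NaN.
raw_years = dates.astype("datetime64[Y]").astype(np.int64) + 1970

# To mark the missing year as NaN, the result has to become a float array.
years = np.where(missing, np.nan, raw_years)
print(years.dtype)  # float64
print(years[1])     # 2012.0
```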

