Replace date with NaT in Pandas dataframe
Disclaimer: This page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/24803824/
Question by user3527975
I have a dataframe with a column of datetime64 type. In this column there are several rows with the date 1999-09-09 23:59:59, whereas they should actually have been represented as missing dates (NaT). Someone simply decided to use this particular date to represent missing data. Now I want these dates to be replaced with NaT (the missing-date value in Pandas).
Also, if I perform an operation on this column when it contains NaTs, such as
df['date'] - df['column with missing date']
Does Pandas ignore the missing dates and keep NaT for those rows, or will it throw an error, something like a NullPointerException in Java?
Accepted answer by EdChum
In [6]:
import pandas as pd
df = pd.DataFrame({'date': [pd.Timestamp(1999, 9, 9, 23, 59, 59), pd.Timestamp(2014, 1, 1)] * 10})
df
Out[6]:
date
0 1999-09-09 23:59:59
1 2014-01-01 00:00:00
2 1999-09-09 23:59:59
3 2014-01-01 00:00:00
4 1999-09-09 23:59:59
5 2014-01-01 00:00:00
6 1999-09-09 23:59:59
7 2014-01-01 00:00:00
8 1999-09-09 23:59:59
9 2014-01-01 00:00:00
10 1999-09-09 23:59:59
11 2014-01-01 00:00:00
12 1999-09-09 23:59:59
13 2014-01-01 00:00:00
14 1999-09-09 23:59:59
15 2014-01-01 00:00:00
16 1999-09-09 23:59:59
17 2014-01-01 00:00:00
18 1999-09-09 23:59:59
19 2014-01-01 00:00:00
In [9]:
df.loc[df['date'] == pd.Timestamp('1999-09-09 23:59:59'), 'date'] = pd.NaT
df
Out[9]:
date
0 NaT
1 2014-01-01
2 NaT
3 2014-01-01
4 NaT
5 2014-01-01
6 NaT
7 2014-01-01
8 NaT
9 2014-01-01
10 NaT
11 2014-01-01
12 NaT
13 2014-01-01
14 NaT
15 2014-01-01
16 NaT
17 2014-01-01
18 NaT
19 2014-01-01
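As an alternative to the `.loc` assignment above, the sentinel can also be masked out with `Series.mask`, which replaces values where a condition holds with a missing value (NaT for a datetime column). This is a minimal sketch that is not part of EdChum's answer; the `SENTINEL` name is hypothetical, and the frame mirrors the toy data above.

```python
import pandas as pd

# Hypothetical name for the placeholder date used to mean "missing".
SENTINEL = pd.Timestamp(1999, 9, 9, 23, 59, 59)

# Same toy data as above: the sentinel alternates with a real date.
df = pd.DataFrame({'date': [SENTINEL, pd.Timestamp(2014, 1, 1)] * 10})

# mask() swaps values where the condition is True for a missing value;
# on a datetime column that missing value is NaT, so the column stays datetime.
df['date'] = df['date'].mask(df['date'] == SENTINEL)

print(df['date'].isna().sum())  # number of rows now holding NaT
```

`mask` returns a new Series, so the result is assigned back to the column; `df['date'].where(df['date'] != SENTINEL)` is the equivalent formulation with the condition inverted.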
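To answer your second question: most pandas functions handle NaN/NaT values appropriately. Subtracting dates where one side is NaT yields NaT for that row rather than raising an error, and you can always just drop the missing rows: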
In [10]:
df.dropna()
Out[10]:
date
1 2014-01-01
3 2014-01-01
5 2014-01-01
7 2014-01-01
9 2014-01-01
11 2014-01-01
13 2014-01-01
15 2014-01-01
17 2014-01-01
19 2014-01-01
and then perform the operation on just these rows.
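To address the question directly: subtracting datetime columns that contain NaT does not raise; the affected rows simply come out as NaT. The sketch below assumes a hypothetical frame with `start` and `end` columns (not from the original question) and shows both the propagating subtraction and the drop-then-operate approach suggested above.

```python
import pandas as pd

# Hypothetical frame: 'start' has one missing date (NaT), 'end' is complete.
df = pd.DataFrame({
    'end': pd.to_datetime(['2014-01-10', '2014-02-01', '2014-03-05']),
    'start': pd.to_datetime(['2014-01-01', None, '2014-03-01']),
})

# Column subtraction does not raise on NaT: the result is a timedelta
# column that holds NaT in the row where 'start' is missing.
diff = df['end'] - df['start']
print(diff)

# Alternatively, drop incomplete rows first and operate on what remains.
complete = df.dropna()
print(complete['end'] - complete['start'])
```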
Answer by milcent
Some operations, especially those between columns, do not skip NaNs or NaTs: any row where one of the operands is missing produces a missing value (NaN or NaT) in the result. That is why you are getting NaT as a result.
If you want to ignore the 1999-09-09 23:59:59 values and still have a column you can subtract, try converting them to NaT and then replacing the NaTs with zeros (.fillna(0)), so that when the subtraction is performed, the value from the other column is kept.
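The distinction drawn in this answer can be seen on a small example: reductions such as `min` and `max` skip NaT by default, while element-wise arithmetic between two columns propagates it. The column names `a` and `b` are hypothetical, and this sketch only illustrates the propagation, not the `.fillna(0)` workaround.

```python
import pandas as pd

# Hypothetical two-column frame with a NaT in each column.
df = pd.DataFrame({
    'a': pd.to_datetime(['2014-01-05', None, '2014-01-20']),
    'b': pd.to_datetime(['2014-01-01', '2014-01-02', None]),
})

# Reductions skip missing values by default (skipna=True)...
print(df['a'].min(), df['a'].max())

# ...but element-wise arithmetic between columns propagates NaT:
# any row where either operand is missing yields NaT in the result.
res = df['a'] - df['b']
print(res)
```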

