Replacing dates with NaT in a Pandas DataFrame

Question

Asked by user3527975

I have a dataframe with a column of datetime64 type. Several rows in this column contain the date 1999-09-09 23:59:59, whereas they should have been represented as missing dates (NaT). Somebody decided to use this particular date as a placeholder for missing data. I now want to replace these dates with NaT (the missing-date value in Pandas).

Also, if I perform an operation on this column with NaTs, such as

df['date'] - df['column with missing date']

does Pandas ignore the missing dates and keep NaT for those rows, or will it throw an error, something like a NullPointerException in Java?

Answer 1

Accepted answer by EdChum

In [6]:
import pandas as pd
# pd.datetime was removed in pandas 2.0; pd.Timestamp builds the same values
df = pd.DataFrame({'date': [pd.Timestamp(1999, 9, 9, 23, 59, 59), pd.Timestamp(2014, 1, 1)] * 10})
df
Out[6]:
                  date
0  1999-09-09 23:59:59
1  2014-01-01 00:00:00
2  1999-09-09 23:59:59
3  2014-01-01 00:00:00
4  1999-09-09 23:59:59
5  2014-01-01 00:00:00
6  1999-09-09 23:59:59
7  2014-01-01 00:00:00
8  1999-09-09 23:59:59
9  2014-01-01 00:00:00
10 1999-09-09 23:59:59
11 2014-01-01 00:00:00
12 1999-09-09 23:59:59
13 2014-01-01 00:00:00
14 1999-09-09 23:59:59
15 2014-01-01 00:00:00
16 1999-09-09 23:59:59
17 2014-01-01 00:00:00
18 1999-09-09 23:59:59
19 2014-01-01 00:00:00
In [9]:

# select the rows holding the placeholder date and overwrite them with NaT
df.loc[df['date'] == '1999-09-09 23:59:59', 'date'] = pd.NaT
df
Out[9]:
         date
0         NaT
1  2014-01-01
2         NaT
3  2014-01-01
4         NaT
5  2014-01-01
6         NaT
7  2014-01-01
8         NaT
9  2014-01-01
10        NaT
11 2014-01-01
12        NaT
13 2014-01-01
14        NaT
15 2014-01-01
16        NaT
17 2014-01-01
18        NaT
19 2014-01-01
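
The same replacement can also be written with `Series.mask`, which blanks out values where a condition holds. The following is a minimal sketch rather than part of the original answer; the shortened six-row frame and the name `sentinel` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical placeholder value standing in for "missing"
sentinel = pd.Timestamp("1999-09-09 23:59:59")
df = pd.DataFrame({"date": [sentinel, pd.Timestamp("2014-01-01")] * 3})

# mask() replaces values where the condition is True; with no replacement
# given, a datetime64 column is filled with NaT and keeps its dtype.
df["date"] = df["date"].mask(df["date"] == sentinel)
```

Comparing against a `pd.Timestamp` rather than a string avoids relying on string parsing during the comparison.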

To answer your second question: most pandas functions handle NaN and NaT values appropriately. You can also simply drop them:

In [10]:

df.dropna()
Out[10]:
         date
1  2014-01-01
3  2014-01-01
5  2014-01-01
7  2014-01-01
9  2014-01-01
11 2014-01-01
13 2014-01-01
15 2014-01-01
17 2014-01-01
19 2014-01-01

and then perform the operation on just these rows.
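
As a small illustration of the second question, the sketch below subtracts two datetime columns where one contains NaT. The column names `start` and `end` and their values are assumptions made up for this example, not taken from the question.

```python
import pandas as pd

# Hypothetical frame: "start" has one missing date (None becomes NaT)
df = pd.DataFrame({
    "start": pd.to_datetime(["2014-01-01", None, "2014-01-05"]),
    "end": pd.to_datetime(["2014-01-03", "2014-01-04", "2014-01-10"]),
})

# Element-wise subtraction does not raise; rows with NaT produce NaT
diff = df["end"] - df["start"]

# Alternatively, drop incomplete rows first and subtract only complete ones
complete = df.dropna(subset=["start", "end"])
diff_complete = complete["end"] - complete["start"]
```

Here `diff` keeps all three rows with a NaT in the middle, while `diff_complete` holds only the two rows where both dates are present.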

Answer 2

Answered by milcent

Some operations, especially those between columns, do not ignore NaNs or NaTs. That is why you get NaTs in the result. If you want to disregard the 1999-09-09 23:59:59 entries and still have a column you can subtract, try converting them to NaT and then replacing the NaTs with zeros (.fillna(0)), so that the subtraction keeps the value from the other column.
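
A related variant, which is not exactly what this answer describes, is to subtract first and then fill the missing differences with a zero-length timedelta. This is only a sketch under the assumption that a zero difference is an acceptable stand-in for rows with a missing date; the column names `start` and `end` are hypothetical.

```python
import pandas as pd

# Hypothetical frame: "start" has one missing date (None becomes NaT)
df = pd.DataFrame({
    "start": pd.to_datetime(["2014-01-01", None, "2014-01-05"]),
    "end": pd.to_datetime(["2014-01-03", "2014-01-04", "2014-01-10"]),
})

# Subtract first (NaT propagates), then replace missing differences
# with a zero-length timedelta so the result has no gaps.
diff = (df["end"] - df["start"]).fillna(pd.Timedelta(0))
```

Filling after the subtraction keeps the date columns as datetime64, whereas putting the integer 0 into a datetime column mixes types.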
