Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/23747451/

Time: 2020-08-19 03:25:30  Source: igfitidea

Filtering all rows with NaT in a column in Dataframe python

Tags: python, pandas, dataframe

Asked by Jase Villam

I have a df like this:

    a b           c
    1 NaT         w
    2 2014-02-01  g
    3 NaT         x   

    df=df[df.b=='2014-02-01']

will give me

    a  b          c
    2 2014-02-01  g

How can I get a dataframe of all rows with NaT in column b?

    df=df[df.b==None] #Doesn't work

I want this:

    a b           c
    1 NaT         w
    3 NaT         x    

Accepted answer by Karl D.

isnull and notnull work with NaT, so you can handle them much the same way you handle NaNs:

>>> df

   a          b  c
0  1        NaT  w
1  2 2014-02-01  g
2  3        NaT  x

>>> df.dtypes

a             int64
b    datetime64[ns]
c            object

Just use isnull to select:

df[df.b.isnull()]

   a   b  c
0  1 NaT  w
2  3 NaT  x
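
The selection above can be reproduced end to end with the sketch below. It assumes a recent pandas in which isna and notna are available as aliases of isnull and notnull; the variable names missing and present are illustrative and not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question; column b becomes datetime64.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [pd.NaT, pd.Timestamp("2014-02-01"), pd.NaT],
    "c": ["w", "g", "x"],
})

# Boolean masks: isna() is True where b is NaT, notna() is the complement.
missing = df[df["b"].isna()]
present = df[df["b"].notna()]

print(missing)
print(present)
```

Bracket access (df["b"]) is used here instead of attribute access (df.b) only because it also works for column names that are not valid Python identifiers.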

Answer by Radu

Using your example dataframe:

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3],
                   "b": [pd.NaT, pd.to_datetime("2014-02-01"), pd.NaT],
                   "c": ["w", "g", "x"]})

Before v0.17, this did not work:

df.query('b != b') 

and you had to do:

df.query('b == "NaT"') # yes, surprisingly, this works!

Since v0.17, though, both methods work, although I would only recommend the first one.
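
A minimal, self-contained sketch of the self-comparison trick follows. It relies on NaT never comparing equal to itself, so b != b is True only on the NaT rows. Passing engine="python" is an assumption made here to avoid depending on whether numexpr is installed; the original answer does not use it.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [pd.NaT, pd.Timestamp("2014-02-01"), pd.NaT],
    "c": ["w", "g", "x"],
})

# NaT != NaT is True, so this keeps exactly the rows where b is missing.
nat_rows = df.query("b != b", engine="python")

# The complementary filter keeps the rows where b holds a real timestamp.
valid_rows = df.query("b == b", engine="python")

print(nat_rows)
print(valid_rows)
```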

Answer by Eelco van Vliet

For those interested: in my case I wanted to drop the NaT values contained in the DatetimeIndex of a dataframe. I could not directly use the notnull construction suggested by Karl D. You first have to create a temporary column from the index, then apply the mask, and then delete the temporary column again.

df["TMP"] = df.index.values                # index is a DateTimeIndex
df = df[df.TMP.notnull()]                  # remove all NaT values
df.drop(["TMP"], axis=1, inplace=True)     # delete TMP again
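
As a hedged alternative: in more recent pandas versions the index object itself exposes notna(), so the mask can probably be built without a temporary column. This may not have been available in the version the answer was written against. The frame below, with its value column, is a made-up example.

```python
import pandas as pd

# Hypothetical frame whose DatetimeIndex contains one NaT entry.
idx = pd.DatetimeIndex([
    pd.Timestamp("2014-02-01"),
    pd.NaT,
    pd.Timestamp("2014-02-03"),
])
df = pd.DataFrame({"value": [10, 20, 30]}, index=idx)

# Build the boolean mask straight from the index and filter the rows with it.
cleaned = df[df.index.notna()]

print(cleaned)
```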

Answer by Michael Dorner

I feel that the comment by @DSM is worth an answer on its own, because it answers the fundamental question.

The misunderstanding comes from the assumption that pd.NaT acts like None. However, while None == None returns True, pd.NaT == pd.NaT returns False. Pandas NaT behaves like a floating-point NaN, which is not equal to itself.

As the previous answers explain, you should use

df[df.b.isnull()] # or notnull(), respectively
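
The comparison behaviour described in this answer can be checked directly. The short sketch below is only an illustration, and the variable names are made up for it.

```python
import pandas as pd

none_equals_itself = (None == None)        # True: None compares equal to itself
nat_equals_itself = (pd.NaT == pd.NaT)     # False: NaT is never equal to itself
nan_equals_itself = (float("nan") == float("nan"))  # False: same rule as NaN

# pd.isna is the reliable way to test a single value for NaT.
nat_is_missing = pd.isna(pd.NaT)           # True

print(none_equals_itself, nat_equals_itself, nan_equals_itself, nat_is_missing)
```

This is why an equality test against None or NaT selects no rows, while isnull() does.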