Original URL: http://stackoverflow.com/questions/23747451/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow.
Filtering all rows with NaT in a column in Dataframe python
Question by Jase Villam
I have a df like this:
a b c
1 NaT w
2 2014-02-01 g
3 NaT x
df=df[df.b=='2014-02-01']
will give me
a b c
2 2014-02-01 g
How do I get a DataFrame of all rows with NaT in column b?
df=df[df.b==None] #Doesn't work
I want this:
a b c
1 NaT w
3 NaT x
Accepted answer by Karl D.
isnull and notnull work with NaT, so you can handle them much the same way you handle NaNs:
>>> df
a b c
0 1 NaT w
1 2 2014-02-01 g
2 3 NaT x
>>> df.dtypes
a int64
b datetime64[ns]
c object
Just use isnull to select:
df[df.b.isnull()]
a b c
0 1 NaT w
2 3 NaT x
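For anyone who wants to run this, here is a self-contained sketch of the accepted answer. It assumes column b has already been parsed to datetime64 (the sample frame below is rebuilt from the question's data, and the name nat_rows is just for illustration):

```python
import pandas as pd

# Rebuild the question's frame; column b holds datetime64 values with NaT gaps
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": pd.to_datetime([None, "2014-02-01", None]),
    "c": ["w", "g", "x"],
})

print(df.dtypes)  # b is a datetime64 column

# isnull() is True exactly where b is NaT, so it works as a boolean row mask
nat_rows = df[df.b.isnull()]
print(nat_rows)
```

The mask approach does not depend on how NaT compares to anything. It only asks "is this value missing?", which is why it behaves like the NaN case.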
Answer by Radu
Using your example dataframe:
import pandas as pd
df = pd.DataFrame({"a": [1, 2, 3],
                   "b": [pd.NaT, pd.to_datetime("2014-02-01"), pd.NaT],
                   "c": ["w", "g", "x"]})
Before v0.17, this did not work:
df.query('b != b')
and you had to do:
df.query('b == "NaT"') # yes, surprisingly, this works!
Since v0.17 though, both methods work, although I would only recommend the first one.
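A minimal sketch of the self-inequality trick on the same example frame. It relies on NaT != NaT being True, so "b != b" selects exactly the NaT rows. The plain boolean mask (via_mask) is an equivalent I've added that skips query; it is not part of the original answer, and both variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3],
                   "b": [pd.NaT, pd.to_datetime("2014-02-01"), pd.NaT],
                   "c": ["w", "g", "x"]})

# NaT is never equal to itself, so "b != b" is True only on the NaT rows
via_query = df.query("b != b")

# The same comparison written as an ordinary boolean mask
via_mask = df[df.b != df.b]

print(via_query)
```

Both return rows 0 and 2. The "b != b" form is concise but somewhat cryptic to a reader who doesn't know the NaN/NaT rule, so isnull() is arguably clearer.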
Answer by Eelco van Vliet
For those interested: in my case I wanted to drop the NaT values contained in the DatetimeIndex of a DataFrame. I could not directly use the notnull construction suggested by Karl D. Instead, you first have to create a temporary column from the index, then apply the mask, and then delete the temporary column again.
df["TMP"] = df.index.values  # index is a DatetimeIndex
df = df[df.TMP.notnull()] # remove all NaT values
df.drop(["TMP"], axis=1, inplace=True) # delete TMP again
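In more recent pandas versions (0.20 and later, as far as I know), Index itself has isna()/notna(), so the temporary column may not be needed. Here is a sketch under that assumption. The DatetimeIndex and the "value" column below are made up for illustration, and the result is compared against the temporary-column approach above:

```python
import pandas as pd

# Hypothetical frame whose DatetimeIndex contains a NaT
idx = pd.DatetimeIndex(["2014-02-01", pd.NaT, "2014-02-03"])
df = pd.DataFrame({"value": [10, 20, 30]}, index=idx)

# Direct approach: Index.notna() returns a boolean array usable as a row mask
direct = df[df.index.notna()]

# Temporary-column approach from the answer above, for comparison
tmp = df.copy()
tmp["TMP"] = tmp.index.values
tmp = tmp[tmp.TMP.notnull()]
tmp = tmp.drop(columns=["TMP"])

print(direct)
```

On older pandas versions without Index.notna(), the temporary-column workaround is still a reasonable fallback.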
Answer by Michael Dorner
I feel that the comment by @DSM is worth an answer on its own, because it addresses the fundamental question.
The misunderstanding comes from the assumption that pd.NaT acts like None. However, while None == None returns True, pd.NaT == pd.NaT returns False. Pandas NaT behaves like a floating-point NaN, which is not equal to itself.
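Here is a quick check of those comparisons. It also shows why the question's df.b == None came back empty: comparing a datetime column with None is False on every row, including the NaT rows. The frame is the question's sample data, rebuilt here as an assumption, and the names eq_none and is_missing are illustrative:

```python
import pandas as pd

print(None == None)                   # True
print(pd.NaT == pd.NaT)               # False, just like NaN
print(float("nan") == float("nan"))   # False

df = pd.DataFrame({"a": [1, 2, 3],
                   "b": [pd.NaT, pd.to_datetime("2014-02-01"), pd.NaT],
                   "c": ["w", "g", "x"]})

# Equality with None never matches, not even on the NaT entries
eq_none = df.b == None  # noqa: E711 (deliberate, reproduces the question)

# isnull() tests for missingness instead of equality
is_missing = df.b.isnull()

print(eq_none.tolist(), is_missing.tolist())
```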
As the previous answers explain, you should use
df[df.b.isnull()] # or notnull(), respectively
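For the opposite selection, notnull() keeps the rows that do have a date. In newer pandas, isna()/notna() are aliases of isnull()/notnull(), and dropna(subset=[...]) is another way to drop the NaT rows. Here is a sketch on the question's data (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3],
                   "b": [pd.NaT, pd.to_datetime("2014-02-01"), pd.NaT],
                   "c": ["w", "g", "x"]})

with_nat = df[df.b.isnull()]      # rows where b is NaT
without_nat = df[df.b.notnull()]  # rows where b holds a date

# isna()/notna() are the newer spellings of isnull()/notnull()
assert df.b.isna().equals(df.b.isnull())

# dropna restricted to column b gives the same rows as the notnull() mask
assert df.dropna(subset=["b"]).equals(without_nat)

print(with_nat)
print(without_nat)
```

The two masks partition the frame, so every row lands in exactly one of with_nat and without_nat.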