Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/39339935/

Time: 2020-08-19 22:09:30  Source: igfitidea

Pandas - dropping rows with missing data not working using .isnull(), notnull(), dropna()

python pandas

Asked by durbachit

This is really weird. I have tried several ways of dropping rows with missing data from a pandas dataframe, but none of them seem to work. This is the code (I just uncomment one of the methods used - but these are the three that I used in different modifications - this is the latest):

import pandas as pd
Test = pd.DataFrame({'A':[1,2,3,4,5],'B':[1,2,'NaN',4,5],'C':[1,2,3,'NaT',5]})
print(Test)
#Test = Test.loc[Test.C.notnull()]
#Test = Test.dropna()
Test = Test[~Test[Test.columns.values].isnull()]
print("And now")
print(Test)

But in all cases, all I get is this:

   A    B    C
0  1    1    1
1  2    2    2
2  3  NaN    3
3  4    4  NaT
4  5    5    5
And now
   A    B    C
0  1    1    1
1  2    2    2
2  3  NaN    3
3  4    4  NaT
4  5    5    5

Is there a mistake that I am making, or what is the problem? Ideally, I would like to get this:

   A    B    C
0  1    1    1
1  2    2    2
4  5    5    5

Answered by Jon Clements

Your example DF has NaN and NaT as strings, which .dropna, .notnull and co. won't treat as missing values, so given your example you can use...

df[~df.isin(['NaN', 'NaT']).any(axis=1)]

Which gives you:

   A  B  C
0  1  1  1
1  2  2  2
4  5  5  5
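
A minimal sketch of why the null checks miss these values, assuming a recent pandas and reusing the frame from the question. The strings 'NaN' and 'NaT' are ordinary Python objects, so isnull() does not flag them, while isin() matches them by value. The names any_null, is_placeholder and rows_to_drop are illustrative only.

```python
import pandas as pd

# Frame from the question: the "missing" entries are plain strings.
Test = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                     'B': [1, 2, 'NaN', 4, 5],
                     'C': [1, 2, 3, 'NaT', 5]})

# isnull() only recognises real missing values (None, np.nan, pd.NaT),
# so nothing in this frame is flagged.
any_null = Test.isnull().any().any()

# isin() compares by value, so it finds the placeholder strings.
is_placeholder = Test.isin(['NaN', 'NaT'])
rows_to_drop = is_placeholder.any(axis=1)

# Keep only the rows that contain no placeholder string.
cleaned = Test[~rows_to_drop]
print(any_null)
print(cleaned)
```

Building the boolean row mask first makes it easy to inspect which rows will be removed before actually filtering.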

If you had a DF such as the following (note the use of np.nan and np.datetime64('NaT') instead of strings):

df = pd.DataFrame({'A':[1,2,3,4,5],'B':[1,2,np.nan,4,5],'C':[1,2,3,np.datetime64('NaT'),5]})

Then running df.dropna() would give you:

   A    B  C
0  1  1.0  1
1  2  2.0  2
4  5  5.0  5

Note that column B is now a float instead of an integer, as that's required to store NaN values.
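
The dtype change can be seen in isolation with a small sketch, assuming default pandas dtypes. The Series names here are made up for illustration. Once the missing rows are dropped, casting back to an integer dtype is safe.

```python
import numpy as np
import pandas as pd

# Integers only: pandas stores the column as int64.
ints = pd.Series([1, 2, 4, 5])

# A single np.nan forces the whole column to float64, because NaN is a float.
with_nan = pd.Series([1, 2, np.nan, 4, 5])

# dropna() removes the NaN entry but keeps the float64 dtype.
dropped = with_nan.dropna()

# With no missing values left, the column can be cast back to integers.
restored = dropped.astype('int64')

print(ints.dtype, with_nan.dtype, dropped.dtype, restored.dtype)
```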

Answered by Merlin

Try this on the original data:

import numpy as np
Test.replace(["NaN", 'NaT'], np.nan, inplace=True)
Test = Test.dropna()
Test
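
A different technique from the replace() call above, sketched under the assumption that every column is meant to be numeric: pd.to_numeric with errors='coerce' turns any value it cannot parse into NaN, so the placeholder strings do not have to be listed one by one. The names numeric and cleaned are illustrative.

```python
import pandas as pd

# Frame from the question, with string placeholders for missing data.
Test = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                     'B': [1, 2, 'NaN', 4, 5],
                     'C': [1, 2, 3, 'NaT', 5]})

# Convert each column to a numeric dtype; unparseable values become NaN.
numeric = Test.apply(pd.to_numeric, errors='coerce')

# Now dropna() sees real missing values and removes those rows.
# Casting back to int64 is safe because no NaN remains.
cleaned = numeric.dropna().astype('int64')
print(cleaned)
```

This would also coerce genuinely malformed values to NaN, so it is only appropriate when silently dropping such rows is acceptable.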

Or modify the data and do this:

import pandas as pd
import numpy as np 

Test = pd.DataFrame({'A':[1,2,3,4,5],'B':[1,2,np.nan,4,5],'C':[1,2,3,pd.NaT,5]})
print(Test)
Test = Test.dropna()
print(Test)



   A    B  C
0  1  1.0  1
1  2  2.0  2
4  5  5.0  5