Pandas - dropping rows with missing data not working using .isnull(), notnull(), dropna()
Source: http://stackoverflow.com/questions/39339935/ (StackOverflow, licensed under CC BY-SA 4.0; credit belongs to the original authors)
Asked by durbachit
This is really weird. I have tried several ways of dropping rows with missing data from a pandas DataFrame, but none of them seems to work. This is the code (I uncomment one method at a time; these are the three I have tried in various forms, and this is the latest):
import pandas as pd
Test = pd.DataFrame({'A':[1,2,3,4,5],'B':[1,2,'NaN',4,5],'C':[1,2,3,'NaT',5]})
print(Test)
# Three attempts at dropping the incomplete rows; only one is active at a time.
#Test = Test.loc[Test.C.notnull()]
#Test = Test.dropna()
Test = Test[~Test[Test.columns.values].isnull()]
print("And now")
print(Test)
But in all cases, all I get is this:
   A    B    C
0  1    1    1
1  2    2    2
2  3  NaN    3
3  4    4  NaT
4  5    5    5
And now
   A    B    C
0  1    1    1
1  2    2    2
2  3  NaN    3
3  4    4  NaT
4  5    5    5
Am I making a mistake somewhere, or what is the problem? Ideally, I would like to get this:
   A  B  C
0  1  1  1
1  2  2  2
4  5  5  5
Answered by Jon Clements
Your example DF has NaN and NaT as strings, which .dropna, .notnull and co. won't treat as missing values, so given your example you can use:
df[~df.isin(['NaN', 'NaT']).any(axis=1)]
Which gives you:
   A  B  C
0  1  1  1
1  2  2  2
4  5  5  5
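To see why the original attempts do nothing, it may help to compare how pandas treats the placeholder strings against real missing values. The following is only a minimal sketch, assuming the same toy frame as in the question; the names `df`, `mask`, and `cleaned` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# The strings 'NaN' and 'NaT' are ordinary str objects, not missing values.
print(pd.isnull('NaN'))    # False
print(pd.isnull(np.nan))   # True
print(pd.isnull(pd.NaT))   # True

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [1, 2, 'NaN', 4, 5],
                   'C': [1, 2, 3, 'NaT', 5]})

# dropna() has nothing to drop, because no cell is actually missing.
assert len(df.dropna()) == 5

# Flag every row that contains either placeholder string, then keep the rest.
mask = df.isin(['NaN', 'NaT']).any(axis=1)
cleaned = df[~mask]
print(cleaned.index.tolist())  # [0, 1, 4]
```

The mask is built row-wise with `any(axis=1)`, so a row is discarded as soon as any one of its columns holds a placeholder.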
If you had a DF such as this (note the use of np.nan and np.datetime64('NaT') instead of strings):
import numpy as np
df = pd.DataFrame({'A':[1,2,3,4,5],'B':[1,2,np.nan,4,5],'C':[1,2,3,np.datetime64('NaT'),5]})
Then running df.dropna() gives you:
   A    B  C
0  1  1.0  1
1  2  2.0  2
4  5  5.0  5
Note that column B is now a float instead of an integer, as that's required to store NaN values.
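The dtype change can be checked directly. This is a small sketch on a standalone Series rather than the full frame; `col_b`, `dropped`, and `restored` are illustrative names, and casting back to an integer dtype afterwards is an optional extra step, not something the answer itself prescribes.

```python
import numpy as np
import pandas as pd

# A numeric column containing NaN is stored as float64.
col_b = pd.Series([1, 2, np.nan, 4, 5], name='B')
print(col_b.dtype)     # float64

# Dropping the NaN does not change the dtype back.
dropped = col_b.dropna()
print(dropped.dtype)   # float64

# Once no NaN is left, the column can be cast to integers explicitly.
restored = dropped.astype('int64')
print(restored.tolist())  # [1, 2, 4, 5]
```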
Answered by Merlin
Try this on the original data:
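import numpy as np
# Convert the placeholder strings into real missing values, then drop those rows.
Test.replace(["NaN", 'NaT'], np.nan, inplace = True)
Test = Test.dropna()
Test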
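For reference, the replace-then-drop approach can be run end to end on the question's data. This is a hedged sketch: it uses the non-inplace form of `replace` so the intermediate frame can be inspected, and `fixed` and `result` are names introduced here, not part of the answer.

```python
import numpy as np
import pandas as pd

Test = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                     'B': [1, 2, 'NaN', 4, 5],
                     'C': [1, 2, 3, 'NaT', 5]})

# Turn the placeholder strings into real missing values (returns a new frame).
fixed = Test.replace(['NaN', 'NaT'], np.nan)

# The usual tools now see the gap in column B.
print(fixed['B'].isnull().tolist())  # [False, False, True, False, False]

# Drop every row with at least one missing value.
result = fixed.dropna()
print(result.index.tolist())  # [0, 1, 4]
```

The original index labels are kept, so the surviving rows are still labelled 0, 1 and 4, matching the output the question asks for.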
Or modify the data and do this:
import pandas as pd
import numpy as np
Test = pd.DataFrame({'A':[1,2,3,4,5],'B':[1,2,np.nan,4,5],'C':[1,2,3,pd.NaT,5]})
print(Test)
Test = Test.dropna()
print(Test)
   A    B  C
0  1  1.0  1
1  2  2.0  2
4  5  5.0  5