Drop duplicates while preserving NaNs in pandas
Notice: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/23512339/
Asked by bioslime
When using the drop_duplicates() method, I remove duplicates but also merge all NaNs into one entry. How can I drop duplicates while preserving rows with an empty entry (like np.nan, None or '')?
import numpy as np
import pandas as pd
df = pd.DataFrame({'col': ['one', 'two', np.nan, np.nan, np.nan, 'two', 'two']})
df
Out[]:
   col
0  one
1  two
2  NaN
3  NaN
4  NaN
5  two
6  two
df.drop_duplicates(['col'])
Out[]:
   col
0  one
1  two
2  NaN
Answered by user666
Try:
df[(~df.duplicated()) | (df['col'].isnull())]
The result is:
   col
0  one
1  two
2  NaN
3  NaN
4  NaN
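The question also mentions None and '' as empty entries. As a rough sketch, the same mask can be widened to cover them, assuming an empty string should be treated like a missing value. The helper name drop_duplicates_keep_empty is hypothetical and not part of pandas:

```python
import numpy as np
import pandas as pd

def drop_duplicates_keep_empty(df, col):
    """Drop duplicate values in `col`, but keep every row whose value is empty.

    Hypothetical helper for illustration; not a pandas API.
    """
    # isnull() is True for np.nan and None; the empty string needs its own check
    is_empty = df[col].isnull() | (df[col] == '')
    # duplicated() marks every repeat after the first occurrence
    is_repeat = df.duplicated(subset=[col])
    # Keep a row if it is a first occurrence or if it is empty
    return df[~is_repeat | is_empty]

df = pd.DataFrame({'col': ['one', 'two', np.nan, None, '', '', 'two']})
result = drop_duplicates_keep_empty(df, 'col')
print(result)
```

Only the last row (the repeated 'two') is dropped here; all rows holding NaN, None or '' survive with their original index labels.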
Answered by FooBar
Well, one workaround that is not really beautiful is to first save the NaN rows and put them back in after dropping duplicates:
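# Save every row that contains a NaN (a boolean mask; Series.nonzero() is gone from current pandas)
temp = df[df.isnull().any(axis=1)]
# Drop duplicates as usual; this collapses the NaN rows into one
asd = df.drop_duplicates('col')
# The outer merge puts the saved NaN rows back in
pd.merge(temp, asd, how='outer')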
Out[81]:
   col
0  one
1  two
2  NaN
3  NaN
4  NaN
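The same save-and-restore idea can also be sketched with pd.concat instead of pd.merge. This is a different technique from the one in the answer, and it assumes you want to keep the original row labels and order. The variable names nan_rows, unique_rows and restored are just for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['one', 'two', np.nan, np.nan, np.nan, 'two', 'two']})

# Set the NaN rows aside before deduplicating
nan_rows = df[df['col'].isnull()]
# Deduplicate only the rows that actually hold a value
unique_rows = df.dropna(subset=['col']).drop_duplicates('col')
# Put the two pieces back together and restore the original row order
restored = pd.concat([unique_rows, nan_rows]).sort_index()
print(restored)
```

Because the NaN rows never pass through drop_duplicates, they cannot be collapsed, and sort_index() returns the rows to the order they had in df.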

