Drop duplicates while preserving NaNs in pandas

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/23512339/

Time: 2020-09-13 22:01:00  Source: igfitidea

python pandas

Asked by bioslime

When using the drop_duplicates() method, I remove duplicates but also merge all NaNs into one entry. How can I drop duplicates while preserving rows with an empty entry (like np.nan, None or '')?

import numpy as np  # needed for np.nan
import pandas as pd

df = pd.DataFrame({'col': ['one', 'two', np.nan, np.nan, np.nan, 'two', 'two']})

Out[]: 
   col
0  one
1  two
2  NaN
3  NaN
4  NaN
5  two
6  two


df.drop_duplicates(['col'])

Out[]: 
   col
0  one
1  two
2  NaN

Answered by user666

Try:

df[(~df.duplicated()) | (df['col'].isnull())]

The result is:

   col
0  one
1  two
2  NaN
3  NaN
4  NaN
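To see why that one-liner works, it may help to split it into its two boolean masks. The following is only a sketch using the DataFrame from the question; the names first_seen and is_missing are introduced here for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['one', 'two', np.nan, np.nan, np.nan, 'two', 'two']})

# True for the first occurrence of each value, False for later repeats.
# Note that repeated NaNs are also flagged as duplicates here.
first_seen = ~df.duplicated(subset='col')

# True wherever the entry is missing (np.nan or None).
is_missing = df['col'].isnull()

# Keep a row if it is a first occurrence OR if it is missing.
result = df[first_seen | is_missing]
print(result)
```

If empty strings should count as empty entries too, one option is to widen the second mask, for example is_missing | (df['col'] == ''). This assumes the column holds strings.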

Answered by FooBar

Well, one workaround that is not really beautiful is to first save the NaN rows and then put them back in:

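# Save the rows that contain a NaN (boolean mask instead of the older .nonzero() idiom)
temp = df[df.isnull().any(axis=1)]
# Drop duplicates; this collapses all NaNs into a single row
asd = df.drop_duplicates('col')
# Outer-merge the saved NaN rows back in
pd.merge(temp, asd, how='outer')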
Out[81]: 
   col
0  one
1  two
2  NaN
3  NaN
4  NaN
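The same save-and-restore idea can also be sketched without a merge: set the NaN rows aside, de-duplicate the rest, and concatenate the two parts. This is an alternative to the answer's code rather than the answer's own method, and the names missing and non_missing are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['one', 'two', np.nan, np.nan, np.nan, 'two', 'two']})

# Set aside every row whose 'col' entry is missing.
missing = df[df['col'].isnull()]

# De-duplicate only the rows that actually hold a value.
non_missing = df.dropna(subset=['col']).drop_duplicates('col')

# Put the two parts back together and restore the original row order.
result = pd.concat([non_missing, missing]).sort_index()
print(result)
```

Unlike the merge, this keeps the original index labels, so the surviving rows can still be traced back to the source DataFrame.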