Original URL: http://stackoverflow.com/questions/43855462/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Pandas drop_duplicates method not working
Question by SLack A
I am trying to use the drop_duplicates method on my dataframe, but I am getting the following error:
error: TypeError: unhashable type: 'list'
The code I am using:
df = db.drop_duplicates()
My DB is huge and contains strings, floats, dates, NaNs, booleans, integers... Any help is appreciated.
Answer by Allen
As the error message implies, drop_duplicates won't work when your dataframe contains lists, because lists are unhashable. However, you can cast the dataframe to str, drop duplicates on that string copy, and then use the index of the result to extract the rows from the original df.
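To see why lists trigger the error, here is a minimal plain-Python sketch. pandas detects duplicates with hash-based lookups, so every value must be hashable. A list is mutable and cannot be hashed, while a tuple or the string form of the list can. The helper name `is_hashable` is hypothetical, used only for illustration.

```python
def is_hashable(value):
    """Return True if value can be hashed (hypothetical helper for illustration)."""
    try:
        hash(value)
    except TypeError:
        # Mutable containers such as list raise TypeError: unhashable type
        return False
    return True


print(is_hashable([1, 2]))       # False: a list is not hashable
print(is_hashable((1, 2)))       # True: a tuple of hashable items is hashable
print(is_hashable(str([1, 2])))  # True: the string "[1, 2]" is hashable
```

This is why casting to str works: the string "[1, 2]" stands in for the list during duplicate detection.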
Setup
import pandas as pd

df = pd.DataFrame({'Keyword': {0: 'apply', 1: 'apply', 2: 'apply', 3: 'terms', 4: 'terms'},
                   'X': {0: [1, 2], 1: [1, 2], 2: 'xy', 3: 'xx', 4: 'yy'},
                   'Y': {0: 'yy', 1: 'yy', 2: 'yx', 3: 'ix', 4: 'xi'}})
# Dropping duplicates directly raises the same error
df.drop_duplicates()
Traceback (most recent call last):
...
TypeError: unhashable type: 'list'
Solution
# Convert the df to str, drop duplicates, then select the rows from the original df.
df.loc[df.astype(str).drop_duplicates().index]
Out[205]:
Keyword X Y
0 apply [1, 2] yy
2 apply xy yx
3 terms xx ix
4 terms yy xi
# The list elements are still lists in the final result.
df.loc[df.astype(str).drop_duplicates().index].loc[0,'X']
Out[207]: [1, 2]
Edit: replaced iloc with loc. In this particular case both work, because the index labels match the positional index, but that does not hold in general.
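The same technique can be wrapped in a small reusable function. This is a sketch, not code from the answer: the name `drop_duplicates_unhashable` is hypothetical, and it assumes the index labels are unique (with duplicate labels, `.loc` would return every row sharing a kept label). It selects with `.loc` because the kept index values are labels, not positions.

```python
import pandas as pd


def drop_duplicates_unhashable(df, subset=None, keep="first"):
    """Hypothetical helper: drop duplicate rows even when cells hold lists.

    Duplicates are detected on a string copy of the frame; the surviving
    index labels are then passed to .loc, so the original (non-string)
    values are returned. Assumes the index labels are unique.
    """
    kept_labels = df.astype(str).drop_duplicates(subset=subset, keep=keep).index
    return df.loc[kept_labels]


df = pd.DataFrame({
    "Keyword": ["apply", "apply", "apply", "terms", "terms"],
    "X": [[1, 2], [1, 2], "xy", "xx", "yy"],
    "Y": ["yy", "yy", "yx", "ix", "xi"],
})
result = drop_duplicates_unhashable(df)
print(result.index.tolist())     # [0, 2, 3, 4]
print(type(result.loc[0, "X"]))  # <class 'list'>
```

One consequence of comparing string forms: values whose string representations coincide, such as the integer 1 and the string '1' in the same object column, are treated as duplicates.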
Answer by Hsgao
@Allen's answer is great, but it has a small problem:
df.iloc[df.astype(str).drop_duplicates().index]
It should be loc, not iloc. Look at this example:
a = pd.DataFrame([['a',18],['b',11],['a',18]],index=[4,6,8])
Out[52]:
0 1
4 a 18
6 b 11
8 a 18
a.iloc[a.astype(str).drop_duplicates().index]
Out[53]:
...
IndexError: positional indexers are out-of-bounds
a.loc[a.astype(str).drop_duplicates().index]
Out[54]:
0 1
4 a 18
6 b 11
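A variant that avoids choosing between loc and iloc (a sketch, not taken from either answer) builds a boolean mask with `DataFrame.duplicated` on the string copy and filters the original frame with it. Boolean indexing aligns on index labels, so it works with a non-default index like [4, 6, 8]. The frame below, with its list column `vals`, is made up for illustration.

```python
import pandas as pd

# Illustrative frame: a list column plus a non-default index
a = pd.DataFrame(
    {"key": ["a", "b", "a"], "vals": [[1, 2], [3], [1, 2]]},
    index=[4, 6, 8],
)

# True for each row whose string form already appeared in an earlier row
dup_mask = a.astype(str).duplicated(keep="first")

# The boolean Series aligns on index labels, so no loc/iloc choice is needed
deduped = a[~dup_mask]
print(deduped.index.tolist())  # [4, 6]
```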