pandas drop_duplicates method not working

Question

Asked by SLack A

I am trying to use drop_duplicates method on my dataframe, but I am getting an error. See the following:

error: TypeError: unhashable type: 'list'

The code I am using:

df = db.drop_duplicates()

My DB is huge and contains strings, floats, dates, NaNs, booleans, integers... Any help is appreciated.

Answer 1

Answered by Allen

As the error message implies, drop_duplicates won't work when your dataframe contains lists. However, you can drop duplicates on a copy of the dataframe cast to str, then use the index of the result to select the matching rows from the original df.

Setup

import pandas as pd

df = pd.DataFrame({'Keyword': {0: 'apply', 1: 'apply', 2: 'apply', 3: 'terms', 4: 'terms'},
 'X': {0: [1, 2], 1: [1, 2], 2: 'xy', 3: 'xx', 4: 'yy'},
 'Y': {0: 'yy', 1: 'yy', 2: 'yx', 3: 'ix', 4: 'xi'}})

# Calling drop_duplicates directly raises the same error
df.drop_duplicates()
Traceback (most recent call last):
...
TypeError: unhashable type: 'list'
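
On a large dataframe it may not be obvious which columns hold lists. The following is a minimal sketch for locating them, not part of the original answer: the helper name `unhashable_columns` is hypothetical, and it assumes the offending values are plain `list`, `dict`, or `set` objects.

```python
import pandas as pd

# Same sample data as in the Setup above.
df = pd.DataFrame({'Keyword': {0: 'apply', 1: 'apply', 2: 'apply', 3: 'terms', 4: 'terms'},
                   'X': {0: [1, 2], 1: [1, 2], 2: 'xy', 3: 'xx', 4: 'yy'},
                   'Y': {0: 'yy', 1: 'yy', 2: 'yx', 3: 'ix', 4: 'xi'}})


def unhashable_columns(frame):
    """Return the names of columns holding at least one list, dict, or set.

    Hypothetical helper for illustration; it only checks these three types.
    """
    unhashable = (list, dict, set)
    return [
        col for col in frame.columns
        if frame[col].map(lambda v: isinstance(v, unhashable)).any()
    ]


list_cols = unhashable_columns(df)
print(list_cols)
```

Knowing the affected columns lets you convert only those, rather than casting the whole dataframe.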

Solution

# Convert the df to str type, drop duplicates, and then select the rows from the original df.

df.loc[df.astype(str).drop_duplicates().index]
Out[205]: 
  Keyword       X   Y
0   apply  [1, 2]  yy
2   apply      xy  yx
3   terms      xx  ix
4   terms      yy  xi

# The list elements are still lists in the final result.
df.loc[df.astype(str).drop_duplicates().index].loc[0,'X']
Out[207]: [1, 2]

Edit: replaced iloc with loc. In this particular case both work because the index labels match the positional index, but that does not hold in general.
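
One caveat of comparing string representations is that values of different types can look identical after the cast (for example, the integer 1 and the string '1'). A possible alternative, sketched here and not taken from the original answer, is to turn the lists into tuples, which are hashable, so drop_duplicates can compare the real values. The helper `lists_to_tuples` is a hypothetical name, and the sketch assumes the lists are flat (a list nested inside a list would still fail).

```python
import pandas as pd

# Same sample data as in the Setup above.
df = pd.DataFrame({'Keyword': {0: 'apply', 1: 'apply', 2: 'apply', 3: 'terms', 4: 'terms'},
                   'X': {0: [1, 2], 1: [1, 2], 2: 'xy', 3: 'xx', 4: 'yy'},
                   'Y': {0: 'yy', 1: 'yy', 2: 'yx', 3: 'ix', 4: 'xi'}})


def lists_to_tuples(value):
    """Convert a list to a tuple; leave every other value untouched."""
    return tuple(value) if isinstance(value, list) else value


# Work on a copy so the original df keeps its lists.
hashable = df.copy()
hashable['X'] = hashable['X'].map(lists_to_tuples)

# Tuples are hashable, so drop_duplicates no longer raises TypeError.
deduped = hashable.drop_duplicates()
print(deduped)
```

Note that the result here holds tuples rather than lists in column X; if the original lists are needed, the index of `deduped` can be used to select rows from `df` with loc, as in the solution above.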

Answer 2

Answered by Hsgao

@Allen's answer is great, but it has a small problem.

df.iloc[df.astype(str).drop_duplicates().index]

It should be loc, not iloc. Look at this example:

a = pd.DataFrame([['a',18],['b',11],['a',18]],index=[4,6,8])
Out[52]: 
   0   1
4  a  18
6  b  11
8  a  18

a.iloc[a.astype(str).drop_duplicates().index]
Out[53]:
...
IndexError: positional indexers are out-of-bounds

a.loc[a.astype(str).drop_duplicates().index]
Out[54]: 
   0   1
4  a  18
6  b  11
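
The loc/iloc question can also be sidestepped with a boolean mask. This is a sketch of a variant, not something either answer proposed: `DataFrame.duplicated()` returns one True/False value per row, so the mask selects rows by position in the frame and never looks up index labels. That also makes it safe when the index contains repeated labels, where selecting by label with loc can return extra rows.

```python
import pandas as pd

a = pd.DataFrame([['a', 18], ['b', 11], ['a', 18]], index=[4, 6, 8])

# True for each row whose string form has not been seen earlier in the frame.
keep = ~a.astype(str).duplicated()
result = a[keep]
print(result)

# The same approach on a frame whose index labels repeat.
b = pd.DataFrame([['a', 18], ['b', 11], ['a', 18]], index=[4, 4, 4])
result_b = b[~b.astype(str).duplicated()]
print(result_b)
```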
