Pandas drop_duplicates - TypeError: type object argument after * must be a sequence, not map

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/37792999/

Time: 2020-09-14 01:22:43  Source: igfitidea

python pandas dataframe

Asked by user3939059

I have updated my question to provide a clearer example.

Is it possible to use the drop_duplicates method in Pandas to remove duplicate rows based on a column whose values contain lists? Consider column 'three', where some of the values are lists of two items. Is there a way to drop the duplicate rows rather than doing it iteratively (which is my current workaround)?

I have outlined my problem by providing the following example:

import pandas as pd

data = [
{'one': 50, 'two': '5:00', 'three': 'february'}, 
{'one': 25, 'two': '6:00', 'three': ['february', 'january']},
{'one': 25, 'two': '6:00', 'three': ['february', 'january']},
{'one': 25, 'two': '6:00', 'three': ['february', 'january']},
{'one': 90, 'two': '9:00', 'three': 'january'}
]

df = pd.DataFrame(data)

print(df)

   one                three   two
0   50             february  5:00
1   25  [february, january]  6:00
2   25  [february, january]  6:00
3   25  [february, january]  6:00
4   90              january  9:00

df.drop_duplicates(['three'])

Results in the following error:

TypeError: type object argument after * must be a sequence, not map

Answered by Matthew

I think it's because lists aren't hashable, and that breaks the logic drop_duplicates uses to detect duplicated values. As a workaround, you could convert the lists to tuples like so:

df['four'] = df['three'].apply(lambda x: tuple(x) if isinstance(x, list) else x)
df.drop_duplicates('four')

   one                three   two                 four
0   50             february  5:00             february
1   25  [february, january]  6:00  (february, january)
4   90              january  9:00              january
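
If you'd rather not keep the extra 'four' column around, the same tuple idea can be applied to a temporary key instead. This is only a rough sketch: drop_duplicates_by_list_column is a made-up helper name (not part of pandas), and it assumes the unhashable values in the column are plain lists.

```python
import pandas as pd


def drop_duplicates_by_list_column(df, column):
    """Hypothetical helper (not a pandas API): drop rows whose value in
    `column` repeats, comparing list values as tuples."""
    # Build a hashable key Series; the original DataFrame is not modified.
    key = df[column].apply(lambda x: tuple(x) if isinstance(x, list) else x)
    # duplicated() flags every repeat after the first occurrence.
    return df[~key.duplicated()]


data = [
    {'one': 50, 'two': '5:00', 'three': 'february'},
    {'one': 25, 'two': '6:00', 'three': ['february', 'january']},
    {'one': 25, 'two': '6:00', 'three': ['february', 'january']},
    {'one': 25, 'two': '6:00', 'three': ['february', 'january']},
    {'one': 90, 'two': '9:00', 'three': 'january'},
]

df = pd.DataFrame(data)
result = drop_duplicates_by_list_column(df, 'three')
print(result)
```

The rows that survive are the same ones as in the output above (index 0, 1 and 4), but column 'three' still holds the original lists and no helper column is added.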