Remove duplicates in a list of objects with Python

Source: Stack Overflow, http://stackoverflow.com/questions/4169252/ (CC BY-SA 4.0). If you reuse this content, you must follow the same license and attribute it to the original authors.

Tags: python, mysql, sqlobject

Asked by imns

I've got a list of objects and a db table full of records. Each object in my list has a title attribute, and I want to remove any objects with duplicate titles from the list (leaving the original).

Then I want to check whether my list of objects duplicates any records in the database and, if so, remove those items from the list before adding the rest to the database.

I have seen solutions for removing duplicates from a list like this: myList = list(set(myList)), but I'm not sure how to do that with a list of objects.

I need to maintain the order of my list of objects too. I was also thinking maybe I could use difflib to check for differences in the titles.

Accepted answer by vonPetrushev

set(list_of_objects) will only remove the duplicates if Python knows what a duplicate is; that is, you'll need to define what makes an object unique.

In order to do that, you'll need to make the object hashable. You need to define both the __hash__ and __eq__ methods; here is how:

http://docs.python.org/glossary.html#term-hashable

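Though, you'll probably only need to define the __eq__ method if you just compare objects with == or test membership in a list. set() needs both methods, and in Python 3 a class that defines __eq__ without __hash__ becomes unhashable.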

EDIT: How to implement the __eq__ method:

You'll need to know, as I mentioned, the uniqueness definition of your object. Suppose we have a Book with attributes author_name and title whose combination is unique (so we can have many books authored by Stephen King, and many books named The Shining, but only one book named The Shining by Stephen King). Then the implementation is as follows:

def __eq__(self, other):
    # Two books are equal when both the author and the title match.
    return (self.author_name == other.author_name
            and self.title == other.title)

Similarly, this is how I sometimes implement the __hash__ method:

def __hash__(self):
    return hash(('title', self.title,
                 'author_name', self.author_name))

You can check that if you create a list of 2 books with the same author and title, the book objects will be equal (with the == operator), though they remain distinct objects under the is operator. Also, when set() is used, it will remove one book.

EDIT: This is an old answer of mine, but I only now notice that the last paragraph originally claimed the two books would also be the same under the is operator. That was an error: objects with the same hash() won't give True when compared with is. Hashability does matter, however, if you intend to use the objects as elements of a set or as keys in a dictionary.
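
Putting the two methods together, a minimal sketch might look like the following. The Book class, its constructor, and the sample data are assumptions made for illustration; only the attribute names author_name and title come from the answer. The last line uses dict.fromkeys, which is not part of the answer, to show one way of keeping the original order.

```python
class Book:
    """Hypothetical Book class; attribute names follow the answer above."""

    def __init__(self, author_name, title):
        self.author_name = author_name
        self.title = title

    def __eq__(self, other):
        if not isinstance(other, Book):
            return NotImplemented
        return (self.author_name, self.title) == (other.author_name, other.title)

    def __hash__(self):
        # Hash the same fields that __eq__ compares, so equal books hash equally.
        return hash((self.author_name, self.title))


first = Book("Stephen King", "The Shining")
second = Book("Stephen King", "The Shining")
other = Book("Stephen King", "It")

books = [first, other, second]
unique_unordered = set(books)                # duplicates dropped, order not guaranteed
unique_ordered = list(dict.fromkeys(books))  # duplicates dropped, first occurrence kept
```

Here first == second is True while first is second is False, which is the distinction the edit above describes.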

Answer by aaronasterling

Since the objects aren't hashable, you can't use a set on them directly. The titles should be hashable, though.

Here's the first part.

seen_titles = set()
new_list = []
for obj in myList:
    if obj.title not in seen_titles:
        new_list.append(obj)
        seen_titles.add(obj.title)

You're going to need to describe what database/ORM etc. you're using for the second part though.
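
Without knowing the database layer, the second part can only be sketched. Assuming the titles already stored in the table have been fetched into a Python collection, the filtering step might look like this. The names Item, drop_existing, and existing_titles are all hypothetical, and how the titles are queried depends on the database or ORM in use.

```python
from collections import namedtuple

# Hypothetical stand-in for the objects in the question.
Item = namedtuple("Item", ["title"])


def drop_existing(new_list, existing_titles):
    """Keep only objects whose title is not already stored (order preserved)."""
    existing = set(existing_titles)  # set membership tests are fast
    return [obj for obj in new_list if obj.title not in existing]


# existing_titles is assumed to hold the titles already present in the table.
existing_titles = ["Dune", "Emma"]
new_list = [Item("Carrie"), Item("Dune"), Item("Ulysses")]
to_insert = drop_existing(new_list, existing_titles)
```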

Answer by hughdbrown

This seems pretty minimal:

new_dict = dict()
for obj in myList:
    if obj.title not in new_dict:
        new_dict[obj.title] = obj
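
The loop above leaves the unique objects in a dict keyed by title. To get a list back, one option is to take the dict's values. This sketch assumes Python 3.7 or later, where dicts keep insertion order; the sample objects built with SimpleNamespace are hypothetical.

```python
from types import SimpleNamespace

# Hypothetical sample objects; only the title attribute matters here.
myList = [SimpleNamespace(title=t) for t in ["a", "b", "a", "c", "b"]]

new_dict = {}
for obj in myList:
    if obj.title not in new_dict:
        new_dict[obj.title] = obj

# Dicts keep insertion order in Python 3.7+, so this list is in first-seen order.
new_list = list(new_dict.values())
```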

Answer by Spiderman

It's quite easy, friends:

a = [5,6,7,32,32,32,32,32,32,32,32]

a = list(set(a))

print (a)

[5, 6, 7, 32]   (the order may differ, since sets are unordered)

That's it! :)

Answer by Amir

If you want to preserve the original order, use this:

seen = {}
new_list = [seen.setdefault(x, x) for x in my_list if x not in seen]

If you don't care about ordering, use this:

new_list = list(set(my_list))
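
Both snippets above need the list items themselves to be hashable. For the objects in the question, the order-preserving version could be keyed on the title instead. This is only a sketch, and the sample objects built with SimpleNamespace are hypothetical.

```python
from types import SimpleNamespace

# Hypothetical sample objects with a title attribute.
my_list = [SimpleNamespace(title=t) for t in ["x", "y", "x", "z"]]

seen = {}
# Key on the title instead of the object itself, so the objects need not be hashable.
new_list = [seen.setdefault(obj.title, obj)
            for obj in my_list if obj.title not in seen]
```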

Answer by qwr

Both __hash__ and __eq__ are needed for this.

__hash__ is needed to add an object to a set, since Python's sets are implemented as hash tables. By default, immutable objects like numbers, strings, and tuples are hashable (tuples only when all of their elements are hashable).

However, hash collisions (two distinct objects hashing to the same value) are inevitable, due to the pigeonhole principle. So two objects cannot be distinguished using only their hash, and the user must specify their own __eq__ function. Thus, the actual hash function the user provides is not crucial for correctness, as long as objects that compare equal also hash equally, though it is best to try to avoid hash collisions for performance (see "What's a correct and good way to implement __hash__()?").
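
To illustrate the point about collisions, here is a small sketch with a deliberately bad hash function. The Title class is hypothetical and exists only for this demonstration: every instance collides, yet the set still deduplicates correctly because __eq__ settles each collision.

```python
class Title:
    """Hypothetical wrapper used only to show how set() treats collisions."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        return isinstance(other, Title) and self.text == other.text

    def __hash__(self):
        # Deliberately poor: every instance collides. Still valid, because
        # equal objects trivially share the same hash.
        return 1


titles = {Title("Dune"), Title("Emma"), Title("Dune")}
distinct = sorted(t.text for t in titles)
```

The set ends up with two elements; the cost of the constant hash is speed, since every lookup has to fall back on __eq__ comparisons.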

Answer by binW

I recently ended up using the code below. Like the other answers, it iterates over the list and records what it has seen, but instead of building a second list it deletes each already-seen item from the original list.

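seen = set()
i = 0
# Walk by index: calling remove() inside a for loop over the same list skips items.
while i < len(objList):
    key = objList[i]["key-property"]
    if key in seen:
        del objList[i]   # delete this duplicate in place; do not advance
    else:
        seen.add(key)
        i += 1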