Removing duplicates from a list of objects in Python

Question

Asked by imns

I've got a list of objects and I've got a db table full of records. My list of objects has a title attribute and I want to remove any objects with duplicate titles from the list (leaving the original).

Then I want to check if my list of objects has any duplicates of any records in the database and if so, remove those items from list before adding them to the database.

I have seen solutions for removing duplicates from a list like this: myList = list(set(myList)), but I'm not sure how to do that with a list of objects.

I need to maintain the order of my list of objects too. I was also thinking maybe I could use difflib to check for differences in the titles.

Answer 1

Accepted answer by vonPetrushev

set(list_of_objects) will only remove the duplicates if Python knows what a duplicate is; that is, you need to define what makes an object unique.

To do that, you'll need to make the object hashable. You need to define both the __hash__ and __eq__ methods; see:

http://docs.python.org/glossary.html#term-hashable

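Though, if you only need == comparisons, defining the __eq__ method is enough. Note that in Python 3 a class that defines __eq__ without __hash__ becomes unhashable, so set() requires both methods.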

EDIT: How to implement the __eq__ method:

As I mentioned, you need to know what makes your object unique. Suppose we have a Book with attributes author_name and title whose combination is unique (so there can be many books by Stephen King, and many books named The Shining, but only one book named The Shining by Stephen King). Then the implementation is as follows:

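def __eq__(self, other):
    # Let Python fall back to its default handling for non-Book operands.
    if not isinstance(other, Book):
        return NotImplemented
    # Two books are equal when both author and title match.
    return (self.author_name == other.author_name
            and self.title == other.title)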

Similarly, this is how I sometimes implement the __hash__ method:

def __hash__(self):
    return hash(('title', self.title,
                 'author_name', self.author_name))

You can check that if you create a list of two books with the same author and title, the book objects will ~~be the same (with the is operator) and~~ be equal (with the == operator). Also, when set() is used, it will remove one book.

EDIT: This is an old answer of mine, but I only now noticed an error, corrected with strikethrough in the previous paragraph: objects with the same hash() won't give True when compared with is. Hashability does matter, however, if you intend to use the objects as elements of a set or as keys in a dictionary.
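
Putting the two methods together, a minimal sketch might look like the following. The Book class here is hypothetical and exists only for illustration; it assumes that author_name and title together identify a book, as described above.

```python
class Book:
    """Hypothetical class used only for illustration."""

    def __init__(self, author_name, title):
        self.author_name = author_name
        self.title = title

    def __eq__(self, other):
        if not isinstance(other, Book):
            return NotImplemented
        return (self.author_name, self.title) == (other.author_name, other.title)

    def __hash__(self):
        # Hash the same fields that __eq__ compares.
        return hash((self.author_name, self.title))


first = Book("Stephen King", "The Shining")
second = Book("Stephen King", "The Shining")
other = Book("Stephen King", "It")

print(first == second)              # True: equal by value
print(first is second)              # False: still two distinct objects
print(len({first, second, other}))  # 2: the set keeps one copy of the duplicate
```

Note that a set does not preserve the original list order, so this alone does not cover the ordering requirement in the question.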

Answer 2

Answered by aaronasterling

Since the objects are not hashable, you can't use a set on them directly. The titles should be hashable, though.

Here's the first part.

seen_titles = set()
new_list = []
for obj in myList:
    if obj.title not in seen_titles:
        new_list.append(obj)
        seen_titles.add(obj.title)

You're going to need to describe what database/ORM etc. you're using for the second part though.
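
Since the database layer is not specified, the following is only a sketch of one possible shape for the second part, using the standard-library sqlite3 module. The table name records, its title column, and the Item type are all assumptions made for illustration, not details from the question.

```python
import sqlite3
from collections import namedtuple

# Hypothetical stand-in for the asker's objects.
Item = namedtuple("Item", ["title"])

# Assumption: a table named "records" with a "title" column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (title TEXT)")
conn.execute("INSERT INTO records (title) VALUES (?)", ("The Shining",))

new_list = [Item("It"), Item("The Shining"), Item("Carrie")]

# Load the titles already stored, then keep only objects whose title is new.
existing_titles = {row[0] for row in conn.execute("SELECT title FROM records")}
to_insert = [obj for obj in new_list if obj.title not in existing_titles]

conn.executemany(
    "INSERT INTO records (title) VALUES (?)",
    [(obj.title,) for obj in to_insert],
)
print([obj.title for obj in to_insert])  # prints ['It', 'Carrie']
```

Loading every stored title into memory is fine for small tables; for large ones, querying only the candidate titles would likely be a better fit.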

Answer 3

Answered by hughdbrown

This seems pretty minimal:

new_dict = dict()
for obj in myList:
    if obj.title not in new_dict:
        new_dict[obj.title] = obj
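
To get a list back out of the dictionary, one option is to take its values. This sketch assumes Python 3.7 or later, where dictionaries preserve insertion order; the Item type and sample data are hypothetical.

```python
from collections import namedtuple

# Hypothetical stand-in for the asker's objects.
Item = namedtuple("Item", ["title", "price"])

myList = [Item("A", 1), Item("B", 2), Item("A", 3)]

new_dict = {}
for obj in myList:
    # setdefault stores obj only if the title is not already a key,
    # so the first object seen for each title wins.
    new_dict.setdefault(obj.title, obj)

deduped = list(new_dict.values())
print(deduped)  # the first "A" and the "B" item, in their original order
```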

Answer 4

Answered by Spiderman

It's quite easy, friends:

a = [5,6,7,32,32,32,32,32,32,32,32]
a = list(set(a))
print (a)

[5, 6, 7, 32]  (the order may differ, since sets are unordered)

That's it! :)

Answer 5

Answered by Amir

If you want to preserve the original order, use this:

seen = {}
new_list = [seen.setdefault(x, x) for x in my_list if x not in seen]

If you don't care about ordering, use this:

new_list = list(set(my_list))
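
Both snippets above need the list items themselves to be hashable. For objects that are compared by an attribute such as title, an order-preserving variant could be sketched as below. The helper name unique_by and the sample data are hypothetical.

```python
from operator import attrgetter
from types import SimpleNamespace


def unique_by(items, key):
    """Yield items in order, skipping any whose key was already seen."""
    seen = set()
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item


# Sample objects with a title attribute, for illustration only.
my_list = [SimpleNamespace(title=t) for t in ["b", "a", "b", "c", "a"]]
new_list = list(unique_by(my_list, key=attrgetter("title")))
print([obj.title for obj in new_list])  # prints ['b', 'a', 'c']
```

Only the key values go into the set, so the objects do not need __hash__ or __eq__ of their own.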

Answer 6

Answered by qwr

Both __hash__ and __eq__ are needed for this.

__hash__ is needed to add an object to a set, since Python's sets are implemented as hash tables. By default, immutable objects like numbers, strings, and tuples (of hashable elements) are hashable.

However, hash collisions (two distinct objects hashing to the same value) are inevitable, due to the pigeonhole principle. So two objects cannot be distinguished using only their hash, and the user must specify their own __eq__ method. Thus, as long as equal objects have equal hashes, the exact hash function the user provides is not crucial for correctness, though it is best to avoid hash collisions for performance (see "What's a correct and good way to implement __hash__()?").
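
As an illustration of that point, here is a small sketch with a deliberately poor hash function. The Title class is hypothetical; every instance hashes to the same value, yet the set still removes duplicates correctly because __eq__ settles each collision.

```python
class Title:
    """Hypothetical wrapper with a deliberately poor hash."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if not isinstance(other, Title):
            return NotImplemented
        return self.text == other.text

    def __hash__(self):
        # Every instance collides; correctness now rests entirely on __eq__.
        return 1


titles = {Title("It"), Title("Carrie"), Title("It")}
print(sorted(t.text for t in titles))  # prints ['Carrie', 'It']
```

With a constant hash every lookup degrades to comparing against all stored elements, which is why a hash that spreads values well still matters for performance.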

Answer 7

Answered by binW

I recently ended up using the code below. Like the other answers, it iterates over the list and records the keys it has seen, but instead of building a second list it removes the duplicates from the original list in place.

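seen = set()
write = 0  # index where the next unique item will be stored
for obj in objList:
    key = obj["key-property"]
    if key not in seen:
        seen.add(key)
        # write never exceeds the current index, so nothing unread is overwritten.
        objList[write] = obj
        write += 1
# Drop the leftover tail. Calling objList.remove(obj) inside the loop
# would shift the remaining items and make the loop skip elements.
del objList[write:]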
