Remove duplicates in a list while keeping its order (Python)

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me) on StackOverflow.

Original question: http://stackoverflow.com/questions/1549509/
Asked by TIMEX
This is actually an extension of this question: How to remove these duplicates in a list (python). The answers to that question did not keep the "order" of the list after removing duplicates.
biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'}
]
In this case, the 2nd element should be removed because a previous "u2.com" element already exists. However, the order should be kept.
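In other words, assuming duplicates are identified by the 'link' value and the first occurrence wins, the desired result for the sample above would be:

[
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'}
]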
Answered by egafni
Use set(), then re-sort using the index of the original list.
>>> mylist = ['c','a','a','b','a','b','c']
>>> sorted(set(mylist), key=lambda x: mylist.index(x))
['c', 'a', 'b']
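Since the question's items are dicts (which aren't hashable), the same idea can be applied by deduplicating on the 'link' values instead; a minimal sketch (the variable names are just illustrative, and note that list.index makes this O(n^2)):

links = [d['link'] for d in biglist]
unique_links = sorted(set(links), key=links.index)   # first-occurrence order
deduped = [biglist[links.index(link)] for link in unique_links]
# -> [{'title': 'U2 Band', 'link': 'u2.com'}, {'title': 'ABC Station', 'link': 'abc.com'}]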
Answered by Alex Martelli
My answer to your other question (which you completely ignored!) shows you're wrong in claiming that
The answers of that question did not keep the "order"
- my answer did keep order, and it clearly said it did. Here it is again, with added emphasis, to see if you can just keep ignoring it...:
Probably the fastest approach, for a really big list, if you want to preserve the exact order of the items that remain, is the following...:
biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'}
]

known_links = set()
newlist = []
for d in biglist:
    link = d['link']
    if link in known_links:
        continue
    newlist.append(d)
    known_links.add(link)
biglist[:] = newlist
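For the sample biglist above this keeps the first occurrence of each link, so the expected result is:

print(biglist)
# [{'title': 'U2 Band', 'link': 'u2.com'}, {'title': 'ABC Station', 'link': 'abc.com'}]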
Answered by Jochen Ritzel
Generators are great.
def unique(seq, key=lambda x: x):
    # 'key' picks the value used for duplicate detection; the dicts in
    # biglist aren't hashable, so below we key on the 'link' value
    seen = set()
    for item in seq:
        marker = key(item)
        if marker not in seen:
            seen.add(marker)
            yield item

biglist[:] = unique(biglist, key=lambda d: d['link'])
Answered by Tarnay Kálmán
This page discusses different methods and their speeds: http://www.peterbe.com/plog/uniqifiers-benchmark
The recommended* method:
def f5(seq, idfun=None):
    # order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        # in old Python versions:
        # if seen.has_key(marker)
        # but in new ones:
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result

f5(biglist, lambda x: x['link'])
*by that page
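Note that f5 returns a new list rather than modifying seq in place, so to actually use the result you would assign it (a minimal usage sketch; the variable name is just illustrative):

deduped_biglist = f5(biglist, lambda x: x['link'])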
Answered by rools
This is an elegant and compact way using a list comprehension (but not as efficient as the dictionary-based approaches):
mylist = ['aaa', 'aba', 'aaa', 'aea', 'baa', 'aaa', 'aac', 'aaa']
[v for (i, v) in enumerate(mylist) if v not in mylist[0:i]]
And applied to the question's biglist:
[v for (i, v) in enumerate(biglist) if v['link'] not in map(lambda d: d['link'], biglist[0:i])]
Answered by falco
Try this:
items = ['aaa', 'aba', 'aaa', 'aea', 'baa', 'aaa', 'aac', 'aaa']
uniq = []
for i in items:
    if i not in uniq:
        uniq.append(i)
print(items)
print(uniq)
The output will be:
['aaa', 'aba', 'aaa', 'aea', 'baa', 'aaa', 'aac', 'aaa']
['aaa', 'aba', 'aea', 'baa', 'aac']
Answered by Peter
dups = {}
newlist = []
for x in biglist:
    if x['link'] not in dups:
        newlist.append(x)
        dups[x['link']] = None
print(newlist)
produces
[{'link': 'u2.com', 'title': 'U2 Band'}, {'link': 'abc.com', 'title': 'ABC Station'}]
Note that here I used a dictionary. This makes the "not in dups" test much more efficient than using a list, because dictionary (and set) membership checks take roughly constant time, while checking membership in a list scans the whole list.
Answered by ABentSpoon
I think using a set should be pretty efficient.
seen_links = set()
index = 0
# iterate with an explicit index because the list shrinks as items are deleted
while index < len(biglist):
    link = biglist[index]['link']
    if link in seen_links:
        del biglist[index]   # duplicate link: remove it and re-check the same index
    else:
        seen_links.add(link)
        index += 1
I think this should come in at O(nlog(n))
Answered by Greg Hewgill
A super easy way to do this is:
def uniq(a):
    if len(a) == 0:
        return []
    else:
        return [a[0]] + uniq([x for x in a if x != a[0]])
This is not the most efficient way, because:
- it searches through the whole list for every element in the list, so it's O(n^2)
- it's recursive, so it uses a stack depth equal to the length of the list
However, for simple uses (no more than a few hundred items, not performance critical) it is sufficient.
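A quick sanity check with a small list of strings (a sketch of the expected behaviour):

print(uniq(['c', 'a', 'a', 'b', 'a', 'b', 'c']))
# ['c', 'a', 'b']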