Remove duplicates in a list while keeping its order (Python)

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me) on StackOverflow.

Original question: http://stackoverflow.com/questions/1549509/
Asked by TIMEX
This is actually an extension of this question: How to remove these duplicates in a list (python). The answers to that question did not keep the "order" of the list after removing duplicates.
biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'}
]
In this case, the 2nd element should be removed because a previous "u2.com" element already exists. However, the order should be kept.
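In other words, assuming duplicates are identified by the 'link' value and the first occurrence wins, the desired result for the sample above would be:

[
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'}
]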
Answered by egafni
Use set(), then re-sort using the index of the original list.
>>> mylist = ['c','a','a','b','a','b','c']
>>> sorted(set(mylist), key=lambda x: mylist.index(x))
['c', 'a', 'b']
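Since the question's items are dicts (which aren't hashable), the same idea can be applied by deduplicating on the 'link' values instead; a minimal sketch (the variable names are just illustrative, and note that list.index makes this O(n^2)):

links = [d['link'] for d in biglist]
unique_links = sorted(set(links), key=links.index)   # first-occurrence order
deduped = [biglist[links.index(link)] for link in unique_links]
# -> [{'title': 'U2 Band', 'link': 'u2.com'}, {'title': 'ABC Station', 'link': 'abc.com'}]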
Answered by Alex Martelli
My answer to your other question (which you completely ignored!) shows you're wrong in claiming that
The answers of that question did not keep the "order"
- my answer did keep order, and it clearly said it did. Here it is again, with added emphasis, to see if you can just keep ignoring it...:
Probably the fastest approach, for a really big list, if you want to preserve the exact order of the items that remain, is the following...:
biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'}
]

known_links = set()
newlist = []
for d in biglist:
    link = d['link']
    if link in known_links:
        continue
    newlist.append(d)
    known_links.add(link)
biglist[:] = newlist
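For the sample biglist above this keeps the first occurrence of each link, so the expected result is:

print(biglist)
# [{'title': 'U2 Band', 'link': 'u2.com'}, {'title': 'ABC Station', 'link': 'abc.com'}]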
Answered by Jochen Ritzel
Generators are great.
def unique(seq, key=lambda x: x):
    # 'key' picks the value used for duplicate detection; the dicts in
    # biglist aren't hashable, so below we key on the 'link' value
    seen = set()
    for item in seq:
        marker = key(item)
        if marker not in seen:
            seen.add(marker)
            yield item

biglist[:] = unique(biglist, key=lambda d: d['link'])
Answered by Tarnay Kálmán
This page discusses different methods and their speeds: http://www.peterbe.com/plog/uniqifiers-benchmark
The recommended* method:
def f5(seq, idfun=None):
    # order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        # in old Python versions:
        # if seen.has_key(marker)
        # but in new ones:
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result

f5(biglist, lambda x: x['link'])
*by that page
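Note that f5 returns a new list rather than modifying seq in place, so to actually use the result you would assign it (a minimal usage sketch; the variable name is just illustrative):

deduped_biglist = f5(biglist, lambda x: x['link'])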
Answered by rools
This is an elegant and compact way using a list comprehension (but not as efficient as the dictionary-based approaches):
mylist = ['aaa', 'aba', 'aaa', 'aea', 'baa', 'aaa', 'aac', 'aaa']
[v for (i, v) in enumerate(mylist) if v not in mylist[0:i]]
And applied to the question's biglist:
[v for (i, v) in enumerate(biglist) if v['link'] not in map(lambda d: d['link'], biglist[0:i])]
Answered by falco
Try this:
items = ['aaa', 'aba', 'aaa', 'aea', 'baa', 'aaa', 'aac', 'aaa']
uniq = []
for i in items:
    if i not in uniq:
        uniq.append(i)
print(items)
print(uniq)
The output will be:
['aaa', 'aba', 'aaa', 'aea', 'baa', 'aaa', 'aac', 'aaa']
['aaa', 'aba', 'aea', 'baa', 'aac']
Answered by Peter
dups = {}
newlist = []
for x in biglist:
    if x['link'] not in dups:
        newlist.append(x)
        dups[x['link']] = None
print(newlist)
produces
[{'link': 'u2.com', 'title': 'U2 Band'}, {'link': 'abc.com', 'title': 'ABC Station'}]
Note that here I used a dictionary. This makes the "not in dups" test much more efficient than using a list, because dictionary (and set) membership checks take roughly constant time, while checking membership in a list scans the whole list.
Answered by ABentSpoon
I think using a set should be pretty efficient.
seen_links = set()
index = 0
# iterate with an explicit index because the list shrinks as items are deleted
while index < len(biglist):
    link = biglist[index]['link']
    if link in seen_links:
        del biglist[index]   # duplicate link: remove it and re-check the same index
    else:
        seen_links.add(link)
        index += 1
I think this should come in at O(nlog(n))
Answered by Greg Hewgill
A super easy way to do this is:
def uniq(a):
    if len(a) == 0:
        return []
    else:
        return [a[0]] + uniq([x for x in a if x != a[0]])
This is not the most efficient way, because:
- it searches through the whole list for every element in the list, so it's O(n^2)
- it's recursive, so it uses a stack depth equal to the length of the list
However, for simple uses (no more than a few hundred items, not performance critical) it is sufficient.
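A quick sanity check with a small list of strings (a sketch of the expected behaviour):

print(uniq(['c', 'a', 'a', 'b', 'a', 'b', 'c']))
# ['c', 'a', 'b']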