Python: is it better/faster to loop through a set or a list?

Question

Asked by askewchan

If I have a Python list that has many duplicates, and I want to iterate through each item, but not through the duplicates, is it best to use a set (as in set(mylist)), or find another way to create a list without duplicates? I was thinking of just looping through the list and checking for duplicates, but I figured that's what set() does when it's initialized.

So if mylist = [3,1,5,2,4,4,1,4,2,5,1,3] and I really just want to loop through [1,2,3,4,5] (order doesn't matter), should I use set(mylist) or something else?

An alternative is possible in the last example: since the list contains every integer between its min and max value, I could loop through range(min(mylist),max(mylist)) or through set(mylist). Should I generally try to avoid using set in this case? Also, would finding the min and max be slower than just creating the set?

In the case of the last example, the set is faster:

from numpy.random import randint
ids = randint(1, 1001, size=10**6)  # 10**6 integers in 1..1000; replaces the deprecated random_integers(1e3, size=1e6)

def set_loop(mylist):
    idlist = []
    for id in set(mylist):
        idlist.append(id)
    return idlist

def list_loop(mylist):
    idlist = []
    for id in range(min(mylist),max(mylist)):
        idlist.append(id)
    return idlist

%timeit set_loop(ids)
#1 loops, best of 3: 232 ms per loop

%timeit list_loop(ids)
#1 loops, best of 3: 408 ms per loop

Answer 1

Accepted answer by Eevee

Just use a set. Its semantics are exactly what you want: a collection of unique items.

Technically you'll be iterating through the list twice: once to create the set, once for your actual loop. But you'd be doing just as much work or more with any other approach.
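
As a minimal sketch of this approach, using the example list from the question (the `seen` list is only an illustrative helper to show what the loop visits):

```python
# Example list from the question; it contains many duplicates.
mylist = [3, 1, 5, 2, 4, 4, 1, 4, 2, 5, 1, 3]

seen = []
# set(mylist) walks the list once to build the set;
# the for loop then visits each unique value exactly once.
for item in set(mylist):
    seen.append(item)

# Sets are unordered, so sort before printing.
print(sorted(seen))  # [1, 2, 3, 4, 5]
```

The two passes mentioned above are the `set(mylist)` call and the `for` loop itself.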

Answer 2

Answered by GordonsBeard

For simplicity's sake: newList = list(set(oldList))

But there are better options out there if you'd like to get speed/ordering/optimization instead: http://www.peterbe.com/plog/uniqifiers-benchmark
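
If ordering matters, one commonly used option (not necessarily the one the linked benchmark recommends) is `dict.fromkeys`. The sketch below assumes Python 3.7+, where dictionaries preserve insertion order, and assumes the items are hashable; `old_list` and `new_list` are illustrative names:

```python
old_list = [3, 1, 5, 2, 4, 4, 1, 4, 2, 5, 1, 3]

# dict keys are unique and (in Python 3.7+) keep insertion order,
# so this drops duplicates while keeping first occurrences in order.
new_list = list(dict.fromkeys(old_list))

print(new_list)  # [3, 1, 5, 2, 4]
```

Unlike list(set(oldList)), this keeps the items in the order they first appeared.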

Answer 3

Answered by John La Rooy

set is what you want, so you should use set. Trying to be clever introduces subtle bugs, like forgetting to add one to max(mylist)! Code defensively. Worry about what's faster once you have determined that it is too slow.

range(min(mylist), max(mylist) + 1)  # <-- don't forget to add 1
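
To make the off-by-one concrete, here is a small sketch using the list from the question. The `gappy` list is a made-up example, added only to show that the range approach also depends on the values being contiguous:

```python
mylist = [3, 1, 5, 2, 4, 4, 1, 4, 2, 5, 1, 3]

# Without the + 1 the largest value is silently skipped.
missing_max = list(range(min(mylist), max(mylist)))
print(missing_max)  # [1, 2, 3, 4]

# With the + 1 the range covers every value, but only because
# this particular list has no gaps between its min and max.
full_range = list(range(min(mylist), max(mylist) + 1))
print(full_range)  # [1, 2, 3, 4, 5]

# A list with gaps shows why range() is not a general substitute for set().
gappy = [1, 5, 5, 9]
print(sorted(set(gappy)))                      # [1, 5, 9]
print(len(range(min(gappy), max(gappy) + 1)))  # 9
```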

Answer 4

Answered by hamx0r

While a set may be what you want structure-wise, the question is which is faster. A list is faster. Your example code doesn't accurately compare set vs. list, because you're converting from a list to a set in set_loop, and creating the list you'll be looping through in list_loop. The set and the list you iterate through should be constructed and in memory ahead of time, and then simply looped through to see which data structure is faster to iterate:

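ids_list = list(range(1000000))  # materialize a real list (range is lazy in Python 3)
ids_set = set(ids_list)  # build the set from the same data
def f(x):
    for i in x:
        pass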

%timeit f(ids_set)
#1 loops, best of 3: 214 ms per loop
%timeit f(ids_list)
#1 loops, best of 3: 176 ms per loop
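
The %timeit lines above need IPython. As a rough sketch of the same measurement with only the standard library, assuming a smaller collection and an arbitrary repeat count (`iterate`, `list_seconds`, and `set_seconds` are illustrative names), something like this could be used:

```python
import timeit

# Build both containers ahead of time so only iteration is measured.
ids_list = list(range(100_000))
ids_set = set(ids_list)

def iterate(collection):
    """Walk the collection without doing any work per item."""
    for _ in collection:
        pass

# number=20 is an arbitrary choice to keep the run short.
list_seconds = timeit.timeit(lambda: iterate(ids_list), number=20)
set_seconds = timeit.timeit(lambda: iterate(ids_set), number=20)

print(f"list: {list_seconds:.4f} s, set: {set_seconds:.4f} s")
```

The absolute numbers, and possibly the gap between them, will vary with the Python version and the machine. Note also that this measures iteration only; it leaves out the cost of removing the duplicates, which the original question still has to pay.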

Answer 5

Answered by Charif DZ

If the list is very large, looping over it twice will take a lot of time, and the second pass iterates over a set rather than a list; as we know, iterating over a set is slower than iterating over a list.

I think you need the power of a generator and a set.

def first_test():

    def loop_one_time(my_list):
        # Create a set to keep track of the items already yielded.
        iterated_items = set()
        # As we know, iterating over a list is faster than iterating over a set.
        for value in my_list:
            # As we know, checking whether an element exists in a set is very
            # fast, no matter the size of the set.
            if value not in iterated_items:
                iterated_items.add(value)  # add this item to the set
                yield value


    mylist = [3,1,5,2,4,4,1,4,2,5,1,3]

    for v in loop_one_time(mylist):pass



def second_test():
    mylist = [3,1,5,2,4,4,1,4,2,5,1,3]
    s = set(mylist)
    for v in s:pass


import timeit

print(timeit.timeit('first_test()', setup='from __main__ import first_test', number=10000))
print(timeit.timeit('second_test()', setup='from __main__ import second_test', number=10000))

Output:

   0.024003583388435043
   0.010424674188938422

Note: with this technique, the order of the items is preserved.
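
To illustrate the ordering behaviour, here is a stand-alone sketch of the same generator idea; `unique_in_order` is an illustrative name, not part of the answer's code:

```python
def unique_in_order(items):
    """Yield each item the first time it is seen, skipping later duplicates."""
    seen = set()
    for value in items:
        if value not in seen:
            seen.add(value)
            yield value

mylist = [3, 1, 5, 2, 4, 4, 1, 4, 2, 5, 1, 3]

# Items come out in order of first appearance.
print(list(unique_in_order(mylist)))  # [3, 1, 5, 2, 4]

# The generator is lazy: it produces values one at a time on demand.
gen = unique_in_order(mylist)
print(next(gen))  # 3
```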
