Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/16939402/

Time: 2020-08-19 00:02:35  Source: igfitidea

Quick way to extend a set if we know elements are unique

Tags: python, set, union

Asked by Stewart_R

I am performing multiple iterations of the type:

masterSet = masterSet.union(setA)

As the set grows the length of time taken to perform these operations is growing (as one would expect, I guess).

I expect that the time is taken up checking whether each element of setA is already in masterSet?

My question is: if I KNOW that masterSet does not already contain any of the elements in setA, can I do this more quickly?

[UPDATE]

Given that this question is still attracting views I thought I would clear up a few of the things from the comments and answers below:

When iterating, though, there were many iterations where I knew setA would be distinct from masterSet because of how it was constructed (without having to process any checks), but for a few iterations I needed the uniqueness check.

I wondered if there was a way to 'tell' the masterSet.union() procedure not to bother with the uniqueness check this time around: since I know this set is distinct from masterSet, just add these elements quickly, trusting the programmer's assertion that they are definitely distinct. Perhaps through calling some different ".unionWithDistinctSet()" procedure or something.

I think the responses have suggested that this isn't possible (and that set operations should really be quick enough anyway), but that masterSet.update(setA) should be used instead of union, as it is slightly quicker still.

I have accepted the clearest response along those lines, resolved the issue I was having at the time, and got on with my life, but I would still love to hear whether my hypothesised .unionWithDistinctSet() could ever exist.

Accepted answer by mgilson

You can use set.update to update your master set in place. This avoids allocating a new set every time, so it should be a little faster than set.union...

>>> s = set(range(3))
>>> s.update(range(4))
>>> s
set([0, 1, 2, 3])


Of course, if you're doing this in a loop:

masterSet = set()
for setA in iterable:
    masterSet = masterSet.union(setA)

You might get a performance boost by doing something like:

masterSet = set().union(*iterable)
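
As a rough illustration of the options above, the following sketch builds the same master set three ways: with union in a loop, with update in a loop, and with a single set().union(*iterable) call. The sample data and variable names are made up for the example.

```python
# Hypothetical sample data: three small sets to merge.
chunks = [{1, 2}, {3, 4}, {5}]

# 1. union in a loop: every iteration builds a brand-new set.
looped = set()
for chunk in chunks:
    looped = looped.union(chunk)

# 2. update in a loop: the same set object is modified in place.
updated = set()
for chunk in chunks:
    updated.update(chunk)

# 3. a single call that takes all the sets at once.
combined = set().union(*chunks)

print(looped == updated == combined)
```

All three produce the same contents; they differ only in how many intermediate sets get created along the way.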


Ultimately, membership testing of a set is O(1) (in the average case), so testing if the element is already contained in the set isn't really a big performance hit.

Answer by Daniel Roseman

As mgilson points out, you can use update to update a set in place from another set. That actually works out slightly quicker:

import timeit

def union():
    i = set(range(10000))
    j = set(range(5000, 15000))
    return i.union(j)  # builds and returns a new set

def update():
    i = set(range(10000))
    j = set(range(5000, 15000))
    i.update(j)  # modifies i in place
    return i

timeit.Timer(union).timeit(10000)   # 10.351907968521118
timeit.Timer(update).timeit(10000)  # 8.83384895324707
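
The benchmark above merges two sets once. The question is about repeated merging into a growing set, where masterSet = masterSet.union(setA) also copies the whole master set on every iteration. A minimal sketch for measuring that case yourself follows; the helper names and sizes are assumptions chosen for illustration, and no timings are claimed here because they depend on the machine and Python version.

```python
import timeit

def make_chunks(n_chunks, chunk_size):
    """Build n_chunks pairwise-disjoint sets of consecutive integers."""
    return [set(range(i * chunk_size, (i + 1) * chunk_size))
            for i in range(n_chunks)]

def accumulate_with_union(chunks):
    master = set()
    for chunk in chunks:
        master = master.union(chunk)  # new set each time, copying master
    return master

def accumulate_with_update(chunks):
    master = set()
    for chunk in chunks:
        master.update(chunk)  # grows the existing set in place
    return master

chunks = make_chunks(200, 50)  # hypothetical sizes
union_time = timeit.timeit(lambda: accumulate_with_union(chunks), number=5)
update_time = timeit.timeit(lambda: accumulate_with_update(chunks), number=5)
print(f"union in a loop:  {union_time:.4f}s")
print(f"update in a loop: {update_time:.4f}s")
```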

Answer by njzk2

If you know your elements are unique, a set is not necessarily the best structure.

A simple list is way faster to extend.

masterList = list(masterSet)
masterList.extend(setA)
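
A minimal sketch of how this might look across several iterations, assuming every batch really is disjoint from what has been collected so far (the batch data and names are hypothetical). Note the trade-off: a list gives up the O(1) membership test, so this only helps if lookups are not needed until the end, when the list can be converted to a set once.

```python
master_list = [1, 2, 3]

# Hypothetical batches, assumed disjoint from each other and from master_list.
batches = [{4, 5}, {6, 7}]

for batch in batches:
    master_list.extend(batch)  # no hashing or equality checks on append

# Convert once at the end if set behaviour is needed again.
master_set = set(master_list)

# If the uniqueness assumption held, no element was dropped by the conversion.
print(len(master_set) == len(master_list))
```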

Answer by Evgeni Sergeev

For sure, forgoing this check could be a big saving when the __eq__(..) method is very expensive. In the CPython implementation, __eq__(..) is called with every element already in the set that hashes to the same number. (Reference: source code for set.)

However, there will never be this functionality in a million years, because it opens up another way to violate the integrity of a set. The trouble associated with that far outweighs the (typically negligible) performance gain. That said, if this is identified as a performance bottleneck, it's not hard to write a C++ extension and use its STL <set>, which should be faster by one or more orders of magnitude.
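
The point about __eq__ being called for elements with the same hash can be observed with a small experiment. The sketch below uses a made-up Key class that counts its __eq__ calls and can be forced to return a fixed hash. The counts reflect CPython's behaviour as described above and may differ on other implementations.

```python
class Key:
    """Hypothetical element type that counts equality comparisons."""
    eq_calls = 0  # class-level counter shared by all instances

    def __init__(self, value, forced_hash=None):
        self.value = value
        self.forced_hash = forced_hash

    def __hash__(self):
        # Use the forced hash if given, otherwise hash the wrapped value.
        if self.forced_hash is not None:
            return self.forced_hash
        return hash(self.value)

    def __eq__(self, other):
        Key.eq_calls += 1
        return isinstance(other, Key) and self.value == other.value

def count_eq_calls(keys):
    """Add keys to a fresh set one by one and return how often __eq__ ran."""
    Key.eq_calls = 0
    s = set()
    for key in keys:
        s.add(key)
    return Key.eq_calls

# 50 distinct values that all share one hash, versus 50 with distinct hashes.
colliding_calls = count_eq_calls([Key(i, forced_hash=1) for i in range(50)])
distinct_calls = count_eq_calls([Key(i) for i in range(50)])
print("eq calls with colliding hashes:", colliding_calls)
print("eq calls with distinct hashes: ", distinct_calls)
```

With well-distributed hashes the equality check is rarely reached, which is consistent with the remark that the gain from skipping it is typically negligible.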