Disclaimer: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/2528513/

Time: 2020-11-04 00:51:12  Source: igfitidea

Number of elements in Python Set

python, comparison, dataset

Question by Tim

I have a list of phone numbers that have been dialed (nums_dialed). I also have a set of phone numbers, which are the numbers in a client's office (client_nums). How do I efficiently figure out how many times I've called a particular client (total)?

For example:

>>> nums_dialed = [1, 2, 2, 3, 3]
>>> client_nums = set([2, 3])
>>> ???
total = 4

The problem is that I have a large-ish dataset: len(client_nums) ~ 10^5 and len(nums_dialed) ~ 10^3.

Answer by nosklo

Which client has 10^5 numbers in their office? Do you work for an entire telephone company?

Anyway:

print(sum(1 for num in nums_dialed if num in client_nums))

That gives you the count about as fast as possible: each membership test on a set is O(1) on average, so the whole pass is O(len(nums_dialed)).
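A self-contained version of the one-liner, wrapped in a function and run on the example data from the question. The function name count_client_calls is hypothetical, not from the answer.

```python
def count_client_calls(nums_dialed, client_nums):
    """Count how many entries of nums_dialed are in client_nums.

    client_nums should be a set (hash-based membership test); a list
    would still give the right answer but scan linearly on every lookup.
    """
    return sum(1 for num in nums_dialed if num in client_nums)


nums_dialed = [1, 2, 2, 3, 3]
client_nums = {2, 3}
print(count_client_calls(nums_dialed, client_nums))  # prints 4
```

With client_nums around 10^5 entries, passing a list instead of a set would turn each of the ~10^3 lookups into a linear scan, which is exactly the cost the set avoids.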

If you want to do this for multiple clients using the same nums_dialed list, you could first cache a count for each dialed number:

import collections

# Count how many times each number was dialed (one pass over nums_dialed)
nums_dialed_dict = collections.defaultdict(int)
for num in nums_dialed:
    nums_dialed_dict[num] += 1

Then just sum the counts of each client's numbers:

# .get() avoids inserting a zero entry for every client number that was never dialed
sum(nums_dialed_dict.get(num, 0) for num in this_client_nums)

That would be a lot quicker than iterating over the entire list of numbers again for each client.
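A runnable sketch of the caching approach for several clients at once. The clients mapping and its names are hypothetical. Lookups use .get(num, 0) because indexing a defaultdict with a missing key inserts that key, which would otherwise add a zero entry for every undialed client number.

```python
import collections

nums_dialed = [1, 2, 2, 3, 3]

# Hypothetical clients, each with a set of office numbers.
clients = {
    "acme": {2, 3},
    "globex": {1, 4},
    "initech": {5},
}

# Count every dialed number once: O(len(nums_dialed)).
nums_dialed_dict = collections.defaultdict(int)
for num in nums_dialed:
    nums_dialed_dict[num] += 1

# Per client, sum the cached counts without mutating the dict.
totals = {
    name: sum(nums_dialed_dict.get(num, 0) for num in numbers)
    for name, numbers in clients.items()
}
print(totals)  # {'acme': 4, 'globex': 1, 'initech': 0}
```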

Answer by Eli Bendersky

>>> client_nums = set([2, 3])
>>> nums_dialed = [1, 2, 2, 3, 3]
>>> count = 0
>>> for num in nums_dialed:
...   if num in client_nums:
...     count += 1
... 
>>> count
4
>>> 

This should be quite efficient even for the sizes you quote, since each membership test on the set is a hash lookup.

Answer by Eli Bendersky

Using collections.Counter (available since Python 2.7):

import collections

dialed_count = collections.Counter(nums_dialed)
count = sum(dialed_count[t] for t in client_nums)
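Given the sizes in the question (about 10^5 client numbers but at most 10^3 distinct dialed numbers), a variant that is not part of this answer iterates over the Counter instead of over client_nums, so the work is bounded by the number of distinct dialed numbers. Unlike a defaultdict, indexing a Counter with a missing key returns 0 without storing the key.

```python
from collections import Counter

nums_dialed = [1, 2, 2, 3, 3]
client_nums = {2, 3}

# Count each dialed number once.
dialed_count = Counter(nums_dialed)

# Walk the (small) Counter and test membership in the (large) set.
count = sum(n for num, n in dialed_count.items() if num in client_nums)
print(count)  # 4

# Missing keys read as 0 and are not stored.
print(dialed_count[99], 99 in dialed_count)  # 0 False
```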

Answer by ony

That's a very popular way to combine sorted lists in a single pass:

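nums_dialed = [1, 2, 2, 3, 3]
client_nums = [2, 3]

nums_dialed.sort()
client_nums.sort()

c = 0
i = iter(nums_dialed)
j = iter(client_nums)
try:
    a = next(i)
    b = next(j)
    while True:
        if a < b:
            a = next(i)  # dialed number too small: advance dialed
            continue
        if a > b:
            b = next(j)  # client number too small: advance client
            continue
        # a == b: this dialed number belongs to the client
        c += 1
        a = next(i)  # next dialed
except StopIteration:
    pass  # one list is exhausted, so no further matches are possible

print(c)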

I use sorted lists because "set" is an unordered collection (I don't know why it uses hashing rather than a binary tree or a sorted list), so it doesn't fit this approach. If you like lists, you can implement your own "set" on top of "bisect", or use something more elaborate that produces an ordered iterator.
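A minimal sketch of the bisect idea: keep the client numbers in a sorted list and test membership with binary search. sorted_contains is a hypothetical helper name. Each lookup costs O(log n) rather than a hash set's O(1) average, but the list stays ordered, so the same data can also drive the merge-style walk.

```python
from bisect import bisect_left


def sorted_contains(sorted_list, value):
    """Return True if value is in sorted_list, using binary search."""
    i = bisect_left(sorted_list, value)
    return i < len(sorted_list) and sorted_list[i] == value


nums_dialed = [1, 2, 2, 3, 3]
client_nums = sorted([3, 2])  # must be sorted for bisect to be valid

count = sum(1 for num in nums_dialed if sorted_contains(client_nums, num))
print(count)  # 4
```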
