Python dictionary count of unique values

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/16406329/

Time: 2020-08-18 22:31:56  Source: igfitidea

python dictionary count of unique values

python dictionary

Asked by user1189851

I have a problem with counting distinct values for each key in Python.

I have a dictionary d like

[{"abc":"movies"}, {"abc": "sports"}, {"abc": "music"}, {"xyz": "music"}, {"pqr":"music"}, {"pqr":"movies"},{"pqr":"sports"}, {"pqr":"news"}, {"pqr":"sports"}]

I need to print the number of distinct values for each key individually.

That means I would want to print

abc 3
xyz 1
pqr 4

Please help.

Thank you

Accepted answer by Martijn Pieters

Over 6 years after answering, someone pointed out to me that I misread the question. While my original answer (below) counts unique keys in the input sequence, you actually have a different count-distinct problem; you want to count values per key.

To count unique values per key, exactly, you'd have to collect those values into sets first:

values_per_key = {}
for d in iterable_of_dicts:
    for k, v in d.items():
        values_per_key.setdefault(k, set()).add(v)
counts = {k: len(v) for k, v in values_per_key.items()}

which, for your input, produces:

>>> values_per_key = {}
>>> for d in iterable_of_dicts:
...     for k, v in d.items():
...         values_per_key.setdefault(k, set()).add(v)
...
>>> counts = {k: len(v) for k, v in values_per_key.items()}
>>> counts
{'abc': 3, 'xyz': 1, 'pqr': 4}

We can still wrap that object in a Counter() instance if you want to make use of the additional functionality this class offers (see below):

>>> from collections import Counter
>>> Counter(counts)
Counter({'pqr': 4, 'abc': 3, 'xyz': 1})

The downside is that if your input iterable is very large, the above approach can require a lot of memory. In case you don't need exact counts, e.g. when orders of magnitude suffice, there are other approaches, such as a HyperLogLog structure or other algorithms that 'sketch out' a count for the stream.

This approach requires you to install a third-party library. As an example, the datasketch project offers both HyperLogLog and MinHash. Here's an HLL example (using the HyperLogLogPlusPlus class, which is a recent improvement to the HLL approach):

from collections import defaultdict
from datasketch import HyperLogLogPlusPlus

counts = defaultdict(HyperLogLogPlusPlus)

for d in iterable_of_dicts:
    for k, v in d.items():
        counts[k].update(v.encode('utf8'))

In a distributed setup, you could use Redis to manage the HLL counts.

My original answer:

Use a collections.Counter() instance, together with some chaining:

from collections import Counter
from itertools import chain

counts = Counter(chain.from_iterable(e.keys() for e in d))

This ensures that dictionaries with more than one key in your input list are counted correctly.

Demo:

>>> from collections import Counter
>>> from itertools import chain
>>> d = [{"abc":"movies"}, {"abc": "sports"}, {"abc": "music"}, {"xyz": "music"}, {"pqr":"music"}, {"pqr":"movies"},{"pqr":"sports"}, {"pqr":"news"}, {"pqr":"sports"}]
>>> Counter(chain.from_iterable(e.keys() for e in d))
Counter({'pqr': 5, 'abc': 3, 'xyz': 1})

or with multiple keys in the input dictionaries:

>>> d = [{"abc":"movies", 'xyz': 'music', 'pqr': 'music'}, {"abc": "sports", 'pqr': 'movies'}, {"abc": "music", 'pqr': 'sports'}, {"pqr":"news"}, {"pqr":"sports"}]
>>> Counter(chain.from_iterable(e.keys() for e in d))
Counter({'pqr': 5, 'abc': 3, 'xyz': 1})

A Counter() has additional, helpful functionality, such as the .most_common() method, which lists elements and their counts in descending order of count:

for key, count in counts.most_common():
    print('{}: {}'.format(key, count))

# prints
# pqr: 5
# abc: 3
# xyz: 1

Answer by Tim Pietzcker

>>> d = [{"abc":"movies"}, {"abc": "sports"}, {"abc": "music"}, {"xyz": "music"},
... {"pqr":"music"}, {"pqr":"movies"},{"pqr":"sports"}, {"pqr":"news"}, 
... {"pqr":"sports"}]
>>> from collections import Counter
>>> counts = Counter(key for dic in d for key in dic.keys())
>>> counts
Counter({'pqr': 5, 'abc': 3, 'xyz': 1})
>>> for key in counts:
...     print (key, counts[key])
...
xyz 1
abc 3
pqr 5

Answer by neilr8133

What you're describing--a list with multiple values for each key--would be better visualized by something like this:

{'abc': ['movies', 'sports', 'music'],
 'xyz': ['music'],
 'pqr': ['music', 'movies', 'sports', 'news']
}

In that case, you have to do a bit more work to insert:

  1. Look up the key to see if it already exists
    • If it doesn't exist, create a new key with value [] (empty list)
  2. Retrieve the value (the list associated with the key)
  3. Use if value in to see if the value being checked exists in the list
  4. If the new value isn't in the list, .append() it
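
The insertion steps above can be sketched roughly as follows. This is only an illustration, not code from the answer: the helper name add_value, the variable my_dict, and the list of (key, value) pairs are assumptions made for the example.

```python
# Illustrative sketch; add_value, my_dict and pairs are hypothetical names.
def add_value(my_dict, key, value):
    # Step 1: look up the key; create it with an empty list if it is missing.
    if key not in my_dict:
        my_dict[key] = []
    # Step 2: retrieve the list associated with the key.
    values = my_dict[key]
    # Steps 3 and 4: append the value only if it is not already in the list.
    if value not in values:
        values.append(value)


my_dict = {}
pairs = [("abc", "movies"), ("abc", "sports"), ("abc", "music"),
         ("xyz", "music"), ("pqr", "music"), ("pqr", "movies"),
         ("pqr", "sports"), ("pqr", "news"), ("pqr", "sports")]
for key, value in pairs:
    add_value(my_dict, key, value)
```

Because duplicates are skipped on insert, the length of each list is the number of distinct values for that key. The membership test on a list is linear in its length, so for many values per key a set would likely be the better container.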

This also leads to an easy way to count the number of elements stored for each key:

# Sketch; assumes myDict maps each key to a list of values
for myKey in myDict.keys():
    print("{0}: {1}".format(myKey, len(myDict[myKey])))

Answer by Travis Griggs

Use a collections.Counter. Assuming that you have a list of one-item dictionaries...

from collections import Counter
listOfDictionaries = [{'abc':'movies'}, {'abc':'sports'}, {'abc':'music'},
    {'xyz':'music'}, {'pqr':'music'}, {'pqr':'movies'},
    {'pqr':'sports'}, {'pqr':'news'}, {'pqr':'sports'}]
# take the single key of each one-item dictionary and count its occurrences
Counter(list(d)[0] for d in listOfDictionaries)

Answer by akashdeep

There is no need to use Counter. You can achieve this as follows:

# input dictionary
d=[{"abc":"movies"}, {"abc": "sports"}, {"abc": "music"}, {"xyz": "music"}, {"pqr":"music"}, {"pqr":"movies"},{"pqr":"sports"}, {"pqr":"news"}, {"pqr":"sports"}]

# fetch keys
b=[j[0] for i in d for j in i.items()]

# print output
for k in list(set(b)):
    print("{0}: {1}".format(k, b.count(k)))

Answer by marco

This builds on @akashdeep's solution, which uses a set but gives a wrong result because it does not account for the "distinct" requirement mentioned in the question (pqr should be 4, not 5).

# dictionary
d=[{"abc":"movies"}, {"abc": "sports"}, {"abc": "music"}, {"xyz": "music"}, {"pqr":"music"}, {"pqr":"movies"},{"pqr":"sports"}, {"pqr":"news"}, {"pqr":"sports"}]

# merged dictionary
c = {}
for i in d:
    for k,v in i.items():
        try:
            c[k].append(v)
        except KeyError:
            c[k] = [v]

# counting and printing
for k,v in c.items():
    print("{0}: {1}".format(k, len(set(v))))

This gives the correct result:

xyz: 1
abc: 3
pqr: 4
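
To make the difference between the two kinds of count concrete, here is a minimal Python 3 sketch on the question's input. It is not taken from any of the answers; the names key_occurrences, distinct_values and distinct_counts are made up for illustration.

```python
from collections import Counter, defaultdict

d = [{"abc": "movies"}, {"abc": "sports"}, {"abc": "music"},
     {"xyz": "music"}, {"pqr": "music"}, {"pqr": "movies"},
     {"pqr": "sports"}, {"pqr": "news"}, {"pqr": "sports"}]

# How many times each key occurs (duplicate values are counted again).
key_occurrences = Counter(k for item in d for k in item)

# How many distinct values each key has (duplicates collapse in the set).
distinct_values = defaultdict(set)
for item in d:
    for k, v in item.items():
        distinct_values[k].add(v)
distinct_counts = {k: len(v) for k, v in distinct_values.items()}

print(key_occurrences["pqr"], distinct_counts["pqr"])
```

For pqr the two numbers differ because "sports" appears twice in the input: the key occurs five times but has only four distinct values.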