python dictionary count of unique values
Original question: http://stackoverflow.com/questions/16406329/
Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow
Asked by user1189851
I have a problem with counting distinct values for each key in Python.
I have a list of dictionaries d like this:
[{"abc":"movies"}, {"abc": "sports"}, {"abc": "music"}, {"xyz": "music"}, {"pqr":"music"}, {"pqr":"movies"},{"pqr":"sports"}, {"pqr":"news"}, {"pqr":"sports"}]
I need to print the number of distinct values for each key individually.
That means I would want to print
abc 3
xyz 1
pqr 4
Please help.
Thank you
Accepted answer by Martijn Pieters
Over 6 years after answering, someone pointed out to me that I misread the question. While my original answer (below) counts unique keys in the input sequence, you actually have a different count-distinct problem; you want to count values per key.
To count unique values per key exactly, you'd have to collect those values into sets first:
values_per_key = {}
for d in iterable_of_dicts:
    for k, v in d.items():
        values_per_key.setdefault(k, set()).add(v)
counts = {k: len(v) for k, v in values_per_key.items()}
which, for your input, produces:
>>> values_per_key = {}
>>> for d in iterable_of_dicts:
...     for k, v in d.items():
...         values_per_key.setdefault(k, set()).add(v)
...
>>> counts = {k: len(v) for k, v in values_per_key.items()}
>>> counts
{'abc': 3, 'xyz': 1, 'pqr': 4}
We can still wrap that object in a Counter() instance if you want to make use of the additional functionality this class offers, see below:
>>> from collections import Counter
>>> Counter(counts)
Counter({'pqr': 4, 'abc': 3, 'xyz': 1})
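As a self-contained variant of the set-based approach above, here is a minimal sketch that runs on the question's input and prints one key and count per line. The function name count_distinct_values and the variable data are assumptions for illustration, and defaultdict(set) is swapped in for setdefault(); the behaviour is intended to be the same.

```python
from collections import defaultdict


def count_distinct_values(dicts):
    """Return {key: number of distinct values seen for that key}."""
    values_per_key = defaultdict(set)
    for d in dicts:
        for k, v in d.items():
            # A set ignores repeated values, so duplicates are counted once.
            values_per_key[k].add(v)
    return {k: len(vs) for k, vs in values_per_key.items()}


# Input taken from the question.
data = [{"abc": "movies"}, {"abc": "sports"}, {"abc": "music"},
        {"xyz": "music"}, {"pqr": "music"}, {"pqr": "movies"},
        {"pqr": "sports"}, {"pqr": "news"}, {"pqr": "sports"}]

result = count_distinct_values(data)
for key, count in result.items():
    print(key, count)
```

Values must be hashable for this to work, since they are stored in sets.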
The downside is that if your input iterable is very large, the above approach can require a lot of memory. If you don't need exact counts, e.g. when orders of magnitude suffice, there are other approaches, such as a HyperLogLog structure or other algorithms that 'sketch out' a count for the stream.
This approach requires you to install a third-party library. As an example, the datasketch project offers both HyperLogLog and MinHash. Here's an HLL example (using the HyperLogLogPlusPlus class, which is a recent improvement to the HLL approach):
from collections import defaultdict
from datasketch import HyperLogLogPlusPlus
counts = defaultdict(HyperLogLogPlusPlus)
for d in iterable_of_dicts:
    for k, v in d.items():
        counts[k].update(v.encode('utf8'))
In a distributed setup, you could use Redis to manage the HLL counts.
My original answer:
Use a collections.Counter() instance, together with some chaining:
from collections import Counter
from itertools import chain
counts = Counter(chain.from_iterable(e.keys() for e in d))
This ensures that dictionaries with more than one key in your input list are counted correctly.
Demo:
>>> from collections import Counter
>>> from itertools import chain
>>> d = [{"abc":"movies"}, {"abc": "sports"}, {"abc": "music"}, {"xyz": "music"}, {"pqr":"music"}, {"pqr":"movies"},{"pqr":"sports"}, {"pqr":"news"}, {"pqr":"sports"}]
>>> Counter(chain.from_iterable(e.keys() for e in d))
Counter({'pqr': 5, 'abc': 3, 'xyz': 1})
or with multiple keys in the input dictionaries:
>>> d = [{"abc":"movies", 'xyz': 'music', 'pqr': 'music'}, {"abc": "sports", 'pqr': 'movies'}, {"abc": "music", 'pqr': 'sports'}, {"pqr":"news"}, {"pqr":"sports"}]
>>> Counter(chain.from_iterable(e.keys() for e in d))
Counter({'pqr': 5, 'abc': 3, 'xyz': 1})
A Counter() has additional, helpful functionality, such as the .most_common() method that lists elements and their counts in reverse sorted order (highest count first):
for key, count in counts.most_common():
    print('{}: {}'.format(key, count))
# prints
# pqr: 5
# abc: 3
# xyz: 1
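The original answer above counts how often each key occurs, which is why pqr comes out as 5. As a hedged sketch of how the same Counter idiom could be adapted to the distinct-values question, one option is to deduplicate the (key, value) pairs with a set before counting keys. The name distinct_pairs is an assumption for illustration, and this only works when the values are hashable.

```python
from collections import Counter

# Input taken from the question.
d = [{"abc": "movies"}, {"abc": "sports"}, {"abc": "music"},
     {"xyz": "music"}, {"pqr": "music"}, {"pqr": "movies"},
     {"pqr": "sports"}, {"pqr": "news"}, {"pqr": "sports"}]

# Collapse repeated (key, value) pairs, e.g. the second ("pqr", "sports").
distinct_pairs = {(k, v) for e in d for k, v in e.items()}

# Each remaining pair contributes exactly one distinct value to its key.
counts = Counter(k for k, _ in distinct_pairs)

for key, count in counts.most_common():
    print('{}: {}'.format(key, count))
# prints
# pqr: 4
# abc: 3
# xyz: 1
```

Like the set-per-key approach, this holds every distinct pair in memory, so it does not help with very large inputs.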
Answer by Tim Pietzcker
>>> d = [{"abc":"movies"}, {"abc": "sports"}, {"abc": "music"}, {"xyz": "music"},
... {"pqr":"music"}, {"pqr":"movies"},{"pqr":"sports"}, {"pqr":"news"},
... {"pqr":"sports"}]
>>> from collections import Counter
>>> counts = Counter(key for dic in d for key in dic.keys())
>>> counts
Counter({'pqr': 5, 'abc': 3, 'xyz': 1})
>>> for key in counts:
...     print(key, counts[key])
...
abc 3
xyz 1
pqr 5
Answer by neilr8133
What you're describing--a list with multiple values for each key--would be better visualized by something like this:
{'abc': ['movies', 'sports', 'music'],
'xyz': ['music'],
'pqr': ['music', 'movies', 'sports', 'news']
}
In that case, you have to do a bit more work to insert:
- Look up the key to see if it already exists
  - If it doesn't exist, create a new key with the value [] (empty list)
- Retrieve the value (the list associated with the key)
- Use if value in to see if the value being checked exists in the list
- If the new value isn't in the list, .append() it
This also leads to an easy way to count the total number of elements stored:
# Pseudo-code
for myKey in myDict.keys():
    print("{0}: {1}".format(myKey, len(myDict[myKey])))
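The insertion steps and the per-key count described in this answer can be sketched as follows. The helper name add_value and the variable names my_dict and pairs are assumptions for illustration. A set per key would avoid the linear membership test, but a list is used here to mirror the steps as written.

```python
def add_value(store, key, value):
    """Insert value under key, keeping each key's list free of duplicates."""
    # Look up the key; create it with an empty list if it is missing.
    if key not in store:
        store[key] = []
    # Retrieve the list associated with the key.
    values = store[key]
    # Append only if the value is not already in the list.
    if value not in values:
        values.append(value)


# (key, value) pairs taken from the question's input.
pairs = [("abc", "movies"), ("abc", "sports"), ("abc", "music"),
         ("xyz", "music"), ("pqr", "music"), ("pqr", "movies"),
         ("pqr", "sports"), ("pqr", "news"), ("pqr", "sports")]

my_dict = {}
for key, value in pairs:
    add_value(my_dict, key, value)

# Because duplicates were skipped on insert, len() is the distinct count.
for key in my_dict:
    print("{0}: {1}".format(key, len(my_dict[key])))
```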
Answer by Travis Griggs
Use a collections.Counter. Assuming that you have a list of one-item dictionaries...
from collections import Counter
listOfDictionaries = [{'abc':'movies'}, {'abc':'sports'}, {'abc':'music'},
                      {'xyz':'music'}, {'pqr':'music'}, {'pqr':'movies'},
                      {'pqr':'sports'}, {'pqr':'news'}, {'pqr':'sports'}]
# Take the single key of each one-item dictionary and count its occurrences
Counter(list(d)[0] for d in listOfDictionaries)
Answer by akashdeep
There is no need to use Counter. You can achieve it this way:
# input dictionary
d=[{"abc":"movies"}, {"abc": "sports"}, {"abc": "music"}, {"xyz": "music"}, {"pqr":"music"}, {"pqr":"movies"},{"pqr":"sports"}, {"pqr":"news"}, {"pqr":"sports"}]
# fetch keys
b=[j[0] for i in d for j in i.items()]
# print output
for k in list(set(b)):
    print("{0}: {1}".format(k, b.count(k)))
Answer by marco
Building on @akashdeep's solution, which uses a set but gives the wrong result because it does not account for the "distinct" requirement mentioned in the question (pqr should be 4, not 5).
# dictionary
d=[{"abc":"movies"}, {"abc": "sports"}, {"abc": "music"}, {"xyz": "music"}, {"pqr":"music"}, {"pqr":"movies"},{"pqr":"sports"}, {"pqr":"news"}, {"pqr":"sports"}]
# merged dictionary
c = {}
for i in d:
    for k, v in i.items():
        try:
            c[k].append(v)
        except KeyError:
            c[k] = [v]
# counting and printing
for k, v in c.items():
    print("{0}: {1}".format(k, len(set(v))))
This gives the correct result:
abc: 3
xyz: 1
pqr: 4

