Disclaimer: this page is a translated mirror of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/722697/

Date: 2020-11-03 20:44:05  Source: igfitidea

Best way to turn word list into frequency dict

python

Asked by ???u

What's the best way to convert a list/tuple into a dict where the keys are the distinct values of the list and the values are the frequencies of those distinct values?

In other words:

['a', 'b', 'b', 'a', 'b', 'c']
--> 
{'a': 2, 'b': 3, 'c': 1}

(I've had to do something like the above so many times. Is there anything in the standard library that does it for you?)

EDIT:

Jacob Gabrielson points out that collections.Counter is coming to the standard library in the 2.7/3.1 branch.

Answer by SilentGhost

I find that the easiest-to-understand way (though it might not be the most efficient) is:

{i:words.count(i) for i in set(words)}
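A self-contained version of the one-liner, assuming words is a list of hashable items (the sample list from the question is used here for illustration). Each words.count(w) call scans the whole list, so for n items with k distinct values this does roughly n·k comparisons, which is why it gets slow on large inputs.

```python
words = ['a', 'b', 'b', 'a', 'b', 'c']  # sample input from the question

# One full pass over the list per distinct value: simple, but O(n * k)
freq = {w: words.count(w) for w in set(words)}
print(freq)  # key order follows set iteration order, so it may vary
```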

Answer by S.Lott

Something like:

from collections import defaultdict

fq = defaultdict(int)  # new keys start at int() == 0
for w in words:
    fq[w] += 1

That usually works nicely.
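defaultdict is a subclass of dict, so fq can be used wherever a dict is expected. One caveat: merely reading a missing key inserts it with a count of 0. If that matters, convert to a plain dict once counting is done. A short sketch (the words list is just sample data):

```python
from collections import defaultdict

words = ['a', 'b', 'b', 'a', 'b', 'c']  # sample data

fq = defaultdict(int)  # int() returns 0, the starting count for new keys
for w in words:
    fq[w] += 1

print(fq['z'])    # 0, but reading the missing key also inserts it
print('z' in fq)  # True: 'z' is now a key with count 0

fq.pop('z')      # remove the accidental entry again
freq = dict(fq)  # plain dict: missing keys now raise KeyError
print(freq)      # {'a': 2, 'b': 3, 'c': 1}
```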

Answer by Jacob Gabrielson

Just a note that, starting with Python 2.7/3.1, this functionality will be built into the collections module as Counter; see this bug for more information. Here's the example from the release notes:

>>> from collections import Counter
>>> c=Counter()
>>> for letter in 'here is a sample of english text':
...   c[letter] += 1
...
>>> c
Counter({' ': 6, 'e': 5, 's': 3, 'a': 2, 'i': 2, 'h': 2,
'l': 2, 't': 2, 'g': 1, 'f': 1, 'm': 1, 'o': 1, 'n': 1,
'p': 1, 'r': 1, 'x': 1})
>>> c['e']
5
>>> c['z']
0
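Since the question is about word lists: besides incrementing counts by hand, a Counter can count the words of a split string, and most_common(n) returns the n highest counts as (element, count) pairs. A small sketch with a made-up sentence:

```python
from collections import Counter

text = "the cat and the dog and the bird"  # made-up sample sentence
word_counts = Counter(text.split())        # count whole words, not letters

print(word_counts['the'])          # 3
print(word_counts.most_common(2))  # [('the', 3), ('and', 2)]
```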

Answer by YardenR

The Counter answer was already mentioned, but it can be even simpler: pass the list straight to Counter.

>>> from collections import Counter
>>> my_list = ['a', 'b', 'b', 'a', 'b', 'c']
>>> Counter(my_list)  # returns a Counter, a dict-like object
Counter({'b': 3, 'a': 2, 'c': 1})
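Counter is a dict subclass, so it already compares equal to the plain dict the question asks for. Unlike defaultdict, looking up a missing key returns 0 without inserting it. If you need an actual dict (for example, to get a KeyError for missing keys), copy it with dict(...):

```python
from collections import Counter

my_list = ['a', 'b', 'b', 'a', 'b', 'c']
counts = Counter(my_list)

print(counts == {'a': 2, 'b': 3, 'c': 1})  # True: compares like a dict
print(counts['z'])                         # 0
print('z' in counts)                       # False: the lookup did not insert 'z'

freq = dict(counts)         # plain dict copy
print(type(freq).__name__)  # dict
```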

Answer by Steven Huwig

This is an abomination, but:

from itertools import groupby
dict((k, len(list(xs))) for k, xs in groupby(sorted(items)))

I can't think of a reason one would choose this method over S.Lott's, but if someone's going to point it out, it might as well be me. :)
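Why the sorted() call matters: itertools.groupby only groups consecutive equal elements, so on unsorted input the same key appears in several groups and later groups overwrite earlier ones in the dict. A quick illustration with the list from the question (sum(1 for _ in g) is used here as an alternative to len(list(g)) that avoids building a list):

```python
from itertools import groupby

items = ['a', 'b', 'b', 'a', 'b', 'c']

# Without sorting, the runs are a(1), b(2), a(1), b(1), c(1),
# so later runs overwrite earlier ones in the dict
wrong = {k: len(list(g)) for k, g in groupby(items)}
print(wrong)  # {'a': 1, 'b': 1, 'c': 1}

# Sorting first puts equal items next to each other
right = {k: sum(1 for _ in g) for k, g in groupby(sorted(items))}
print(right)  # {'a': 2, 'b': 3, 'c': 1}
```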

Answer by ???u

I have to share an interesting but kind of ridiculous way of doing it that I just came up with:

>>> class myfreq(dict):
...     def __init__(self, arr):
...         for k in arr:
...             self[k] = 1
...     def __setitem__(self, k, v):
...         dict.__setitem__(self, k, self.get(k, 0) + v)
... 
>>> myfreq(['a', 'b', 'b', 'a', 'b', 'c'])
{'a': 2, 'c': 1, 'b': 3}
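A caveat if you try this trick: because __setitem__ is overridden, every later assignment also adds to the existing value instead of replacing it. A short demonstration, repeating the class from above with comments:

```python
class myfreq(dict):
    def __init__(self, arr):
        for k in arr:
            self[k] = 1  # goes through the overridden __setitem__ below

    def __setitem__(self, k, v):
        # add v to the current count instead of replacing it
        dict.__setitem__(self, k, self.get(k, 0) + v)

f = myfreq(['a', 'b', 'b', 'a', 'b', 'c'])
print(f['a'])  # 2

f['a'] = 10    # looks like a reset...
print(f['a'])  # ...but prints 12, because assignment adds
```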

Answer by user8338

I decided to go ahead and test the suggested versions. I found collections.Counter, as suggested by Jacob Gabrielson, to be the fastest, followed by the defaultdict version by S.Lott.

Here is my code:

from collections import Counter, defaultdict

import random

# using defaultdict
def counter_default_dict(seq):
    count = defaultdict(int)
    for i in seq:
        count[i] += 1
    return count

# using a normal dict
def counter_dict(seq):
    count = {}
    for i in seq:
        count.update({i: count.get(i, 0) + 1})
    return count

# using list.count and a dict comprehension
def counter_count(seq):
    count = {i: seq.count(i) for i in set(seq)}
    return count

# using collections.Counter
def counter_counter(seq):
    count = Counter(seq)
    return count

# 300 random integers in [0, 250]; named data to avoid shadowing the built-in list
data = sorted([random.randint(0, 250) for i in range(300)])


if __name__ == '__main__':
    from timeit import timeit
    print("collections.Defaultdict ", timeit("counter_default_dict(data)", setup="from __main__ import counter_default_dict, data", number=1000))
    print("Dict", timeit("counter_dict(data)", setup="from __main__ import counter_dict, data", number=1000))
    print("list.count ", timeit("counter_count(data)", setup="from __main__ import counter_count, data", number=1000))
    print("collections.Counter.count ", timeit("counter_counter(data)", setup="from __main__ import counter_counter, data", number=1000))

And my results:

collections.Defaultdict     0.06787874956330614
Dict                        0.15979115872995675
list.count                  1.199258431219126
collections.Counter.count   0.025896202538920665

Do let me know how I can improve the analysis.
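One common way to tighten a micro-benchmark like this is to run timeit.repeat and keep the minimum, since slower repeats mostly reflect interference from other processes, and to check that the variants agree before timing them. Seeding random also makes the input reproducible. A sketch along those lines (the seed, input size, and repeat counts are arbitrary choices, and only two of the variants are shown):

```python
import random
import timeit
from collections import Counter, defaultdict

def counter_default_dict(seq):
    count = defaultdict(int)
    for i in seq:
        count[i] += 1
    return count

def counter_counter(seq):
    return Counter(seq)

random.seed(0)  # reproducible input
data = [random.randint(0, 250) for _ in range(300)]

# Sanity check: both variants must produce the same counts
assert counter_default_dict(data) == counter_counter(data)

for name, func in [("defaultdict", counter_default_dict), ("Counter", counter_counter)]:
    # 5 repeats of 1000 calls each; keep the fastest repeat
    best = min(timeit.repeat(lambda f=func: f(data), number=1000, repeat=5))
    print(f"{name:12} {best:.4f} s per 1000 calls")
```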

Answer by imankalyan

I think using the collections module is the easiest way to get it. But if you want to build the frequency dictionary without it, here's another way:

l = [1, 4, 2, 1, 2, 6, 8, 2, 2]
d = {}
for i in l:
    if i in d:  # membership test on the dict itself, no .keys() needed
        d[i] += 1
    else:
        d[i] = 1
print(d)

Output:

{1: 2, 4: 1, 2: 4, 6: 1, 8: 1}
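The if/else can be collapsed with dict.get, which returns a default for missing keys. A sketch using the same list:

```python
l = [1, 4, 2, 1, 2, 6, 8, 2, 2]
d = {}
for i in l:
    d[i] = d.get(i, 0) + 1  # get() returns 0 if i hasn't been seen yet
print(d)  # {1: 2, 4: 1, 2: 4, 6: 1, 8: 1}
```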