Python: best way to convert a list of words into a frequency dictionary

Question

Asked by ???u

What's the best way to convert a list/tuple into a dict where the keys are the distinct values of the list and the values are the frequencies of those distinct values?

In other words:

['a', 'b', 'b', 'a', 'b', 'c']
--> 
{'a': 2, 'b': 3, 'c': 1}

(I've had to do something like the above many times. Is there anything in the standard library that does it for you?)

EDIT:

Jacob Gabrielson points out that there is something coming in the standard library for the 2.7/3.1 branch.

Answer 1

Answered by SilentGhost

I find that the easiest way to understand (though it might not be the most efficient) is:

{i:words.count(i) for i in set(words)}
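As a minimal runnable sketch of the one-liner above (the `words` list is just the question's sample data, not part of the answer): each `words.count(...)` call scans the whole list, so the total work grows with the list length times the number of distinct values. That is the efficiency cost hinted at.

```python
# Sample input taken from the question; any list of hashable items works.
words = ['a', 'b', 'b', 'a', 'b', 'c']

# One full pass over `words` per distinct value: simple, but roughly
# O(len(words) * number_of_distinct_values) overall.
freq = {w: words.count(w) for w in set(words)}

print(freq == {'a': 2, 'b': 3, 'c': 1})  # True
```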

Answer 2

Answered by S.Lott

Kind of

from collections import defaultdict

fq = defaultdict(int)  # missing keys start at int() == 0
for w in words:
    fq[w] += 1

That usually works nicely.
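A small sketch wrapping the loop above in a function; the name `word_frequencies` is hypothetical. It also converts the result to a plain dict, because merely reading a missing key from a `defaultdict` inserts that key with a zero count.

```python
from collections import defaultdict

def word_frequencies(words):
    """Count occurrences of each item in one pass (hypothetical helper)."""
    fq = defaultdict(int)  # int() returns 0, so new keys start at zero
    for w in words:
        fq[w] += 1
    return dict(fq)  # plain dict: lookups of missing keys raise KeyError

counts = word_frequencies(['a', 'b', 'b', 'a', 'b', 'c'])
print(counts)  # {'a': 2, 'b': 3, 'c': 1}
```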

Answer 3

Answered by Jacob Gabrielson

Just a note that, starting with Python 2.7/3.1, this functionality will be built in to the collections module; see this bug for more information. Here's the example from the release notes:

>>> from collections import Counter
>>> c=Counter()
>>> for letter in 'here is a sample of english text':
...   c[letter] += 1
...
>>> c
Counter({' ': 6, 'e': 5, 's': 3, 'a': 2, 'i': 2, 'h': 2,
'l': 2, 't': 2, 'g': 1, 'f': 1, 'm': 1, 'o': 1, 'n': 1,
'p': 1, 'r': 1, 'x': 1})
>>> c['e']
5
>>> c['z']
0

Answer 4

Answered by YardenR

The Counter answer was already mentioned, but we can do even better (easier)!

from collections import Counter
my_list = ['a', 'b', 'b', 'a', 'b', 'c']
Counter(my_list)  # returns a Counter, a dict-like object
# Counter({'b': 3, 'a': 2, 'c': 1})
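Because `Counter` is a dict subclass, the result can be used wherever a dict is expected. A brief sketch of a few follow-up operations, using the same sample list as above (the variable name `counts` is only for illustration):

```python
from collections import Counter

my_list = ['a', 'b', 'b', 'a', 'b', 'c']
counts = Counter(my_list)

print(counts.most_common(2))  # [('b', 3), ('a', 2)]
print(counts['z'])            # 0: missing keys count as zero and are not added
print(dict(counts))           # {'a': 2, 'b': 3, 'c': 1}
```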

Answer 5

Answered by Steven Huwig

This is an abomination, but:

from itertools import groupby
dict((k, len(list(xs))) for k, xs in groupby(sorted(items)))

I can't think of a reason one would choose this method over S.Lott's, but if someone's going to point it out, it might as well be me. :)
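A short sketch of why the `sorted` call matters: `groupby` only merges runs of adjacent equal items, so unsorted input yields several groups per value, and later groups overwrite earlier ones in the dict. The `items` list is sample data, and `sum(1 for _ in xs)` is swapped in for `len(list(xs))` to avoid building a temporary list.

```python
from itertools import groupby

items = ['a', 'b', 'b', 'a', 'b', 'c']

# Without sorting, each run of equal neighbours is its own group, and the
# dict keeps only the length of the last run seen for each key.
unsorted_counts = {k: sum(1 for _ in xs) for k, xs in groupby(items)}

# Sorting first puts equal items next to each other, so each key gets one group.
sorted_counts = {k: sum(1 for _ in xs) for k, xs in groupby(sorted(items))}

print(unsorted_counts)  # {'a': 1, 'b': 1, 'c': 1}
print(sorted_counts)    # {'a': 2, 'b': 3, 'c': 1}
```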

Answer 6

Answered by ???u

I have to share an interesting but kind of ridiculous way of doing it that I just came up with:

>>> class myfreq(dict):
...     def __init__(self, arr):
...         for k in arr:
...             self[k] = 1
...     def __setitem__(self, k, v):
...         dict.__setitem__(self, k, self.get(k, 0) + v)
... 
>>> myfreq(['a', 'b', 'b', 'a', 'b', 'c'])
{'a': 2, 'c': 1, 'b': 3}

Answer 7

Answered by user8338

I decided to go ahead and test the suggested versions. I found the collections.Counter approach suggested by Jacob Gabrielson to be the fastest, followed by the defaultdict version by S.Lott.

Here is my code:

from collections import defaultdict
from collections import Counter

import random

# using defaultdict
def counter_default_dict(items):
    count = defaultdict(int)
    for i in items:
        count[i] += 1
    return count

# using a normal dict
def counter_dict(items):
    count = {}
    for i in items:
        count.update({i: count.get(i, 0) + 1})
    return count

# using list.count and a dict comprehension
def counter_count(items):
    count = {i: items.count(i) for i in set(items)}
    return count

# using collections.Counter
def counter_counter(items):
    count = Counter(items)
    return count

# test input; named `data` to avoid shadowing the built-in `list`
data = sorted([random.randint(0, 250) for i in range(300)])


if __name__ == '__main__':
    from timeit import timeit
    print("collections.Defaultdict ", timeit("counter_default_dict(data)", setup="from __main__ import counter_default_dict, data", number=1000))
    print("Dict", timeit("counter_dict(data)", setup="from __main__ import counter_dict, data", number=1000))
    print("list.count ", timeit("counter_count(data)", setup="from __main__ import counter_count, data", number=1000))
    print("collections.Counter.count ", timeit("counter_counter(data)", setup="from __main__ import counter_counter, data", number=1000))

And my results:

collections.Defaultdict  0.06787874956330614
Dict 0.15979115872995675
list.count  1.199258431219126
collections.Counter.count  0.025896202538920665

Do let me know how I can improve the analysis.
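One possible refinement, sketched under assumptions: use `timeit.repeat` and take the minimum of several runs to reduce noise, and check that the implementations agree before timing them. The helper names, input size, seed, and repeat counts below are illustrative choices, not from the original benchmark, and no timings are claimed.

```python
import random
import timeit
from collections import Counter, defaultdict

def with_defaultdict(items):
    count = defaultdict(int)
    for i in items:
        count[i] += 1
    return count

def with_counter(items):
    return Counter(items)

def best_time(func, data, repeat=5, number=100):
    """Return the fastest of `repeat` timings, each running func `number` times."""
    return min(timeit.repeat(lambda: func(data), repeat=repeat, number=number))

random.seed(0)  # fixed seed so every run times the same input
data = [random.randint(0, 250) for _ in range(300)]

# Sanity check: both implementations must produce the same counts.
assert dict(with_defaultdict(data)) == dict(with_counter(data))

for func in (with_defaultdict, with_counter):
    print(func.__name__, best_time(func, data))
```

Taking the minimum rather than the mean follows the guidance in the `timeit` documentation, since slower runs mostly reflect interference from other processes.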

Answer 8

Answered by imankalyan

I think using the collections library is the easiest way to get it. But if you want to build the frequency dictionary without it, here is another way:

l = [1, 4, 2, 1, 2, 6, 8, 2, 2]
d = {}
for i in l:
    if i in d:  # membership test on the dict itself; .keys() is not needed
        d[i] += 1
    else:
        d[i] = 1
print(d)

Output:

{1: 2, 4: 1, 2: 4, 6: 1, 8: 1}
