Python: count the frequency of words in a list and sort by frequency

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/20510768/

Date: 2020-08-18 20:38:27  Source: igfitidea

Tags: python, python-3.x, list, frequency, word

Question by user3088605

I am using Python 3.3

I need to create two lists, one for the unique words and the other for the frequencies of the word.

I have to sort the unique word list based on the frequencies list so that the word with the highest frequency is first in the list.

I have the design in text but am uncertain how to implement it in Python.

The methods I have found so far use either Counter or dictionaries, which we have not learned. I have already created the list from the file containing all the words, but I do not know how to find the frequency of each word in the list. I know I will need a loop to do this but cannot figure it out.

Here's the basic design:

 original list = ["the", "car",....]
 newlst = []
 frequency = []
 for word in the original list
       if word not in newlst:
           newlst.append(word)
           set frequency = 1
       else
           increase the frequency
 sort newlst based on frequency list 
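The design above can be sketched in Python 3 with nothing but two lists and a loop. This is a minimal sketch, not the only way: the sample `original_list` is a hypothetical stand-in for the list read from your file, and the final step uses `sorted()` with a `key` over positions, which may or may not be allowed in your course.

```python
# Hypothetical sample data standing in for the words read from the file
original_list = ["the", "car", "the", "red", "car", "the"]

newlst = []     # unique words, in order of first appearance
frequency = []  # frequency[i] is the count of newlst[i]

for word in original_list:
    if word not in newlst:
        newlst.append(word)
        frequency.append(1)       # first time this word is seen
    else:
        i = newlst.index(word)    # position of the word in newlst
        frequency[i] += 1         # increase its frequency

# Positions of the unique words, highest frequency first.
# sorted() is stable, so words with equal counts keep their first-seen order.
order = sorted(range(len(newlst)), key=lambda i: frequency[i], reverse=True)
newlst = [newlst[i] for i in order]
frequency = [frequency[i] for i in order]

print(newlst)     # ['the', 'car', 'red']
print(frequency)  # [3, 2, 1]
```

Sorting the positions rather than either list directly keeps the two lists aligned: both are rebuilt from the same `order`.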

Answer by KGo

The ideal way is to use a dictionary that maps a word to its count. But if you can't use that, you might want to use 2 lists: one storing the words, and the other storing the counts of those words. Note that the order of words and counts matters here, because the count at index i must belong to the word at index i. Implementing this would be harder and less efficient, since finding a word in a list means scanning the list.
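For comparison, the dictionary approach mentioned above might look like this. It is a sketch with a hypothetical sample list; `dict.get(word, 0)` supplies 0 the first time a word is seen, and the sort then orders the (word, count) pairs by count.

```python
# Hypothetical sample list of words
words = ["apple", "egg", "apple", "banana", "egg", "apple"]

# Map each word to its count
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Sort the (word, count) pairs by count, highest first
by_freq = sorted(counts.items(), key=lambda item: item[1], reverse=True)
print(by_freq)  # [('apple', 3), ('egg', 2), ('banana', 1)]
```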

Answer by johannestaas

Using Counter would be the best way, but if you don't want to do that, you can implement it yourself this way.

# The list you already have
word_list = ['words', ..., 'other', 'words']
# Get a set of unique words from the list
word_set = set(word_list)
# create your frequency dictionary
freq = {}
# iterate through them, once per unique word.
for word in word_set:
    freq[word] = word_list.count(word) / float(len(word_list))

freq will end up with the relative frequency of each word in the list you already have, i.e. the word's count divided by the total number of words.

You need float in there to convert one of the integers to a float, so that the resulting value is a float. This is only necessary in Python 2; in Python 3, / always performs true division.
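A quick check of the division operators, with made-up numbers (`count` and `total` are hypothetical). `//` is used here to show the truncating result that Python 2's `/` gave for two integers.

```python
count = 2   # hypothetical: the word appears twice
total = 8   # hypothetical: the list has eight words

print(count / total)         # 0.25 (true division in Python 3)
print(float(count) / total)  # 0.25 (float() is harmless but redundant here)
print(count // total)        # 0 (floor division, like Python 2's / on ints)
```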

Edit:

If you can't use a dict or set, here is another less efficient way:

# The list you already have
word_list = ['words', ..., 'other', 'words']
unique_words = []
for word in word_list:
    if word not in unique_words:
        unique_words += [word]
word_frequencies = []
for word in unique_words:
    word_frequencies += [float(word_list.count(word)) / len(word_list)]
for i in range(len(unique_words)):
    print(unique_words[i] + ": " + str(word_frequencies[i]))

The indices of unique_words and word_frequencies will match.
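Because the two lists line up by index, they can be sorted together: zip them into (frequency, word) pairs, sort the pairs, and unzip. A minimal sketch with hypothetical sample values; it assumes the lists are non-empty, since `zip(*[])` cannot be unpacked into two names. Ties in frequency would be broken by word, in reverse alphabetical order.

```python
# Hypothetical parallel lists, shaped like the ones built above
unique_words = ["words", "other", "more"]
word_frequencies = [0.5, 0.2, 0.3]

# Pair each frequency with its word, sort highest frequency first, then unzip
pairs = sorted(zip(word_frequencies, unique_words), reverse=True)
word_frequencies, unique_words = (list(t) for t in zip(*pairs))

print(unique_words)      # ['words', 'more', 'other']
print(word_frequencies)  # [0.5, 0.3, 0.2]
```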

Answer by kyle k

with open("test.txt", "r") as f:
    words = f.read().split()  # read the words into a list
uniqWords = sorted(set(words))  # remove duplicate words and sort alphabetically
for word in uniqWords:
    print(words.count(word), word)
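The loop above lists the words alphabetically. To list them by frequency instead, `words.count` can itself be the sort key. A sketch using a hypothetical in-memory string in place of `test.txt`:

```python
# Hypothetical text standing in for the contents of test.txt
words = "the car the red car the".split()

# Unique words, most frequent first; the key calls words.count once per unique word
uniq_words = sorted(set(words), key=words.count, reverse=True)
for word in uniq_words:
    print(words.count(word), word)
```

Each `count()` call scans the whole list, so this is fine for small inputs but slow for large ones.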

Answer by Milo P

One way would be to make a list of lists, with each sub-list in the new list containing a word and a count:

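list1 = []    # this is your original list of words
list2 = []    # this is a new list of [word, count] pairs

for word in list1:
    for pair in list2:
        if pair[0] == word:   # word already recorded: increase its count
            pair[1] += 1
            break
    else:                     # no break: first occurrence of this word
        list2.append([word, 1])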

Or, with try/except instead of an explicit search loop (this still scans the list, so it is not faster):

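for word in list1:
    try:
        # index() raises ValueError if word is not among the recorded words
        i = [pair[0] for pair in list2].index(word)
        list2[i][1] += 1
    except ValueError:
        list2.append([word, 1])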

This would be less efficient than using a dictionary, but it uses more basic concepts.
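The loops are meant to leave `list2` holding `[word, count]` pairs in first-seen order. To put the most frequent word first and recover the two separate lists the question asks for, the pairs can be sorted on their second element. A sketch with hypothetical sample pairs:

```python
# Hypothetical [word, count] pairs, as the counting loop would build them
list2 = [["car", 2], ["red", 1], ["the", 3]]

# Sort in place by count, highest first
list2.sort(key=lambda pair: pair[1], reverse=True)

words = [pair[0] for pair in list2]
counts = [pair[1] for pair in list2]
print(words)   # ['the', 'car', 'red']
print(counts)  # [3, 2, 1]
```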

Answer by tdolydong

You can use

from collections import Counter

It is available in Python 2.7 and Python 3.1 and later; see the collections.Counter documentation for more information.

1.

>>> c = Counter('abracadabra')
>>> c.most_common(3)
[('a', 5), ('r', 2), ('b', 2)]

Counting the values of a dict:

>>> d = {1: 'one', 2: 'one', 3: 'two'}
>>> c = Counter(d.values())
>>> c.most_common()
[('one', 2), ('two', 1)]

But you have to read the file first and turn its contents into an iterable of words; Counter accepts any iterable, so converting to a dict is not required.

2. This is the example from the Python docs, using re and Counter:

# Find the ten most common words in Hamlet
>>> import re
>>> words = re.findall(r'\w+', open('hamlet.txt').read().lower())
>>> Counter(words).most_common(10)
[('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
 ('you', 554),  ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)]
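The docs example reads a whole file; the same idea can be wrapped in a small function that takes the text as a string, which is easier to try out. `top_words` is a hypothetical helper name, and the `\w+` pattern treats apostrophes and hyphens as word separators.

```python
import re
from collections import Counter

def top_words(text, n=None):
    """Return the n most common lower-cased words in text (all of them if n is None)."""
    words = re.findall(r'\w+', text.lower())
    return Counter(words).most_common(n)

print(top_words("The cat saw the other cat and the dog", 2))  # [('the', 3), ('cat', 2)]
```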

Answer by Ashif Abdulrahman

Use this:

from collections import Counter
list1=['apple','egg','apple','banana','egg','apple']
counts = Counter(list1)
print(counts)
# Counter({'apple': 3, 'egg': 2, 'banana': 1})
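Since the question asks for two lists (unique words and their frequencies, most frequent first), the pairs from `most_common()` can be split into exactly that. A sketch reusing the sample list above; list comprehensions are used instead of `zip(*...)` so an empty input simply gives two empty lists.

```python
from collections import Counter

list1 = ['apple', 'egg', 'apple', 'banana', 'egg', 'apple']
counts = Counter(list1)

# most_common() with no argument returns every (word, count) pair, highest count first
pairs = counts.most_common()
unique_words = [word for word, _ in pairs]
frequencies = [count for _, count in pairs]

print(unique_words)  # ['apple', 'egg', 'banana']
print(frequencies)   # [3, 2, 1]
```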

Answer by Reza Abtin

Yet another solution with another algorithm without using collections:

def countWords(A):
    dic = {}
    for x in A:
        if x not in dic:        # older Python 2 style: if not dic.has_key(x):
            dic[x] = A.count(x)
    return dic

dic = countWords(['apple', 'egg', 'apple', 'banana', 'egg', 'apple'])
sorted_items = sorted(dic.items())   # sorted alphabetically by word
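`sorted(dic.items())` orders the pairs alphabetically by word. To get the highest count first, as the question asks, sort on the count instead. A minimal sketch that restates the result above as a literal so it stands alone:

```python
# The dict countWords returns for the sample list above
dic = {'apple': 3, 'egg': 2, 'banana': 1}

# Sort (word, count) pairs by count, highest first
by_count = sorted(dic.items(), key=lambda item: item[1], reverse=True)
print(by_count)  # [('apple', 3), ('egg', 2), ('banana', 1)]
```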

Answer by Gadi

You can use reduce(), a functional approach. In Python 3, reduce() must be imported from functools:

from functools import reduce

words = "apple banana apple strawberry banana lemon"
reduce(lambda d, c: d.update([(c, d.get(c, 0) + 1)]) or d, words.split(), {})

returns:

{'strawberry': 1, 'lemon': 1, 'apple': 2, 'banana': 2}
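The `reduce` expression works because `dict.update()` returns `None`, so `... or d` hands the same dictionary on to the next step. Written as a plain loop, the same counting logic looks like this (a sketch for comparison, not part of the original answer):

```python
words = "apple banana apple strawberry banana lemon"

d = {}
for c in words.split():
    d[c] = d.get(c, 0) + 1   # same effect as d.update([(c, d.get(c, 0) + 1)])

print(d == {'strawberry': 1, 'lemon': 1, 'apple': 2, 'banana': 2})  # True
```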

Answer by Paige Goulding

Try this:

original_list = []  # your list of lines (one word per line), e.g. read from a file
words = []
freqs = []

for line in sorted(original_list):  # takes all the lines in a text and sorts them
    line = line.rstrip()            # strips trailing whitespace, including the newline
    if line not in words:           # checks whether line is already in words
        words.append(line)          # if not, adds it to the end of words
        freqs.append(1)             # and adds 1 to the end of freqs
    else:
        index = words.index(line)   # if it is, finds where it is in words
        freqs[index] += 1           # and adds 1 at the matching index in freqs
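Because the lines are sorted, equal words end up next to each other, so the loop only needs to compare with the last word added instead of searching `words` with `in` and `index()`. A sketch of that variation with hypothetical sample lines; it strips before sorting so that trailing whitespace cannot separate duplicates. The result is in alphabetical order, so it still has to be sorted by frequency afterwards.

```python
# Hypothetical sample lines, e.g. as read from a file
original_list = ["the\n", "car\n", "the\n", "red\n", "car\n", "the\n"]

words = []
freqs = []

# Strip first, then sort, so identical words are guaranteed to be adjacent
for line in sorted(l.rstrip() for l in original_list):
    if words and words[-1] == line:  # same as the previous word
        freqs[-1] += 1
    else:                            # a new word starts here
        words.append(line)
        freqs.append(1)

print(words)  # ['car', 'red', 'the']
print(freqs)  # [2, 1, 3]
```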

Answer by M7hegazy

The best thing to do is:

def wordListToFreqDict(wordlist):
    wordfreq = [wordlist.count(p) for p in wordlist]
    return dict(zip(wordlist, wordfreq))

Then call: wordListToFreqDict(originallist)
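`wordListToFreqDict` calls `count()` once for every element, duplicates included, so it does quadratic work on long lists; the result is still correct because repeated keys in `dict(zip(...))` are overwritten with the same count. To finish the task, the dict can be turned into a list sorted by frequency. `sort_freq_dict` is a hypothetical helper name, not part of the original answer, and the sample list is made up.

```python
def wordListToFreqDict(wordlist):
    wordfreq = [wordlist.count(p) for p in wordlist]
    return dict(zip(wordlist, wordfreq))

def sort_freq_dict(freqdict):
    """Return (frequency, word) pairs, highest frequency first (ties by word, descending)."""
    pairs = [(freq, word) for word, freq in freqdict.items()]
    return sorted(pairs, reverse=True)

originallist = ["the", "car", "the", "red", "car", "the"]
print(sort_freq_dict(wordListToFreqDict(originallist)))  # [(3, 'the'), (2, 'car'), (1, 'red')]
```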
