Original source: http://stackoverflow.com/questions/3594514/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How to find most common elements of a list?
Asked by user434180
Given the following list:
['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats',
'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and',
'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.',
'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats',
'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise',
'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle',
'Moon', 'to', 'rise.', '']
I am trying to count how many times each word appears and display the top 3.
However, I am only looking for the top three words that have the first letter capitalized; all words that do not have the first letter capitalized should be ignored.
I am sure there is a better way than this, but my idea was to do the following:
- put the first word in the list into another list called uniquewords
- delete the first word and all its duplicates from the original list
- add the new first word to uniquewords
- delete the first word and all its duplicates from the original list
- etc...
- until the original list is empty
- count how many times each word in uniquewords appears in the original list
- find the top 3 and print them
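The steps above can be sketched roughly as follows. This is only an illustration of the described idea, not code from the question: the function name top_capitalized, the shortened sample list, and the use of word[:1].isupper() as the capitalization test are assumptions.

```python
def top_capitalized(words, n=3):
    """Return the n most frequent capitalized words, following the steps above."""
    remaining = list(words)          # work on a copy so the original list survives
    uniquewords = []
    while remaining:
        first = remaining[0]
        uniquewords.append(first)
        # drop the first word and all of its duplicates
        remaining = [w for w in remaining if w != first]
    # keep only words whose first letter is upper case (assumed test)
    candidates = [w for w in uniquewords if w[:1].isupper()]
    # order the candidates by how often they occur in the original list
    candidates.sort(key=words.count, reverse=True)
    return [(w, words.count(w)) for w in candidates[:n]]


# shortened, made-up sample rather than the full list from the question
sample = ['Jellicle', 'Cats', 'are', 'Jellicle', 'and', 'And', 'Cats', 'Jellicle', '']
print(top_capitalized(sample))
```

Each removal pass and each call to count scans the whole list, so this approach does far more work than a single counting pass with a dictionary.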
Accepted answer by Johnsyweb
If you are using an earlier version of Python or you have a very good reason to roll your own word counter (I'd like to hear it!), you could try the following approach using a dict.
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> word_list = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']
>>> word_counter = {}
>>> for word in word_list:
...     if word in word_counter:
...         word_counter[word] += 1
...     else:
...         word_counter[word] = 1
...
>>> popular_words = sorted(word_counter, key = word_counter.get, reverse = True)
>>>
>>> top_3 = popular_words[:3]
>>>
>>> top_3
['Jellicle', 'Cats', 'and']
Top Tip: The interactive Python interpreter is your friend whenever you want to play with an algorithm like this. Just type it in and watch it go, inspecting elements along the way.
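Note that the session above counts every word, whereas the question asks only for capitalized words. A minimal sketch of the same dict-based approach with that filter added; the shortened word_list and the use of word[:1].isupper() as the capitalization test are assumptions made for illustration.

```python
# shortened, made-up sample rather than the full list from the question
word_list = ['Jellicle', 'Cats', 'are', 'and', 'Jellicle', 'Cats', 'And',
             'to', 'Jellicle', '']

word_counter = {}
for word in word_list:
    # word[:1] is '' for the empty string, so '' is skipped safely
    if not word[:1].isupper():
        continue
    if word in word_counter:
        word_counter[word] += 1
    else:
        word_counter[word] = 1

# sort the words by their count, highest first, and keep three
top_3 = sorted(word_counter, key=word_counter.get, reverse=True)[:3]
print(top_3)
```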
Answer by Mark Byers
In Python 2.7 and above there is a class called Counter which can help you:
from collections import Counter
words_to_count = (word for word in word_list if word[:1].isupper())
c = Counter(words_to_count)
print(c.most_common(3))
Result:
[('Jellicle', 6), ('Cats', 5), ('And', 2)]
Regarding the request "I am quite new to programming so please try and do it in the most barebones fashion":
You could instead do this using a dictionary with the key being a word and the value being the count for that word. First iterate over the words, adding each one to the dictionary if it is not present, or else increasing its count if it is present. Then, to find the top three, you can either use a simple O(n*log(n)) sorting algorithm and take the first three elements from the result, or you can use an O(n) algorithm that scans the list once, remembering only the top three elements.
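The dictionary approach with a single scan for the top three could be sketched as follows. This is a minimal illustration, not code from the answer: the names count_words and top_three, the shortened sample list, and the capitalization filter word[:1].isupper() are assumptions.

```python
def count_words(words):
    """Build a word -> count dictionary for capitalized words only."""
    counts = {}
    for word in words:
        if word[:1].isupper():              # assumed capitalization test
            counts[word] = counts.get(word, 0) + 1
    return counts


def top_three(counts):
    """Scan the counts once, remembering only the three largest."""
    best = []                               # at most three (count, word) pairs
    for word, count in counts.items():
        best.append((count, word))
        best.sort(reverse=True)             # never more than four items to sort
        del best[3:]                        # forget everything below third place
    return [(word, count) for count, word in best]


# shortened, made-up sample rather than the full list from the question
sample = ['Moon', 'And', 'Jellicle', 'Cats', 'are', 'Jellicle', 'Cats',
          'And', 'are', 'Jellicle', 'Cats', 'are', 'Jellicle']
print(top_three(count_words(sample)))
```

Because the helper list never holds more than four entries, the scan is linear in the number of distinct words. Ties are broken by comparing the words themselves.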
An important observation for beginners is that by using builtin classes that are designed for the purpose you can save yourself a lot of work and/or get better performance. It is good to be familiar with the standard library and the features it offers.
Answer by jvdneste
The simple way of doing this would be (assuming your list is in 'l'):
>>> counter = {}
>>> for i in l: counter[i] = counter.get(i, 0) + 1
>>> sorted([ (freq,word) for word, freq in counter.items() ], reverse=True)[:3]
[(6, 'Jellicle'), (5, 'Cats'), (3, 'to')]
Complete sample:
>>> l = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']
>>> counter = {}
>>> for i in l: counter[i] = counter.get(i, 0) + 1
...
>>> counter
{'and': 3, '': 1, 'merry': 1, 'rise.': 1, 'small;': 1, 'Moon': 1, 'cheerful': 1, 'bright': 1, 'Cats': 5, 'are': 3, 'have': 2, 'bright,': 1, 'for': 1, 'their': 1, 'rather': 1, 'when': 1, 'to': 3, 'airs': 1, 'black': 2, 'They': 1, 'practise': 1, 'caterwaul.': 1, 'pleasant': 1, 'hear': 1, 'they': 1, 'white,': 1, 'wait': 1, 'And': 2, 'like': 1, 'Jellicle': 6, 'eyes;': 1, 'the': 1, 'faces,': 1, 'graces': 1}
>>> sorted([ (freq,word) for word, freq in counter.items() ], reverse=True)[:3]
[(6, 'Jellicle'), (5, 'Cats'), (3, 'to')]
By simple I mean that it works in nearly every version of Python.
If you don't understand some of the functions used in this sample, you can always do this in the interpreter (after pasting the code above):
>>> help(counter.get)
>>> help(sorted)
Answer by mmmdreg
nltk is convenient for a lot of language processing stuff. It has methods for frequency distribution built in. Something like:
import nltk
fdist = nltk.FreqDist(your_list) # creates a frequency distribution from a list
most_common = fdist.max() # returns a single element
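top_three = fdist.most_common(3) # returns a list of (word, count) tuples (NLTK 3, where FreqDist is a Counter subclass and keys() is not frequency-sorted)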
Answer by unlockme
To just return a list containing the most common words:
from collections import Counter
words=["i", "love", "you", "i", "you", "a", "are", "you", "you", "fine", "green"]
most_common_words= [word for word, word_count in Counter(words).most_common(3)]
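print(most_common_words)
This prints (the third entry is one of several words tied at a count of 1, so it can differ between Python versions):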
['you', 'i', 'a']
The 3 in most_common(3) specifies the number of items to return.
Counter(words).most_common() returns a list of tuples, each with the word as the first member and the frequency as the second. The tuples are ordered by the frequency of the word.
most_common = [item for item in Counter(words).most_common()]
print(str(most_common))
[('you', 4), ('i', 2), ('a', 1), ('are', 1), ('green', 1), ('love', 1), ('fine', 1)]
The "word for word, word_count in" part of the list comprehension extracts only the first member of each tuple.
Answer by JJC
The answer from @Mark Byers is best, but if you are on a version of Python < 2.7 (but at least 2.5, which is pretty old these days), you can replicate the Counter class functionality very simply via defaultdict (otherwise, for Python < 2.5, three extra lines of code are needed before d[i] += 1, as in @Johnsyweb's answer).
from collections import defaultdict
class Counter():
    ITEMS = []
    def __init__(self, items):
        d = defaultdict(int)
        for i in items:
            d[i] += 1
        # items() works on both Python 2 and 3; sort by count, highest first
        self.ITEMS = sorted(d.items(), reverse=True, key=lambda i: i[1])
    def most_common(self, n):
        return self.ITEMS[:n]
Then, you use the class exactly as in Mark Byers's answer, i.e.:
words_to_count = (word for word in word_list if word[:1].isupper())
c = Counter(words_to_count)
print(c.most_common(3))
Answer by Chrigi
A simple, two-line solution to this that does not require any extra modules is the following code:
lst = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,',
       'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle',
       'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant',
       'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle',
       'Cats', 'have', 'cheerful', 'faces,', 'Jellicle',
       'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like',
       'to', 'practise', 'their', 'airs', 'and', 'graces', 'And',
       'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']
lst_sorted = sorted([ss for ss in set(lst) if len(ss) > 0 and ss.istitle()],
                    key=lst.count,
                    reverse=True)
print(lst_sorted[0:3])
Output:
['Jellicle', 'Cats', 'And']
The expression in square brackets returns all unique strings in the list that are not empty and are in title case (for this list, the words that start with a capital letter). The sorted() function then sorts them by how often they appear in the list (using lst.count as the key), most frequent first.
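One caveat worth illustrating: str.istitle() is stricter than "starts with a capital letter". It is True only when each upper-case letter begins a run of letters and the letters after it are lower case. For the word list in the question the two tests agree, but the sketch below, which uses made-up sample strings, shows where they differ.

```python
# made-up sample strings, chosen to show where the two tests disagree
samples = ['Jellicle', 'McDonald', 'USA', 'cats', '']

results = {}
for s in samples:
    starts_upper = s[:1].isupper()   # only looks at the first character
    title_case = s.istitle()         # looks at the whole string
    results[s] = (starts_upper, title_case)
    print(repr(s), starts_upper, title_case)
```

If words such as 'McDonald' or 'USA' should count as capitalized, the first-character test is the safer choice.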
Answer by drew
If you are using Counter, or have created your own Counter-style dict, and want to show the name of each item and its count, you can iterate over the (word, count) pairs like so:
from collections import Counter
# most_common(10) returns a list of (word, count) tuples, highest count first
top_10_words = Counter(my_long_list_of_words).most_common(10)
# Iterate over the (word, count) tuples
for word in top_10_words:
    # print the word
    print(word[0])
    # print the count
    print(word[1])
Or, to iterate through this in a template:
{% for word in top_10_words %}
<p>Word: {{ word.0 }}</p>
<p>Count: {{ word.1 }}</p>
{% endfor %}
Hope this helps someone
Answer by Tim Seed
Isn't it just this ....
word_list=['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats',
'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and',
'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.',
'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats',
'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise',
'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle',
'Moon', 'to', 'rise.', '']
from collections import Counter
c = Counter(word_list)
c.most_common(3)
Which should output:
[('Jellicle', 6), ('Cats', 5), ('are', 3)]
Answer by Matthew D. Scholefield
There are two standard library ways to find the most frequent value in a list:
statistics.mode:
from statistics import mode
most_common = mode([3, 2, 2, 2, 1, 1]) # 2
most_common = mode([3, 2]) # StatisticsError: no unique mode (before Python 3.8; 3.8+ returns 3)
- Raises an exception if there's no unique most frequent value (before Python 3.8; since 3.8 it returns the first mode encountered)
- Only returns a single most frequent value
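On Python 3.8 and later, the statistics module also provides multimode, which returns every value tied for most frequent instead of raising. A brief sketch, assuming Python 3.8+ (the variable names are only illustrative):

```python
from statistics import multimode

single = multimode([3, 2, 2, 2, 1, 1])   # one clear winner
tied = multimode([3, 2])                 # a tie: both values, in order of first appearance
print(single, tied)
```

Like mode, it does not report the counts themselves.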
collections.Counter.most_common:
from collections import Counter
most_common, count = Counter([3, 2, 2, 2, 1, 1]).most_common(1)[0] # 2, 3
(most_common_1, count_1), (most_common_2, count_2) = Counter([3, 2, 2]).most_common(2) # (2, 2), (3, 1)
- Can return multiple most frequent values
- Returns element count as well
So in the case of the question, the second one would be the right choice. As a side note, both are identical in terms of performance.

