Python: finding the most frequent character in a string
Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/4131123/
Finding the most frequent character in a string
Asked by Sunandmoon
I found this programming problem while looking at a job posting on SO. I thought it was pretty interesting and as a beginner Python programmer I attempted to tackle it. However I feel my solution is quite...messy...can anyone make any suggestions to optimize it or make it cleaner? I know it's pretty trivial, but I had fun writing it. Note: Python 2.6
The problem:
Write pseudo-code (or actual code) for a function that takes in a string and returns the letter that appears the most in that string.
My attempt:
import string

def find_max_letter_count(word):
    alphabet = string.ascii_lowercase
    dictionary = {}

    for letters in alphabet:
        dictionary[letters] = 0

    for letters in word:
        dictionary[letters] += 1

    dictionary = sorted(dictionary.items(),
                        reverse=True,
                        key=lambda x: x[1])

    for position in range(0, 26):
        print dictionary[position]
        if position != len(dictionary) - 1:
            if dictionary[position + 1][1] < dictionary[position][1]:
                break

find_max_letter_count("helloworld")
Output:
>>>
('l', 3)
Updated example:
find_max_letter_count("balloon")
>>>
('l', 2)
('o', 2)
Accepted answer by Greg Hewgill
There are many ways to do this more concisely. For example, you can use the Counter class (in Python 2.7 or later):
import collections
s = "helloworld"
print(collections.Counter(s).most_common(1)[0])
If you don't have that, you can do the tally manually (2.5 or later has defaultdict):
d = collections.defaultdict(int)
for c in s:
    d[c] += 1
print(sorted(d.items(), key=lambda x: x[1], reverse=True)[0])
Having said that, there's nothing too terribly wrong with your implementation.
Answer by Chris Morgan
Here are a few things I'd do:
- Use collections.defaultdict instead of the dict you initialise manually.
- Use inbuilt sorting and max functions like max instead of working it out yourself - it's easier.
Here's my final result:
from collections import defaultdict

def find_max_letter_count(word):
    matches = defaultdict(int)  # makes the default value 0
    for char in word:
        matches[char] += 1

    # items() works in both Python 2 and Python 3
    return max(matches.items(), key=lambda x: x[1])

find_max_letter_count('helloworld') == ('l', 3)
Answer by meson10
If you are using Python 2.7, you can do this quickly with the collections module, which provides high-performance data structures. Read more at http://docs.python.org/library/collections.html#counter-objects
>>> from collections import Counter
>>> x = Counter("balloon")
>>> x
Counter({'o': 2, 'a': 1, 'b': 1, 'l': 2, 'n': 1})
>>> x['o']
2
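The problem statement asks for the most frequent letter, while Counter applied to a raw string counts every character, including spaces and punctuation. A minimal sketch of one way to restrict the tally to letters follows. The function name most_common_letter and the choice to ignore case are assumptions made for illustration; they are not part of the original answer.

```python
from collections import Counter

def most_common_letter(text):
    """Return (letter, count) for the most frequent letter, ignoring case."""
    # Keep alphabetic characters only and fold everything to lower case
    letters = Counter(c for c in text.lower() if c.isalpha())
    if not letters:
        return None  # the text contains no letters at all
    return letters.most_common(1)[0]

print(most_common_letter("Hello, World!!!"))  # prints ('l', 3)
```

Here the three exclamation marks are skipped, so the result is the letter l rather than a punctuation character.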
Answer by Eric O Lebigot
If you want to have all the characters with the maximum number of counts, then you can do a variation on one of the two ideas proposed so far:
import heapq  # Helps finding the n largest counts
import collections

def find_max_counts(sequence):
    """
    Returns an iterator that produces the (element, count)s with the
    highest number of occurrences in the given sequence.

    In addition, the elements are sorted.
    """
    if len(sequence) == 0:
        # A bare return ends the generator; raising StopIteration inside
        # a generator is an error since Python 3.7
        return

    counter = collections.defaultdict(int)
    for elmt in sequence:
        counter[elmt] += 1

    counts_heap = [
        (-count, elmt)  # The largest elmt counts are the smallest elmts
        for (elmt, count) in counter.items()]
    heapq.heapify(counts_heap)

    highest_count = counts_heap[0][0]

    while True:
        try:
            (opp_count, elmt) = heapq.heappop(counts_heap)
        except IndexError:
            return
        if opp_count != highest_count:
            return
        yield (elmt, -opp_count)

for (letter, count) in find_max_counts('balloon'):
    print((letter, count))

for (word, count) in find_max_counts(['he', 'lkj', 'he', 'll', 'll']):
    print((word, count))
This yields, for instance:
lebigot@weinberg /tmp % python count.py
('l', 2)
('o', 2)
('he', 2)
('ll', 2)
This works with any sequence: words, but also ['hello', 'hello', 'bonjour'], for instance.
The heapq structure is very efficient at finding the smallest elements of a sequence without sorting it completely. On the other hand, since there are not so many letters in the alphabet, you can probably also run through the sorted list of counts until the maximum count is no longer found, without incurring any serious speed loss.
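The simpler alternative mentioned above, walking a sorted list of counts until the count drops below the maximum, could look roughly like this. It is only a sketch: the name find_max_counts_sorted is made up for illustration, and Counter.most_common() together with itertools.takewhile is one possible way to express the idea, not the code from the answer.

```python
import collections
import itertools

def find_max_counts_sorted(sequence):
    """Return every (element, count) pair that shares the highest count."""
    counts = collections.Counter(sequence)
    # most_common() returns the pairs sorted by count, highest first
    ranked = counts.most_common()
    if not ranked:
        return []
    highest = ranked[0][1]
    # Walk the sorted list and stop at the first pair below the maximum
    return list(itertools.takewhile(lambda pair: pair[1] == highest, ranked))

print(find_max_counts_sorted('balloon'))
```

Unlike the heap version, this sorts all the counts, which is harmless when there are only a few distinct elements.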
Answer by kyle k
Here is a way to find the most common character using a dictionary:
message = "hello world"
d = {}
letters = set(message)
for l in letters:
    d[message.count(l)] = l  # maps each count to one letter that has it

highest = max(d)  # the largest count; dict key order is not relied upon
print(d[highest], highest)
Answer by eerock
def most_frequent(text):
    frequencies = [(c, text.count(c)) for c in set(text)]
    return max(frequencies, key=lambda x: x[1])[0]
s = 'ABBCCCDDDD'
print(most_frequent(s))
frequencies is a list of tuples that pair each character with its count, as (character, count). We apply max to the tuples, using the count as the key, and return that tuple's character. In the event of a tie, this solution will pick only one.
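Because the characters come from a set, which one wins a tie is not predictable. If a predictable choice matters, one possible variation is to consider the characters in order of first appearance, so that the earliest tied character is returned. The sketch below assumes Python 3.7 or later (where dicts keep insertion order) and a non-empty string; the name most_frequent_first is hypothetical.

```python
def most_frequent_first(text):
    """Return the most frequent character; ties go to the one seen first."""
    # dict.fromkeys keeps each distinct character once, in order of first
    # appearance (dicts preserve insertion order in Python 3.7+)
    candidates = dict.fromkeys(text)
    # max returns the first maximal item it encounters
    return max(candidates, key=text.count)

print(most_frequent_first('balloon'))  # prints l
```

In 'balloon' both l and o occur twice, and l is returned because it appears first.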
Answer by Josh Anish
from collections import Counter

# file: filename
# quant: number of most frequent characters you want
def frequent_letters(file, quant):
    with open(file) as f:  # the file is closed automatically
        text = f.read()
    return Counter(text).most_common(quant)
Answer by Soudipta Dutta
Question: Most frequent character in a string (the maximum occurring character in an input string)
Method 1:
a = "GiniGinaProtijayi"
d = {}
for ch in a:
    d[ch] = d.get(ch, 0) + 1
# Sort the (character, count) pairs by count, highest first, and take the first
chh, max_count = sorted(d.items(), reverse=True, key=lambda item: item[1])[0]
print(chh)
print(max_count)
Method 2 :
a = "GiniGinaProtijayi"
max_count = 0
chh = ''
count = [0] * 256  # one slot per character code; assumes ord(ch) < 256
for ch in a:
    count[ord(ch)] += 1
for ch in a:
    if count[ord(ch)] > max_count:
        max_count = count[ord(ch)]
        chh = ch
print(chh)
Method 3 :
import collections
a = "GiniGinaProtijayi"
aa = collections.Counter(a).most_common(1)[0]
print(aa)
Answer by Chris Alderson
I noticed that most of the answers return only one item even when several characters are tied for most common. Take "iii 444 yyy 999", for example: it contains an equal number of spaces, i's, 4's, y's, and 9's. The solution should return all of them, not just the letter i:
from collections import Counter

sentence = "iii 444 yyy 999"

# The count of the first item returned by Counter().most_common(),
# i.e. the largest count
largest_count: int = Counter(sentence).most_common()[0][1]

# Keep every (character, count) pair whose count equals the largest count
most_common_list: list = [(x, y)
                          for x, y in Counter(sentence).items() if y == largest_count]

print(most_common_list)

# RETURNS
[('i', 3), (' ', 3), ('4', 3), ('y', 3), ('9', 3)]

