Python: finding the most frequent character in a string

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/4131123/

Date: 2020-08-18 14:23:37  Source: igfitidea

Finding the most frequent character in a string

Tags: python, algorithm, optimization, time-complexity

Asked by Sunandmoon

I found this programming problem while looking at a job posting on SO. I thought it was pretty interesting and as a beginner Python programmer I attempted to tackle it. However I feel my solution is quite...messy...can anyone make any suggestions to optimize it or make it cleaner? I know it's pretty trivial, but I had fun writing it. Note: Python 2.6

The problem:

Write pseudo-code (or actual code) for a function that takes in a string and returns the letter that appears the most in that string.

My attempt:

import string

def find_max_letter_count(word):

    alphabet = string.ascii_lowercase
    dictionary = {}

    for letters in alphabet:
        dictionary[letters] = 0

    for letters in word:
        dictionary[letters] += 1

    dictionary = sorted(dictionary.items(), 
                        reverse=True, 
                        key=lambda x: x[1])

    for position in range(0, 26):
        print dictionary[position]
        if position != len(dictionary) - 1:
            if dictionary[position + 1][1] < dictionary[position][1]:
                break

find_max_letter_count("helloworld")

Output:

>>> 
('l', 3)

Updated example:

find_max_letter_count("balloon") 
>>>
('l', 2)
('o', 2)

Accepted answer by Greg Hewgill

There are many ways to do this more concisely. For example, you can use the Counter class (in Python 2.7 or later):

import collections
s = "helloworld"
print(collections.Counter(s).most_common(1)[0])

If you don't have that, you can do the tally manually (2.5 or later has defaultdict):

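import collections

s = "helloworld"
d = collections.defaultdict(int)  # missing keys start at 0
for c in s:
    d[c] += 1
# Sort the (character, count) pairs by count, highest first, and take the first
print(sorted(d.items(), key=lambda x: x[1], reverse=True)[0])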

Having said that, there's nothing too terribly wrong with your implementation.
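
The problem statement asks for the most frequent letter, while the snippets above count every character. A minimal sketch of the Counter approach restricted to letters could look like this; the function name most_common_letter, the case folding, and returning None when there are no letters are assumptions made for illustration, not part of the answer.

```python
import collections


def most_common_letter(text):
    """Return (letter, count) for the most frequent letter, or None.

    Assumptions (not from the original answer): case is ignored and
    non-letter characters such as spaces or digits are skipped.
    """
    counts = collections.Counter(c for c in text.lower() if c.isalpha())
    if not counts:
        return None  # the text contains no letters at all
    return counts.most_common(1)[0]


print(most_common_letter("Hello, World!"))
```

Counter requires Python 2.7 or later, as noted above.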

Answer by Chris Morgan

Here are a few things I'd do:

  • Use collections.defaultdict instead of the dict you initialise manually.
  • Use built-in functions such as max instead of working it out yourself; it's easier.

Here's my final result:

from collections import defaultdict

def find_max_letter_count(word):
    matches = defaultdict(int)  # makes the default value 0

    for char in word:
        matches[char] += 1

    # items() works on both Python 2 and 3 (iteritems() exists only on Python 2)
    return max(matches.items(), key=lambda x: x[1])

find_max_letter_count('helloworld') == ('l', 3)

Answer by meson10

If you are using Python 2.7, you can do this quickly with the collections module, which provides high-performance container data types. Read more at http://docs.python.org/library/collections.html#counter-objects

>>> from collections import Counter
>>> x = Counter("balloon")
>>> x
Counter({'o': 2, 'a': 1, 'b': 1, 'l': 2, 'n': 1})
>>> x['o']
2
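
Beyond single lookups, Counter can also report the highest counts directly. The short sketch below reuses the same "balloon" input; the variable names top_two and highest are chosen here for illustration only.

```python
from collections import Counter

x = Counter("balloon")

# most_common(n) returns a list of (element, count) pairs, highest count first.
top_two = x.most_common(2)
print(top_two)

# The highest count on its own.
highest = max(x.values())
print(highest)

# Characters that were never seen count as 0 instead of raising KeyError.
print(x["z"])
```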

Answer by Eric O Lebigot

If you want all the characters that share the maximum count, you can do a variation on one of the two ideas proposed so far:

import heapq  # Helps finding the n largest counts
import collections

def find_max_counts(sequence):
    """
    Returns an iterator that produces the (element, count)s with the
    highest number of occurrences in the given sequence.

    In addition, the elements are sorted.
    """

    if len(sequence) == 0:
        return  # an empty sequence produces nothing

    counter = collections.defaultdict(int)
    for elmt in sequence:
        counter[elmt] += 1

    counts_heap = [
        (-count, elmt)  # The largest elmt counts are the smallest elmts
        for (elmt, count) in counter.items()]

    heapq.heapify(counts_heap)

    highest_count = counts_heap[0][0]

    # Pop entries until the heap is empty or the count drops below the maximum.
    # A plain return ends the generator; raising StopIteration inside a
    # generator is an error since Python 3.7 (PEP 479).
    while counts_heap:
        (opp_count, elmt) = heapq.heappop(counts_heap)

        if opp_count != highest_count:
            return

        yield (elmt, -opp_count)

for (letter, count) in find_max_counts('balloon'):
    print (letter, count)

for (word, count) in find_max_counts(['he', 'lkj', 'he', 'll', 'll']):
    print (word, count)

This yields, for instance:

lebigot@weinberg /tmp % python count.py
('l', 2)
('o', 2)
('he', 2)
('ll', 2)

This works with any sequence: words, but also ['hello', 'hello', 'bonjour'], for instance.

The heapq module is very efficient at finding the smallest elements of a sequence without sorting it completely. On the other hand, since there are not many letters in the alphabet, you can probably also run through the sorted list of counts until the maximum count is no longer found, without incurring any serious speed loss.
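
The alternative mentioned in the last sentence, sorting all counts and walking the sorted list until the count drops below the maximum, might look like the sketch below. The name find_max_counts_sorted is hypothetical, and it returns a list rather than an iterator for brevity.

```python
import collections
import itertools


def find_max_counts_sorted(sequence):
    """Return the (element, count) pairs that share the highest count.

    Sorts every count, then keeps the leading run of pairs whose count
    equals the maximum. Hypothetical helper, not the answer's own code.
    """
    counter = collections.Counter(sequence)
    if not counter:
        return []

    # Highest count first; ties are ordered by element.
    by_count = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
    highest_count = by_count[0][1]

    # Stop at the first pair whose count is below the maximum.
    return list(itertools.takewhile(
        lambda item: item[1] == highest_count, by_count))


print(find_max_counts_sorted('balloon'))
print(find_max_counts_sorted(['he', 'lkj', 'he', 'll', 'll']))
```

This sorts every distinct element, which costs little for an alphabet-sized set of counts.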

Answer by kyle k

Here is a way to find the most common character using a dictionary:

message = "hello world"
d = {}
letters = set(message)
for l in letters:
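    d[message.count(l)] = l  # maps a count to one character with that count

# Dictionary key order is not guaranteed to be sorted, so pick the largest count explicitly
highest = max(d)
print(d[highest], highest)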

Answer by eerock

def most_frequent(text):
    frequencies = [(c, text.count(c)) for c in set(text)]
    return max(frequencies, key=lambda x: x[1])[0]

s = 'ABBCCCDDDD'
print(most_frequent(s))

frequencies is a list of (character, count) tuples. We apply max to the tuples, comparing by count, and return the winning tuple's character. In the event of a tie, this solution will pick only one.
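
Because set(text) has no defined order, which character wins a tie can vary. One possible way to make the result deterministic is to break ties by first position in the text; the helper name most_frequent_first and this tie-break rule are assumptions for illustration, not part of the answer.

```python
def most_frequent_first(text):
    """Return the most frequent character; ties go to the earliest one.

    Hypothetical variant of most_frequent with a deterministic tie-break.
    Raises ValueError on an empty string, as min() has nothing to choose.
    """
    # Lower sort key wins: higher count first, then smaller first index.
    return min(set(text), key=lambda c: (-text.count(c), text.index(c)))


print(most_frequent_first('ABBCCCDDDD'))
print(most_frequent_first('balloon'))
```

Like the original, this calls text.count once per distinct character, which is fine for short strings.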

Answer by Josh Anish

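from collections import Counter

# file: name of the file to read
# quant: number of most frequent characters you want

def frequent_letters(file, quant):
    # Read the whole file, closing it when done
    with open(file) as f:
        text = f.read()
    # Counter over a string counts individual characters
    return Counter(text).most_common(quant)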

Answer by Soudipta Dutta

Question: find the most frequent (maximum occurring) character in an input string.

Method 1:

a = "GiniGinaProtijayi"

# Tally each character
d = {}
for ch in a:
    d[ch] = d.get(ch, 0) + 1

# Sort the (character, count) pairs by count, highest first, and take the first pair
chh, max_count = sorted(d.items(), reverse=True, key=lambda item: item[1])[0]

print(chh)
print(max_count)

Method 2:

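a = "GiniGinaProtijayi"

max_count = 0  # renamed from max to avoid shadowing the built-in
chh = ''
count = [0] * 256  # one slot per code point; assumes every ord(ch) < 256
for ch in a:
    count[ord(ch)] += 1
for ch in a:
    if count[ord(ch)] > max_count:
        max_count = count[ord(ch)]
        chh = ch

print(chh)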

Method 3:

import collections

a = "GiniGinaProtijayi"

aa = collections.Counter(a).most_common(1)[0]
print(aa)

Answer by Chris Alderson

I noticed that most of the answers return only one item even when several characters tie for the highest count. Take "iii 444 yyy 999" as an example: the spaces, i's, 4's, y's, and 9's each appear three times. The solution should return all of them, not just the letter i:

from collections import Counter

sentence = "iii 444 yyy 999"

# The count from the first (element, count) tuple returned by
# Counter().most_common(), i.e. the largest count
largest_count: int = Counter(sentence).most_common()[0][1]

# Keep every tuple whose count equals the largest count
most_common_list: list = [(x, y)
                          for x, y in Counter(sentence).items() if y == largest_count]

print(most_common_list)

# RETURNS
[('i', 3), (' ', 3), ('4', 3), ('y', 3), ('9', 3)]