Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original address, and attribute it to the original authors (not me). StackOverflow original address: http://stackoverflow.com/questions/2285533/

Time: 2020-09-09 00:40:17  Source: igfitidea

Find the first un-repeated character in a string

Tags: algorithm, language-agnostic, string

Asked by Thunderhashy

What is the quickest way to find the first character which only appears once in a string?

Accepted answer by Ryan Prior

You can't know that the character is un-repeated until you've processed the whole string, so my suggestion would be this:

def first_non_repeated_character(string):
  chars = []     # characters seen exactly once so far, in order of appearance
  repeated = []  # characters seen more than once
  for character in string:
    if character in chars:
      # Second sighting: it is no longer a candidate
      chars.remove(character)
      repeated.append(character)
    else:
      if character not in repeated:
        chars.append(character)
  if chars:
    return chars[0]
  else:
    return False

Edit: the originally posted code was bad, but this latest snippet is Certified To Work On Ryan's Computer™.

Answer by Mark Byers

It has to be at least O(n) because you don't know if a character will be repeated until you've read all characters.

So you can iterate over the characters and append each character to a list the first time you see it, and separately keep a count of how many times you've seen it (in fact the only values that matter for the count are "0", "1" or "more than 1").

When you reach the end of the string you just have to find the first character in the list that has a count of exactly one.

Example code in Python:

from collections import defaultdict

def first_non_repeated_character(s):
    counts = defaultdict(int)
    l = []
    for c in s:
        counts[c] += 1
        if counts[c] == 1:
            l.append(c)

    for c in l:
        if counts[c] == 1:
            return c

    return None

This runs in O(n).

Answer by just_wes

Why not use a heap-based data structure such as a minimum priority queue? As you read each character from the string, add it to the queue with a priority based on its location in the string and the number of occurrences so far. You could modify the queue to add priorities on collision, so that the priority of a character is the number of appearances of that character. At the end of the loop, the first element in the queue will be the least frequent character in the string, and if there are multiple characters with a count == 1, the first element will be the first unique character added to the queue.
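
A minimal sketch of the priority-queue idea in Python, under some assumptions that are not part of the answer: the standard heapq module has no in-place priority update, so instead of adjusting priorities on collision this version counts first and builds the heap afterwards. The priority tuple (occurrences, first position) and the helper names count and first_index are illustrative choices.

```python
import heapq


def first_non_repeated_character(s):
    """Return the first character that occurs once in s, or None."""
    count = {}        # character -> number of occurrences
    first_index = {}  # character -> position of its first occurrence
    for i, c in enumerate(s):
        count[c] = count.get(c, 0) + 1
        first_index.setdefault(c, i)

    # Priority is (occurrences, first position), so the heap root is the
    # least frequent character, with ties broken by the earliest position.
    heap = [(count[c], first_index[c], c) for c in count]
    heapq.heapify(heap)

    # The root is only an answer if its count is exactly one.
    if heap and heap[0][0] == 1:
        return heap[0][2]
    return None
```

Since only the root is ever inspected, a plain min() over the same tuples would give the same result; the heap is kept here only to mirror the structure the answer describes.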

Answer by John La Rooy

Here is another fun way to do it. Counter requires Python 2.7 or Python 3.1.

>>> from collections import Counter
>>> def first_non_repeated_character(s):
...     return min((k for k,v in Counter(s).items() if v<2), key=s.index)
...
>>> first_non_repeated_character("aaabbbcddd")
'c'
>>> first_non_repeated_character("aaaebbbcddd")
'e'

Answer by Adrian McCarthy

Lots of answers are attempting O(n) but are forgetting the actual costs of inserting and removing from the lists/associative arrays/sets they're using to track.

If you can assume that a char is a single byte, then you can use a simple array indexed by the char and keep a count in it. This is truly O(n) because the array accesses are guaranteed O(1), and the final pass over the array to find the first element with a count of 1 is constant time (because the array has a small, fixed size).
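
As a rough illustration of the single-byte case, here is a sketch in Python that assumes the input is a bytes object; the name first_unique_byte is hypothetical. One deliberate difference from the description: scanning the count table alone would yield the unique byte with the smallest value rather than the earliest one, so this version walks the input a second time, which keeps the whole thing O(n).

```python
def first_unique_byte(data):
    """Return the first byte value that occurs once in data, or None."""
    counts = [0] * 256  # one slot per possible byte value
    for b in data:      # iterating over bytes yields ints in 0..255
        counts[b] += 1
    # Walk the input again, in order, so the result is first by position.
    for b in data:
        if counts[b] == 1:
            return b
    return None
```

For example, first_unique_byte(b"aaabbbcddd") returns the integer value of "c", which chr() turns back into a character.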

If you can't assume that a char is a single byte, then I would propose sorting the string and then doing a single pass checking adjacent values. This would be O(n log n) for the sort plus O(n) for the final pass. So it's effectively O(n log n), which is better than O(n^2). Also, it has virtually no space overhead, which is another problem with many of the answers that are attempting O(n).
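
One way the sort-and-scan idea might look in Python. This sketch makes an assumption the answer does not: it sorts the indices of the string rather than the string itself, because the original positions are needed to tell which unique character came first. That costs O(n) extra space, so it does not have the near-zero overhead of an in-place sort. The name first_non_repeated_by_sorting is hypothetical.

```python
def first_non_repeated_by_sorting(s):
    """Return the first character that occurs once in s, or None."""
    # Sort positions by the character found there: O(n log n).
    order = sorted(range(len(s)), key=lambda i: s[i])
    n = len(order)
    best = None  # smallest original position of a unique character

    # Single pass: a character is unique if it differs from both neighbours
    # in sorted order.
    for pos, i in enumerate(order):
        same_as_prev = pos > 0 and s[order[pos - 1]] == s[i]
        same_as_next = pos + 1 < n and s[order[pos + 1]] == s[i]
        if not same_as_prev and not same_as_next:
            best = i if best is None else min(best, i)

    return None if best is None else s[best]
```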

Answer by user1798670

The following is a Ruby implementation of finding the first nonrepeated character of a string:

def first_non_repeated_character(string)
  string1 = string.split('')
  string2 = string.split('')

  string1.each do |let1|
    counter = 0
    string2.each do |let2|
      counter += 1 if let1 == let2
    end
    # A count of exactly one means let1 is never repeated
    return let1 if counter == 1
  end
  nil # no unique character found
end

p first_non_repeated_character('dont doddle in the forest')

And here is a JavaScript implementation in the same style:

var first_non_repeated_character = function (string) {
  var string1 = string.split('');
  var string2 = string.split('');

  for (var i = 0; i < string1.length; i++) {
    // Count how many times string1[i] occurs in the whole string
    var count = 0;
    for (var x = 0; x < string2.length; x++) {
      if (string1[i] === string2[x]) {
        count++;
      }
    }
    if (count === 1) {
      return string1[i];
    }
  }
};

console.log(first_non_repeated_character('dont doddle in the forest'));
console.log(first_non_repeated_character('how are you today really?'));

In both cases I used a counter: if a letter matches nothing else in the string, its count is exactly one, so I just count its occurrences and return the first letter whose count is one.

Answer by John La Rooy

Counter requires Python 2.7 or Python 3.1.

>>> from collections import Counter
>>> def first_non_repeated_character(s):
...     counts = Counter(s)
...     for c in s:
...         if counts[c]==1:
...             return c
...     return None
... 
>>> first_non_repeated_character("aaabbbcddd")
'c'
>>> first_non_repeated_character("aaaebbbcddd")
'e'

Answer by Pete Fordham

I think this should do it in C. This operates in O(n) time, with no ambiguity about the order of insertion and deletion operations. This is a counting sort (the simplest form of a bucket sort, which itself is the simple form of a radix sort).

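#include <string.h>

unsigned char find_first_unique(unsigned char *string)
{
    int chars[256];
    int i;
    memset(chars, 0, sizeof(chars));

    /* First pass: count every byte up to, but not including, the terminator. */
    for (i = 0; string[i]; i++)
    {
        chars[string[i]]++;
    }

    /* Second pass: return the first byte whose count is exactly one. */
    for (i = 0; string[i]; i++)
    {
        if (chars[string[i]] == 1) return string[i];
    }
    return 0; /* no unique character */
}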

Answer by Magge

Refactoring a solution proposed earlier so that it does not need the extra list. This goes over the string twice, so it is also O(n), like the original solution.

from collections import defaultdict

def first_non_repeated_character(s):
    counts = defaultdict(int)
    for c in s:
        counts[c] += 1
    for c in s:
        if counts[c] == 1:
            return c
    return None

Answer by rantler

In Ruby:

(Original Credit: Andrew A. Smith)

x = "a huge string in which some characters repeat"

def first_unique_character(s)
  s.each_char.detect { |c| s.count(c) == 1 }
end

first_unique_character(x)
=> "u"