C#: Testing for repeated characters in a string

Note: this page is a side-by-side Chinese/English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/829537/

Time: 2020-08-05 03:19:42  Source: igfitidea

Testing for repeated characters in a string

c# algorithm string

Asked by inspite

I'm doing some work with strings, and I have a scenario where I need to determine if a string (usually a small one < 10 characters) contains repeated characters.

`ABCDE`  // does not contain repeats 
`AABCD`  // does contain repeats, ie A is repeated

I can loop through the string.ToCharArray() and test each character against every other character in the char[], but I feel like I am missing something obvious.... maybe I just need coffee. Can anyone help?

EDIT:

The string will be sorted, so order is not important: ABCDA => AABCD

The frequency of repeats is also important, so I need to know if the repeat is pair or triplet etc.

Accepted answer by Jon Skeet

If the string is short, then just looping and testing may well be the simplest and most efficient way. I mean you could create a hash set (in whatever platform you're using) and iterate through the characters, failing if the character is already in the set and adding it to the set otherwise - but that's only likely to provide any benefit when the strings are longer.
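
The hash-set idea described above could be sketched roughly as follows. This is only an illustration: the class name `RepeatCheck` and method name `HasRepeats` are made up for this example, and it reports whether a repeat exists rather than "failing" on one.

```csharp
using System.Collections.Generic;

public static class RepeatCheck
{
    // Returns true as soon as any character is seen a second time.
    // Works on unsorted input: HashSet<char>.Add returns false
    // when the element is already in the set.
    public static bool HasRepeats(string text)
    {
        var seen = new HashSet<char>();
        foreach (char c in text)
        {
            if (!seen.Add(c))
            {
                return true;
            }
        }
        return false;
    }
}
```

For example, `RepeatCheck.HasRepeats("ABCDA")` is true and `RepeatCheck.HasRepeats("ABCDE")` is false.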

EDIT: Now that we know it's sorted, mquander's answer is the best one IMO. Here's an implementation:

public static bool IsSortedNoRepeats(string text)
{
    if (text.Length == 0)
    {
        return true;
    }
    char current = text[0];
    for (int i=1; i < text.Length; i++)
    {
        char next = text[i];
        if (next <= current)
        {
            return false;
        }
        current = next;
    }
    return true;
}

A shorter alternative if you don't mind repeating the indexer use:

public static bool IsSortedNoRepeats(string text)
{
    for (int i=1; i < text.Length; i++)
    {
        if (text[i] <= text[i-1])
        {
            return false;
        }
    }
    return true;
}

EDIT: Okay, with the "frequency" side, I'll turn the problem round a bit. I'm still going to assume that the string is sorted, so what we want to know is the length of the longest run. When there are no repeats, the longest run length will be 0 (for an empty string) or 1 (for a non-empty string). Otherwise, it'll be 2 or more.

First a string-specific version:

public static int LongestRun(string text)
{
    if (text.Length == 0)
    {
        return 0;
    }
    char current = text[0];
    int currentRun = 1;
    int bestRun = 0;

    for (int i=1; i < text.Length; i++)
    {
        if (current != text[i])
        {
            bestRun = Math.Max(currentRun, bestRun);
            currentRun = 0;
            current = text[i];
        }
        currentRun++;
    }
    // It's possible that the final run is the best one
    return Math.Max(currentRun, bestRun);
}

Now we can also do this as a general extension method on IEnumerable<T>:

public static int LongestRun<T>(this IEnumerable<T> source)
{
    bool first = true;
    T current = default(T);
    int currentRun = 0;
    int bestRun = 0;

    foreach (T element in source)
    {
        // Start a new run on the first element or whenever the value changes
        if (first || !EqualityComparer<T>.Default.Equals(element, current))
        {
            first = false;
            bestRun = Math.Max(currentRun, bestRun);
            currentRun = 0;
            current = element;
        }
        currentRun++;
    }
    // It's possible that the final run is the best one
    return Math.Max(currentRun, bestRun);
}

Then you can call "AABCD".LongestRun() for example.
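
If you need the length of every run rather than only the longest one (to tell a pair from a triplet, say), a similar single pass can collect them. The following is a rough sketch, not part of the original answer: it assumes the string is sorted so that equal characters are adjacent, and the names `RunLengths` and `Runs` are invented for illustration.

```csharp
using System.Collections.Generic;

public static class RunLengths
{
    // Assumes the input is sorted, so equal characters are adjacent.
    // Returns one (character, run length) pair per run, in order.
    public static List<KeyValuePair<char, int>> Runs(string text)
    {
        var runs = new List<KeyValuePair<char, int>>();
        int start = 0;
        for (int i = 1; i <= text.Length; i++)
        {
            // Close the current run at the end of the string
            // or when the character changes.
            if (i == text.Length || text[i] != text[start])
            {
                runs.Add(new KeyValuePair<char, int>(text[start], i - start));
                start = i;
            }
        }
        return runs;
    }
}
```

For "AAABD" this gives three runs: A with length 3, then B and D with length 1 each.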

Answer by dirkgently

Update: Now, you'd need an array of counters to maintain a count.

Keep a bit array, with one bit representing a unique character. Turn the bit on when you encounter a character, and run over the string once. The mapping between bit array index and character set is up to you to decide. Break if you see that a particular bit is already on.
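
A minimal sketch of the bit-array idea, under the assumption (not stated in the question) that every character is an uppercase letter 'A'..'Z', so a single 32-bit mask is enough. The class and method names are hypothetical, and `CountLetters` shows the array-of-counters variant mentioned in the update.

```csharp
using System;

public static class BitSetCheck
{
    // Assumption: input contains only 'A'..'Z'; bit n stands for letter 'A' + n.
    public static bool HasRepeats(string text)
    {
        uint seen = 0;
        foreach (char c in text)
        {
            uint bit = 1u << IndexOf(c);
            if ((seen & bit) != 0)
            {
                return true; // bit already on: this character was seen before
            }
            seen |= bit;
        }
        return false;
    }

    // Counter variant for the frequency requirement: one counter per letter.
    public static int[] CountLetters(string text)
    {
        var counts = new int[26];
        foreach (char c in text)
        {
            counts[IndexOf(c)]++;
        }
        return counts;
    }

    // Maps 'A'..'Z' to 0..25 and rejects anything else.
    private static int IndexOf(char c)
    {
        if (c < 'A' || c > 'Z')
        {
            throw new ArgumentException("Only 'A'..'Z' is supported in this sketch.");
        }
        return c - 'A';
    }
}
```

A different character set would need a different mapping and possibly a larger structure such as `System.Collections.BitArray`.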

Answer by mqp

If the string is sorted, you could just remember each character in turn and check to make sure the next character is never identical to the last character.

Other than that, for strings under ten characters, just testing each character against all the rest is probably as fast or faster than most other things. A bit vector, as suggested by another commenter, may be faster (helps if you have a small set of legal characters.)
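
For reference, the "test each character against all the rest" approach might look like the sketch below (the names are illustrative only). It is quadratic in the string length, which hardly matters for strings under ten characters.

```csharp
public static class BruteForce
{
    // Compares every character with every later character.
    // Does not require the string to be sorted.
    public static bool HasRepeats(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            for (int j = i + 1; j < text.Length; j++)
            {
                if (text[i] == text[j])
                {
                    return true;
                }
            }
        }
        return false;
    }
}
```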

Bonus: here's a slick LINQ solution to implement Jon's functionality:

int longestRun =
    s.Select((c, i) => s.Substring(i).TakeWhile(x => x == c).Count()).Max();

So, OK, it's not very fast! You got a problem with that?!

:-)

Answer by xrost

I think the easiest way to achieve that is to use this simple regex:

// (\w)\1 matches a word character immediately followed by the same character
bool foundMatch = Regex.IsMatch(yourString, @"(\w)\1");

If you need more information about the match (start, length etc):

string testString = "ABCDE AABCD";
// (\w)\1+ matches a word character followed by one or more copies of itself
Match match = Regex.Match(testString, @"(\w)\1+");
if (match.Success)
{
    string matchText = match.Value; // AA
    int matchIndex = match.Index;   // 6
    int matchLength = match.Length; // 2
}

Answer by Steve Jessop

/(.).*\1/

(or whatever the equivalent is in your regex library's syntax)

Not the most efficient, since it will probably backtrack to every character in the string and then scan forward again. And I don't usually advocate regular expressions. But if you want brevity...
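
In C#, a rough equivalent could look like this. It is a sketch that assumes the pattern is meant to end in a backreference (`\1`) to the captured character, and the class and method names are invented for the example. Unlike the adjacent-character checks, it does not need the string to be sorted.

```csharp
using System.Text.RegularExpressions;

public static class RegexCheck
{
    // (.) captures one character, .* skips any characters in between,
    // and \1 requires the captured character to appear again.
    // Note: by default '.' does not match a newline character.
    public static bool HasRepeats(string text)
    {
        return Regex.IsMatch(text, @"(.).*\1");
    }
}
```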

Answer by Winston Smith

Since you're using 3.5, you could do this in one LINQ query:

var results = stringInput
  .ToCharArray() // not actually needed, I've left it here to show what's actually happening
  .GroupBy(c=>c)
  .Where(g=>g.Count()>1)
  .Select(g=>new {Letter=g.First(),Count=g.Count()})
;

For each character that appears more than once in the input, this will give you the character and the count of occurrences.

Answer by BenAlabaster

This will tell you very quickly if a string contains duplicates:

string s = "ABCDEA";
bool containsDups = s.Length != s.Distinct().Count();

It just checks the number of distinct characters against the original length. If they're different, you've got duplicates...

Edit: I guess this doesn't take care of the frequency of dups you noted in your edit though... but some other suggestions here already take care of that, so I won't post the code, as a number of them already give you a reasonably elegant solution. I particularly like Joe's implementation using LINQ extensions.

Answer by CasperT

How about something like:

string strString = "AA BRA KA DABRA";

var grp = from c in strString.ToCharArray() 
        group c by c into m
        select new { Key = m.Key, Count = m.Count() };

foreach (var item in grp)
{
    Console.WriteLine(
        string.Format("Character:{0} Appears {1} times", 
        item.Key.ToString(), item.Count));
}

Answer by Davy Landman

When there is no order to rely on, you could use a dictionary to keep the counts:

String input = "AABCD";
var result = new Dictionary<Char, int>(26);
var chars = input.ToCharArray();
foreach (var c in chars)
{
    if (!result.ContainsKey(c))
    {
        result[c] = 0; // initialize the counter in the result
    }
    result[c]++;
}

foreach (var charCombo in result)
{
    Console.WriteLine("{0}: {1}",charCombo.Key, charCombo.Value);   
}

Answer by Paul U

The hash solution Jon was describing is probably the best. You could use a HybridDictionary, since that works well with both small and large data sets, with the letter as the key and the frequency as the value. (Update the frequency every time the add fails or the HybridDictionary returns true for .Contains(key).)
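
A possible sketch of that idea is shown below. `HybridDictionary` lives in `System.Collections.Specialized` and is non-generic, so keys and values are stored as `object` and the count has to be cast back to `int`. The class and method names are made up for this example.

```csharp
using System.Collections.Specialized;

public static class HybridCount
{
    // Maps each character (boxed as the key) to how often it occurs.
    public static HybridDictionary CountChars(string text)
    {
        var counts = new HybridDictionary();
        foreach (char c in text)
        {
            if (counts.Contains(c))
            {
                // Already seen: bump the stored frequency.
                counts[c] = (int)counts[c] + 1;
            }
            else
            {
                counts.Add(c, 1);
            }
        }
        return counts;
    }
}
```

Any entry whose value is greater than 1 is a repeated character, and the value itself tells you whether it is a pair, a triplet and so on.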
