Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/17278593/

Date: 2020-08-10 09:00:41  Source: igfitidea

Fast ways to avoid duplicates in a List<> in C#

Tags: c#, list, duplicates

Asked by Robert Strauch

My C# program generates random strings from a given pattern. These strings are stored in a list. As no duplicates are allowed I'm doing it like this:

List<string> myList = new List<string>();
for (int i = 0; i < total; i++) {
  string random_string = GetRandomString(pattern);
  if (!myList.Contains(random_string)) myList.Add(random_string);
}

As you can imagine, this works fine for a few hundred entries. But I need to generate several million strings, and with each string added, the duplicate check gets slower and slower.

Are there any faster ways to avoid duplicates?

Accepted answer by Servy

Use a data structure that can much more efficiently determine if an item exists, namely a HashSet. It can determine if an item is in the set in constant time, regardless of the number of items in the set.

If you really need the items in a List instead, or you need the items in the resulting list to be in the order they were generated, then you can store the data in both a list and a HashSet, adding each item to both collections only if it doesn't already exist in the HashSet.
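
A minimal sketch of the two-collection approach described above, assuming generation order must be preserved. `UniqueCollector` and `generate` are hypothetical names introduced for illustration; `generate` stands in for the question's `GetRandomString(pattern)`. The sketch relies on `HashSet<T>.Add` returning `false` when the item is already present, so no separate `Contains` call is needed.

```csharp
using System;
using System.Collections.Generic;

public static class UniqueCollector
{
    // Calls generate() `attempts` times and keeps each distinct result once,
    // in the order it was first produced.
    // `generate` is a hypothetical stand-in for GetRandomString(pattern).
    public static List<string> Collect(int attempts, Func<string> generate)
    {
        var seen = new HashSet<string>();   // fast membership checks
        var ordered = new List<string>();   // preserves generation order

        for (int i = 0; i < attempts; i++)
        {
            string candidate = generate();
            // Add returns false if the set already contains the item.
            if (seen.Add(candidate))
            {
                ordered.Add(candidate);
            }
        }
        return ordered;
    }
}
```

Like the loop in the question, this can return fewer than `attempts` items when duplicates are generated.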

Answer by Zdravko Danev

A Hashtable would be a faster way to check if an item exists than a list.

Answer by p.s.w.g

The easiest way is to use this:

myList = myList.Distinct().ToList();

However, this requires building the full list first and then creating a second, de-duplicated list. A better way might be to write your generator up front:

public IEnumerable<string> GetRandomStrings(int total, string pattern)
{
    for (int i = 0; i < total; i++) 
    {
        yield return GetRandomString(pattern);
    }
}

...

myList = GetRandomStrings(total, pattern).Distinct().ToList();

Of course, if you don't need to access items by index, you could probably improve efficiency even more by dropping the ToList and just using an IEnumerable.
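
One thing to keep in mind with the `Distinct()` approach: because duplicates are dropped, the result can contain fewer than `total` items. A possible sketch, assuming you want exactly `total` unique strings, is to make the generator unbounded and let `Distinct().Take(total)` stop it lazily. `LazyUnique`, `Endless`, and `TakeUnique` are hypothetical names, and `generate` stands in for `GetRandomString(pattern)`. Note that this never terminates if the pattern cannot produce `total` distinct strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyUnique
{
    // Yields generate() results forever; callers must bound it (e.g. with Take).
    public static IEnumerable<string> Endless(Func<string> generate)
    {
        while (true)
        {
            yield return generate();
        }
    }

    // Returns exactly `total` distinct strings, pulled lazily.
    // Assumption: generate() can produce at least `total` distinct values;
    // otherwise this loops forever.
    public static List<string> TakeUnique(int total, Func<string> generate)
    {
        return Endless(generate).Distinct().Take(total).ToList();
    }
}
```

`Distinct()` yields each value the first time it appears, so the resulting list is still in generation order.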

Answer by catfood

Don't use List<>. Use Dictionary<> or HashSet<> instead!

Answer by jdehlin

Have you tried:

myList = myList.Distinct().ToList();

Answer by DGibbs

You could use a HashSet<string> if order is not important:

HashSet<string> myHashSet = new HashSet<string>();
for (int i = 0; i < total; i++) 
{
   string random_string = GetRandomString(pattern);
   myHashSet.Add(random_string);
}

The HashSet class provides high-performance set operations. A set is a collection that contains no duplicate elements, and whose elements are in no particular order.

MSDN

Or, if the order is important, I'd recommend using a SortedSet<string> (available from .NET 4.0 onward). Note that it keeps the items in sorted order, not in the order they were generated.
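
A small sketch of what `SortedSet<string>` does with order: it enumerates items in sorted (comparer) order and ignores duplicates, so it does not preserve the order in which strings were generated. `SortedDemo` and `SortUnique` are hypothetical names; `StringComparer.Ordinal` is an assumption made here only to keep the ordering culture-independent.

```csharp
using System;
using System.Collections.Generic;

public static class SortedDemo
{
    // Adds the inputs to a SortedSet and returns them in enumeration order.
    public static List<string> SortUnique(IEnumerable<string> inputs)
    {
        // Ordinal comparison: an assumption, chosen for culture-independent order.
        var set = new SortedSet<string>(StringComparer.Ordinal);
        foreach (string s in inputs)
        {
            set.Add(s); // returns false (and changes nothing) for duplicates
        }
        return new List<string>(set);
    }
}
```

For example, `SortedDemo.SortUnique(new[] { "pear", "apple", "pear", "fig" })` yields apple, fig, pear. If generation order matters instead, a list paired with a HashSet is the closer fit.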

Answer by Amir Javed

Not a good approach, but as a quick fix you can use a bool flag to record whether the list already contains the new entry:

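// Assumed declaration: the original snippet uses MyKeys without defining it.
List<string> MyKeys = new List<string>();

// Adds newKey only if it is not already in MyKeys.
// This is a linear scan, so each call costs O(n).
public void addKey(string newKey)
{
    bool containsKey = false;

    foreach (string key in MyKeys)
    {
        if (key == newKey)
        {
            containsKey = true;
            break; // stop scanning once a match is found
        }
    }

    if (!containsKey)
    {
        MyKeys.Add(newKey);
    }
}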