Fast ways to avoid duplicates in a List<> in C#
Original URL: http://stackoverflow.com/questions/17278593/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow
Question by Robert Strauch
My C# program generates random strings from a given pattern. These strings are stored in a list. As no duplicates are allowed, I'm doing it like this:
```csharp
List<string> myList = new List<string>();
for (int i = 0; i < total; i++)
{
    string random_string = GetRandomString(pattern);
    // List<T>.Contains compares against every element, so each check is O(n)
    if (!myList.Contains(random_string)) myList.Add(random_string);
}
```
As you can imagine, this works fine for several hundred entries. But now I need to generate several million strings, and with each added string, checking for duplicates gets slower and slower.
Are there any faster ways to avoid duplicates?
Accepted answer by Servy
Use a data structure that can determine much more efficiently whether an item exists, namely a HashSet. It can determine whether an item is in the set in constant time on average, regardless of the number of items in the set.
If you really need the items in a List instead, or you need the items in the resulting list to be in the order they were generated, then you can store the data in both a list and a hash set, adding each item to both collections only if it doesn't already exist in the HashSet.
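A minimal sketch of the list-plus-hash-set approach described above, assuming .NET 5 or later (it uses top-level statements). `GetRandomString` here is a hypothetical stand-in for the question's generator: it ignores `pattern` and returns one of nine two-letter strings, so duplicates show up quickly. The sketch relies on `HashSet<T>.Add` returning `false` when the item is already present, which removes the need for a separate `Contains` call.

```csharp
using System;
using System.Collections.Generic;

var rng = new Random();

// Hypothetical stand-in for the question's GetRandomString(pattern).
// It ignores "pattern" and returns one of nine strings ("aa" .. "cc").
string GetRandomString(string pattern)
{
    char first = (char)('a' + rng.Next(3));
    char second = (char)('a' + rng.Next(3));
    return new string(new[] { first, second });
}

// The List keeps the strings in generation order; the HashSet is used only
// for the duplicate check. HashSet<T>.Add returns false, and adds nothing,
// when the item is already present.
List<string> GenerateUnique(int total, string pattern)
{
    var list = new List<string>();
    var seen = new HashSet<string>();
    for (int i = 0; i < total; i++)
    {
        string randomString = GetRandomString(pattern);
        if (seen.Add(randomString))
        {
            list.Add(randomString);
        }
    }
    return list;
}

List<string> myList = GenerateUnique(100, "pattern");
Console.WriteLine($"{myList.Count} unique strings: {string.Join(", ", myList)}");
```

Like the question's loop, this makes `total` attempts and can therefore return fewer than `total` strings. If exactly `total` unique strings are needed, the loop would have to test `list.Count` instead, and the generator must be able to produce that many distinct values.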
Answer by Zdravko Danev
A Hashtable would be a faster way to check if an item exists than a list.
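For comparison, a sketch of the same check using the non-generic `System.Collections.Hashtable` this answer mentions, assuming .NET 5 or later for top-level statements; `TryAddKey` is a hypothetical helper name. Only the keys matter, so each value is stored as `null`. In current code the generic `HashSet<string>` is usually the better fit, since `Hashtable` stores its keys as `object`.

```csharp
using System;
using System.Collections;

var table = new Hashtable();

// Hypothetical helper: returns true if the key was new and has been stored.
// Hashtable.ContainsKey is an average O(1) hash lookup, whereas
// List<T>.Contains compares against every element.
bool TryAddKey(string key)
{
    if (table.ContainsKey(key))
    {
        return false;
    }
    table.Add(key, null); // only the key matters; Hashtable accepts null values
    return true;
}

foreach (string s in new[] { "x", "y", "x" })
{
    Console.WriteLine($"{s}: {(TryAddKey(s) ? "added" : "duplicate")}");
}
Console.WriteLine(table.Count); // prints 2
```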
Answer by p.s.w.g
The easiest way is to use this:
```csharp
using System.Linq; // for Distinct() and ToList()

myList = myList.Distinct().ToList();
```
This does require building the full list first and then creating a second, deduplicated list, though. A better way might be to write the generator as an iterator and deduplicate its output before any list is built:
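```csharp
using System.Collections.Generic;
using System.Linq;

public IEnumerable<string> GetRandomStrings(int total, string pattern)
{
    for (int i = 0; i < total; i++)
    {
        // yield return hands out one string at a time, only when it is requested
        yield return GetRandomString(pattern);
    }
}
// ...
myList = GetRandomStrings(total, pattern).Distinct().ToList();
```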
Of course, if you don't need to access items by index, you could probably improve efficiency even more by dropping the ToList and just using an IEnumerable.
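A sketch of consuming the deduplicated sequence lazily, without `ToList`, assuming .NET 5 or later for top-level statements. `GetRandomString` is again a hypothetical stand-in (it ignores `pattern` and returns one of the strings "0" to "4"); `GetRandomStrings` is the iterator from the answer above, written here as a local function.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var rng = new Random();

// Hypothetical stand-in for GetRandomString(pattern): ignores "pattern"
// and returns one of the five strings "0" .. "4".
string GetRandomString(string pattern) => rng.Next(5).ToString();

// The iterator from the answer: strings are produced one at a time,
// only when the consumer asks for the next one.
IEnumerable<string> GetRandomStrings(int total, string pattern)
{
    for (int i = 0; i < total; i++)
    {
        yield return GetRandomString(pattern);
    }
}

// Distinct() is lazy too: it remembers the strings it has already yielded
// and passes each new one straight to the loop, so no List is built here.
int uniqueCount = 0;
foreach (string s in GetRandomStrings(50, "pattern").Distinct())
{
    Console.WriteLine(s);
    uniqueCount++;
}
Console.WriteLine($"{uniqueCount} unique strings");
```

One caveat: because `GetRandomStrings` is an iterator, every enumeration of the `IEnumerable` reruns the generator, so enumerating it twice yields two different random sets. If the strings must be read more than once, materializing them (for example with `ToList`) is still the safer choice.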
Answer by catfood
Don't use List<>. Use Dictionary<> or HashSet<> instead!
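A sketch of the `Dictionary<>` variant, which fits when each unique string should carry extra data. As an illustrative assumption, the value stored here is the string's position in an order-preserving list; `AddIfNew` is a hypothetical helper. `Dictionary<TKey, TValue>.TryAdd` requires .NET Core 2.0 or later, and the top-level statements require .NET 5 or later.

```csharp
using System;
using System.Collections.Generic;

// Illustrative assumption: the value stored for each unique string is
// its position in uniqueInOrder.
var firstIndex = new Dictionary<string, int>();
var uniqueInOrder = new List<string>();

// Hypothetical helper. Dictionary.TryAdd returns false, and changes nothing,
// when the key already exists.
void AddIfNew(string value)
{
    if (firstIndex.TryAdd(value, uniqueInOrder.Count))
    {
        uniqueInOrder.Add(value);
    }
}

foreach (string s in new[] { "red", "blue", "red", "green", "blue" })
{
    AddIfNew(s);
}

Console.WriteLine(string.Join(", ", uniqueInOrder)); // red, blue, green
Console.WriteLine(firstIndex["green"]);              // 2
```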
Answer by jdehlin
Have you tried:
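```csharp
// Distinct() returns an IEnumerable<string>, so ToList() is needed to assign it back to a List<string>
myList = myList.Distinct().ToList();
```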
Answer by DGibbs
You could use a HashSet<string> if order is not important:
```csharp
HashSet<string> myHashSet = new HashSet<string>();
for (int i = 0; i < total; i++)
{
    string random_string = GetRandomString(pattern);
    // Add does nothing (and returns false) if the string is already in the set
    myHashSet.Add(random_string);
}
```
The HashSet class provides high-performance set operations. A set is a collection that contains no duplicate elements, and whose elements are in no particular order.
Or, if you need the items kept in sorted order, I'd recommend using a SortedSet<T> (available since .NET Framework 4.0). Note that it orders items by value, not by the order in which they were generated.
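A short sketch of how `SortedSet<T>` handles duplicates and ordering, assuming .NET 5 or later for top-level statements. It rejects duplicates like `HashSet<T>`, but it enumerates its items in the order defined by its comparer rather than in insertion order, and because it is backed by a balanced tree, `Add` and `Contains` cost O(log n) rather than average O(1).

```csharp
using System;
using System.Collections.Generic;

// SortedSet<T> rejects duplicates like HashSet<T>, but enumerates its items
// in the order defined by its comparer (here the default string comparer),
// not in the order they were added.
var sorted = new SortedSet<string>();
foreach (string fruit in new[] { "pear", "apple", "pear", "fig" })
{
    bool added = sorted.Add(fruit); // false for the second "pear"
    Console.WriteLine($"{fruit}: {(added ? "added" : "duplicate")}");
}

Console.WriteLine(string.Join(", ", sorted)); // apple, fig, pear
```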
Answer by Amir Javed
Not a good way, but a kind of quick fix: use a bool to record whether the entry already appears anywhere in the list.
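```csharp
// MyKeys is assumed to be a List<string> field; the original answer does not show its declaration
List<string> MyKeys = new List<string>();

public void AddKey(string newKey)
{
    // Linear scan: still O(n) per call, just like List<T>.Contains
    bool containsKey = false;
    foreach (string key in MyKeys)
    {
        if (key == newKey)
        {
            containsKey = true;
            break; // no need to scan further once a match is found
        }
    }
    if (!containsKey)
    {
        MyKeys.Add(newKey); // C# is case-sensitive: the method is Add, not add
    }
}
```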

