What is the fastest/safest method to iterate over a HashSet in C#?
Original source: http://stackoverflow.com/questions/9625270/
License: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors on Stack Overflow.
Asked by Mythics
I'm still quite new to C#, but I've noticed from forum posts the advantages of using a HashSet instead of a List in specific cases.
My current case isn't exactly that I'm storing a tremendous amount of data in a single List, but rather that I have to check it for members often.
The catch is that I do need to iterate over it as well, but the order in which the items are stored or retrieved doesn't actually matter.
I've read that foreach loops are actually slower than for loops, so how else could I go about this in the fastest way possible?
The number of .Contains() checks I'm doing is definitely hurting my performance with lists, so at least a comparison with the performance of a HashSet would be handy.
Edit: I'm currently using lists, iterating through them in numerous locations, with different code being executed in each location. Most often, the lists contain point coordinates that I use to index into a two-dimensional array, on which I then perform some operation based on the criteria of the list.
If there's no direct answer to my question, that's fine, but I assumed there might be other ways of iterating over a HashSet than just a foreach loop. I'm currently in the dark as to what other methods there might even be, what advantages they provide, etc. Assuming there are other methods, I also assumed there would be a typical preferred method that is only passed over when it doesn't suit the needs (my needs are pretty basic).
As for premature optimization, I already know that using the lists as I am is a bottleneck. How to go about fixing this is where I'm getting stuck. Not even stuck exactly, but I didn't want to reinvent the wheel by testing repeatedly only to find out I'm already doing it the best way I could (this is a large project with over 3 months invested; lists are everywhere, but there are definitely some in which I do not want duplicates, that hold a lot of data, that need not be stored in any specific order, etc.).
Accepted answer by Jason Hernandez
A foreach loop has a small amount of additional overhead on an indexed collection (like an array). This is mostly because foreach does a little more bounds checking than a for loop.
HashSet does not have an indexer so you have to use the enumerator.
In this case foreach is efficient as it only calls MoveNext() as it moves through the collection.
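To make the enumerator point concrete, here is a minimal sketch of the two equivalent ways to walk a HashSet<int>. The class and method names (HashSetIteration, SumWithForeach, SumWithEnumerator) are made up for illustration, and the explicit version is only roughly what the compiler generates for foreach.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, not part of the original answer.
public static class HashSetIteration
{
    // Plain foreach: the compiler binds to HashSet<T>'s own enumerator.
    public static int SumWithForeach(HashSet<int> set)
    {
        int total = 0;
        foreach (int value in set)
        {
            total += value;
        }
        return total;
    }

    // Roughly the hand-written equivalent: repeated MoveNext()/Current calls.
    public static int SumWithEnumerator(HashSet<int> set)
    {
        int total = 0;
        HashSet<int>.Enumerator e = set.GetEnumerator();
        try
        {
            while (e.MoveNext())
            {
                total += e.Current;
            }
        }
        finally
        {
            e.Dispose();
        }
        return total;
    }

    public static void Main()
    {
        var values = new HashSet<int> { 3, 1, 4, 5, 9 };
        Console.WriteLine(SumWithForeach(values));    // prints 22
        Console.WriteLine(SumWithEnumerator(values)); // prints 22
    }
}
```

Since both forms do the same work, there is normally no reason to write the explicit version; foreach is the readable choice.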
Also, Parallel.ForEach can dramatically improve your performance, depending on the work you are doing in the loop and the size of your HashSet.
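A sketch of what a Parallel.ForEach pass over a HashSet could look like, assuming the per-item work is independent and expensive enough to be worth spreading across threads. ExpensiveWork is a hypothetical stand-in for the real loop body, and the thread-local overload is used here so that results are merged without locking on every item.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper class, not part of the original answer.
public static class ParallelIteration
{
    // Stand-in for whatever per-item work the real loop does (assumption).
    private static long ExpensiveWork(int value) => (long)value * value;

    public static long SumOfSquares(HashSet<int> set)
    {
        long total = 0;
        Parallel.ForEach(
            set,
            () => 0L,                                              // per-thread running total
            (value, state, local) => local + ExpensiveWork(value), // body runs on worker threads
            local => Interlocked.Add(ref total, local));           // merge each thread's total once
        return total;
    }

    public static void Main()
    {
        var values = new HashSet<int> { 1, 2, 3, 4 };
        Console.WriteLine(SumOfSquares(values)); // prints 30
    }
}
```

The set must not be modified while the loop runs, and for small sets or trivial bodies the scheduling overhead can easily outweigh the gain, so this is worth measuring before adopting.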
As mentioned before, profiling is your best bet.
Answer by Servy
You shouldn't be iterating over a HashSet in the first place to determine whether an item is in it. You should use the HashSet's own Contains method (not the LINQ one). The HashSet is designed so that it doesn't need to look through every item to see whether a given value is in the set. That is what makes it so much more powerful than a List for searching.
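As an illustration, here is a small sketch of membership checks against a HashSet of point coordinates. The class name MembershipCheck, the CountKnown helper, and the use of (int X, int Y) value tuples (which need C# 7 or later) are assumptions made for the example; the asker's actual point type is not shown in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, not part of the original answer.
public static class MembershipCheck
{
    // Counts how many candidates are already in the set.
    // Each Contains call is a hash lookup, not a scan of the whole set.
    public static int CountKnown(
        HashSet<(int X, int Y)> known,
        IEnumerable<(int X, int Y)> candidates)
    {
        int hits = 0;
        foreach (var point in candidates)
        {
            if (known.Contains(point))
            {
                hits++;
            }
        }
        return hits;
    }

    public static void Main()
    {
        var known = new HashSet<(int X, int Y)> { (0, 0), (2, 5), (7, 3) };
        var candidates = new[] { (2, 5), (9, 9), (7, 3) };
        Console.WriteLine(CountKnown(known, candidates)); // prints 2
    }
}
```

With a List in place of the HashSet, each Contains call would walk the list, which is the cost the question describes.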
Answer by Wolfzoon
Not strictly answering the question in the header, but more concerning your specific problem:
I would make your own collection class that uses both a HashSet and a List internally. Iterating is fast because you can use the List, and checking Contains is fast because you can use the HashSet. Just make it implement IEnumerable and you can use this collection in foreach as well.
The downside is more memory, but there are only twice as many references to objects, not twice as many objects. In the worst case it's only twice as much memory, and you seem much more concerned with performance.
Adding, checking, and iterating are fast this way; only removal is still O(N), because of the List.
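One possible shape for such a collection is sketched below. The name IndexedSet<T> and its members are invented for illustration; the answer does not prescribe an API.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical hybrid collection: HashSet for lookups, List for iteration.
public sealed class IndexedSet<T> : IEnumerable<T>
{
    private readonly HashSet<T> _lookup = new HashSet<T>();
    private readonly List<T> _items = new List<T>();

    public int Count => _items.Count;

    // Indexer backed by the List, so a plain for loop works too.
    public T this[int index] => _items[index];

    // Returns false (and stores nothing) if the item is already present.
    public bool Add(T item)
    {
        if (!_lookup.Add(item)) return false;
        _items.Add(item);
        return true;
    }

    // Hash lookup, independent of the number of items on average.
    public bool Contains(T item) => _lookup.Contains(item);

    // Still O(N): the List has to search for the item and shift the rest.
    public bool Remove(T item)
    {
        if (!_lookup.Remove(item)) return false;
        _items.Remove(item);
        return true;
    }

    public IEnumerator<T> GetEnumerator() => _items.GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

A caller can then write `foreach (var p in set)` or `for (int i = 0; i < set.Count; i++)` and still get fast `set.Contains(p)` checks.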
EDIT: If removal needs to be O(1) as well, use a doubly linked list instead of a regular list, and replace the HashSet with a Dictionary<KeyType, Cell>. You can use the dictionary for Contains checks, but also to quickly find the cell holding the data, so removal from the data structure is fast.
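A sketch of that variant, assuming the built-in LinkedList<T> serves as the doubly linked list and LinkedListNode<T> plays the role of the answer's Cell. The name LinkedSet<T> is hypothetical.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical variant with O(1) removal: the dictionary maps each item
// to its node in the linked list.
public sealed class LinkedSet<T> : IEnumerable<T>
{
    private readonly Dictionary<T, LinkedListNode<T>> _nodes =
        new Dictionary<T, LinkedListNode<T>>();
    private readonly LinkedList<T> _items = new LinkedList<T>();

    public int Count => _nodes.Count;

    // Returns false (and stores nothing) if the item is already present.
    public bool Add(T item)
    {
        if (_nodes.ContainsKey(item)) return false;
        _nodes[item] = _items.AddLast(item);
        return true;
    }

    public bool Contains(T item) => _nodes.ContainsKey(item);

    // The dictionary finds the node; unlinking a known node is O(1).
    public bool Remove(T item)
    {
        LinkedListNode<T> node;
        if (!_nodes.TryGetValue(item, out node)) return false;
        _items.Remove(node);
        _nodes.Remove(item);
        return true;
    }

    public IEnumerator<T> GetEnumerator() => _items.GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

The trade-off is that a linked list has no indexer and is less cache-friendly to iterate than a List, so this only makes sense when removals are frequent.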
Answer by qnaninf
I had the same issue: the HashSet is very well suited to adding unique elements, but is very slow when getting elements in a for loop. I solved it by converting the HashSet to an array and then running the for loop over it.
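A minimal sketch of that approach, assuming the set holds ints. The ArraySnapshot class and its Snapshot method are hypothetical names, and HashSet<T>.CopyTo is used for the conversion (LINQ's ToArray would work as well).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, not part of the original answer.
public static class ArraySnapshot
{
    // Copies the set into a new array. This is an O(N) copy, and the array
    // will not reflect changes made to the set afterwards.
    public static int[] Snapshot(HashSet<int> set)
    {
        var buffer = new int[set.Count];
        set.CopyTo(buffer);
        return buffer;
    }

    public static void Main()
    {
        var values = new HashSet<int> { 10, 20, 30 };
        int[] items = Snapshot(values);

        int total = 0;
        for (int i = 0; i < items.Length; i++) // indexed loop over the copy
        {
            total += items[i];
        }
        Console.WriteLine(total); // prints 60
    }
}
```

The copy only pays off if the array is iterated several times between changes to the set; whether it beats a plain foreach over the HashSet is something to measure rather than assume.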

