What is the lookup time complexity of HashSet<T>(IEqualityComparer<T>)?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/9812020/

Date: 2020-08-09 10:42:49  Source: igfitidea

Tags: c#, runtime, complexity-theory, hashset

Asked by Kirby

In C#.NET, I like using HashSets because of their supposed O(1) time complexity for lookups. If I have a large set of data that is going to be queried, I often prefer using a HashSet to a List, since it has this time complexity.

What confuses me is the constructor for the HashSet, which takes IEqualityComparer as an argument:

http://msdn.microsoft.com/en-us/library/bb359100.aspx

In the link above, the remarks note that the "constructor is an O(1) operation," but if this is the case, I am curious if lookup is still O(1).

In particular, it seems to me that, if I were to write a Comparer to pass in to the constructor of a HashSet, whenever I perform a lookup, the Comparer code would have to be executed on every key to check to see if there was a match. This would not be O(1), but O(n).

Does the implementation internally construct a lookup table as elements are added to the collection?

In general, how might I ascertain information about complexity of .NET data structures?

Accepted answer by Scott Stafford

A HashSet works by hashing the objects you insert (via IEqualityComparer.GetHashCode) and tossing them into buckets according to the hash. The buckets themselves are stored in an array, hence the O(1) part.

For example (this is not necessarily exactly how the C# implementation works; it just gives a flavor), it takes the first character of the hash and throws everything whose hash starts with 1 into bucket 1, everything whose hash starts with 2 into bucket 2, and so on. Inside each bucket is another array of buckets that divvies things up by the second character of the hash, and so on for every character in the hash.

Now, when you look something up, it hashes it and jumps through the appropriate buckets. It has to do several array lookups (one for each character in the hash), but that number does not grow as a function of N, the number of objects you've added, hence the O(1) rating.
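
The bucket idea can be sketched in a simpler single-level form. This is only a toy under stated assumptions, not how the real HashSet<T> is implemented: the TinyHashSet type, its fixed bucket count, and the absence of resizing are all hypothetical choices made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy type for illustration only; not the real HashSet<T>.
public sealed class TinyHashSet<T>
{
    private readonly List<T>[] _buckets;              // the bucket array
    private readonly IEqualityComparer<T> _comparer;  // supplies GetHashCode and Equals

    public TinyHashSet(int bucketCount, IEqualityComparer<T> comparer = null)
    {
        if (bucketCount <= 0) throw new ArgumentOutOfRangeException(nameof(bucketCount));
        _buckets = new List<T>[bucketCount];
        _comparer = comparer ?? EqualityComparer<T>.Default;
    }

    // Maps an item to a bucket: one hash computation plus one array index.
    private int BucketIndex(T item)
    {
        int hash = _comparer.GetHashCode(item) & 0x7FFFFFFF; // clear the sign bit
        return hash % _buckets.Length;
    }

    public bool Add(T item)
    {
        int index = BucketIndex(item);
        List<T> bucket = _buckets[index] ?? (_buckets[index] = new List<T>());
        foreach (T existing in bucket)
            if (_comparer.Equals(existing, item)) return false; // already present
        bucket.Add(item);
        return true;
    }

    // Only the items in one bucket are compared, never the whole set.
    public bool Contains(T item)
    {
        List<T> bucket = _buckets[BucketIndex(item)];
        if (bucket == null) return false;
        foreach (T existing in bucket)
            if (_comparer.Equals(existing, item)) return true;
        return false;
    }
}

public static class TinyHashSetDemo
{
    public static void Main()
    {
        var set = new TinyHashSet<string>(16);
        set.Add("Kirby");
        Console.WriteLine(set.Contains("Kirby"));
        Console.WriteLine(set.Contains("Scott"));
    }
}
```

Because this sketch never grows its bucket array, buckets get longer as items are added; a real implementation would resize to keep them short.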

To your other question, here is a blog post with the complexity of a number of collections' operations: http://c-sharp-snippets.blogspot.com/2010/03/runtime-complexity-of-net-generic.html

Answer by sll

It depends on the quality of the hash function (GetHashCode()) that your IEqualityComparer implementation provides. An ideal hash function should produce a well-distributed, random-looking set of hash codes. These hash codes are used as an index that maps a key to a value, so searching for a value by key becomes more efficient, especially when the key is a complex object/structure.
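
As a rough illustration of what "quality" can mean here, the sketch below counts how many distinct hash codes a comparer produces for a grid of keys. The GridPoint and PointComparer types and the multiplier 397 are hypothetical choices for this example, not anything prescribed by the framework.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type used only for this illustration.
public sealed class GridPoint
{
    public int X { get; }
    public int Y { get; }
    public GridPoint(int x, int y) { X = x; Y = y; }
}

// Hypothetical comparer: hashes either both fields or only X, depending on the flag.
public sealed class PointComparer : IEqualityComparer<GridPoint>
{
    private readonly bool _mixBothFields;
    public PointComparer(bool mixBothFields) { _mixBothFields = mixBothFields; }

    public bool Equals(GridPoint a, GridPoint b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;
        return a.X == b.X && a.Y == b.Y;
    }

    public int GetHashCode(GridPoint p)
    {
        if (!_mixBothFields) return p.X;        // weak: every point in a column collides
        unchecked { return (p.X * 397) ^ p.Y; } // better: both fields influence the code
    }
}

public static class HashQualityDemo
{
    // Counts distinct hash codes over a size-by-size grid of points.
    public static int DistinctHashCodes(IEqualityComparer<GridPoint> comparer, int size)
    {
        var codes = new HashSet<int>();
        for (int x = 0; x < size; x++)
            for (int y = 0; y < size; y++)
                codes.Add(comparer.GetHashCode(new GridPoint(x, y)));
        return codes.Count;
    }

    public static void Main()
    {
        Console.WriteLine(DistinctHashCodes(new PointComparer(false), 100));
        Console.WriteLine(DistinctHashCodes(new PointComparer(true), 100));
    }
}
```

The fewer distinct codes a comparer yields for distinct keys, the more keys end up sharing a bucket and the more Equals calls each lookup needs.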

"the Comparer code would have to be executed on every key to check to see if there was a match. This would not be O(1), but O(n)."

This is not how a hash table works; what you describe is a straightforward brute-force search. A hash table takes a more intelligent approach: it searches by index (the hash code).

Answer by phoog

Lookup is still O(1) if you pass an IEqualityComparer. The hash set still uses the same logic as if you don't pass an IEqualityComparer; it just uses the IEqualityComparer's implementations of GetHashCode and Equals instead of the instance methods of System.Object (or the overrides provided by the object in question).
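
A small sketch of that difference, using the built-in StringComparer.OrdinalIgnoreCase as the supplied comparer. The ComparerChoiceDemo class and its method names are made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class ComparerChoiceDemo
{
    // No comparer passed: the set falls back to the default equality for string.
    public static bool ContainsWithDefault(string query)
    {
        var set = new HashSet<string> { "Kirby" };
        return set.Contains(query);
    }

    // Comparer passed: its GetHashCode and Equals decide what counts as a match.
    public static bool ContainsIgnoringCase(string query)
    {
        var set = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "Kirby" };
        return set.Contains(query);
    }

    public static void Main()
    {
        Console.WriteLine(ContainsWithDefault("KIRBY"));  // False
        Console.WriteLine(ContainsIgnoringCase("KIRBY")); // True
    }
}
```

In both cases the lookup follows the same hash-then-compare path; only the source of the hash code and the equality test changes.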

Answer by Eric Lippert

"if I were to write a Comparer to pass in to the constructor of a HashSet, whenever I perform a lookup, the Comparer code would have to be executed on every key to check to see if there was a match. This would not be O(1), but O(n)."

Let's call the value you are searching for the "query" value.

Can you explain why you believe the comparer has to be executed on every key to see if it matches the query?

This belief is false. (Unless of course the hash code supplied by the comparer is the same for every key!) The search algorithm executes the equality comparer on every key whose hash code matches the query's hash code, modulo the number of buckets in the hash table. That's how hash tables get O(1) lookup time.
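
One way to check this empirically is to pass a comparer that counts its own Equals calls. The sketch below assumes a hypothetical CountingComparer over int keys; the exact count may vary between runtime versions, but it should stay far below the number of stored keys.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer that counts how often Equals is invoked.
public sealed class CountingComparer : IEqualityComparer<int>
{
    public int EqualsCalls;

    public bool Equals(int x, int y) { EqualsCalls++; return x == y; }

    // Well-distributed for this data: each distinct key gets a distinct code.
    public int GetHashCode(int value) { return value.GetHashCode(); }
}

public static class LookupCostDemo
{
    // Fills a set with n keys (n > 0), then reports how many Equals calls one lookup needed.
    public static int EqualsCallsForOneLookup(int n)
    {
        var comparer = new CountingComparer();
        var set = new HashSet<int>(comparer);
        for (int i = 0; i < n; i++) set.Add(i);

        comparer.EqualsCalls = 0;          // ignore calls made while adding
        bool found = set.Contains(n / 2);  // look up a key that is present
        if (!found) throw new InvalidOperationException("expected the key to be present");
        return comparer.EqualsCalls;
    }

    public static void Main()
    {
        Console.WriteLine(EqualsCallsForOneLookup(10000));
    }
}
```

If the comparer really ran on every key, the reported count would be close to n rather than a handful.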

"Does the implementation internally construct a lookup table as elements are added to the collection?"

Yes.

"In general, how might I ascertain information about complexity of .NET data structures?"

Read the documentation.

Answer by nikstffrs

Actually, the lookup time of a HashSet<T> isn't always O(1).

As others have already mentioned, a HashSet uses IEqualityComparer<T>.GetHashCode().
Now consider a struct or object which always returns the same hash code x.

If you add n items to your HashSet, there will be n items with the same hash in it (as long as the objects aren't equal).
So if you check whether an element with the hash code x exists in your HashSet, it will run equality checks against all objects with the hash code x to test whether the HashSet contains the element.
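
This degenerate case can be reproduced with a comparer that returns one constant hash code. The ConstantHashComparer type below is hypothetical and exists only to count Equals calls during a lookup for a key that is not in the set.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: every key gets the same hash code, so all keys collide.
public sealed class ConstantHashComparer : IEqualityComparer<int>
{
    public int EqualsCalls;

    public bool Equals(int x, int y) { EqualsCalls++; return x == y; }

    public int GetHashCode(int value) { return 42; } // deliberately terrible
}

public static class WorstCaseDemo
{
    // Fills a set with n colliding keys, then counts the Equals calls for one missing key.
    public static int EqualsCallsForMiss(int n)
    {
        var comparer = new ConstantHashComparer();
        var set = new HashSet<int>(comparer);
        for (int i = 0; i < n; i++) set.Add(i);

        comparer.EqualsCalls = 0;   // ignore calls made while adding
        set.Contains(-1);           // -1 was never added, so every stored key is checked
        return comparer.EqualsCalls;
    }

    public static void Main()
    {
        Console.WriteLine(EqualsCallsForMiss(1000));
    }
}
```

With every key sharing one hash code, the lookup has to walk all n stored keys, which is the O(n) behavior described above.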
