.net Object.GetHashCode() 的默认实现

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/720177/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-03 12:28:08  来源:igfitidea点击:

Default implementation for Object.GetHashCode()

.nethashgethashcode

提问by Fung

How does the default implementation for GetHashCode()work? And does it handle structures, classes, arrays, etc. efficiently and well enough?

默认实现如何GetHashCode()工作?它是否足够有效地处理结构、类、数组等?

I am trying to decide in what cases I should pack my own and in what cases I can safely rely on the default implementation to do well. I don't want to reinvent the wheel, if at all possible.

我试图决定在什么情况下我应该自己打包,在什么情况下我可以安全地依赖默认实现来做好。如果可能的话,我不想重新发明轮子。

采纳答案by David Brown

namespace System {
    public class Object {
        [MethodImpl(MethodImplOptions.InternalCall)]
        internal static extern int InternalGetHashCode(object obj);

        public virtual int GetHashCode() {
            return InternalGetHashCode(this);
        }
    }
}

InternalGetHashCodeis mapped to an ObjectNative::GetHashCodefunction in the CLR, which looks like this:

InternalGetHashCode映射到CLR 中的ObjectNative::GetHashCode函数,如下所示:

FCIMPL1(INT32, ObjectNative::GetHashCode, Object* obj) {  
    CONTRACTL  
    {  
        THROWS;  
        DISABLED(GC_NOTRIGGER);  
        INJECT_FAULT(FCThrow(kOutOfMemoryException););  
        MODE_COOPERATIVE;  
        SO_TOLERANT;  
    }  
    CONTRACTL_END;  

    VALIDATEOBJECTREF(obj);  

    DWORD idx = 0;  

    if (obj == 0)  
        return 0;  

    OBJECTREF objRef(obj);  

    HELPER_METHOD_FRAME_BEGIN_RET_1(objRef);        // Set up a frame  

    idx = GetHashCodeEx(OBJECTREFToObject(objRef));  

    HELPER_METHOD_FRAME_END();  

    return idx;  
}  
FCIMPLEND

The full implementation of GetHashCodeExis fairly large, so it's easier to just link to the C++ source code.

GetHashCodeEx的完整实现相当大,因此更容易链接到C++ 源代码

回答by Marc Gravell

For a class, the defaults are essentially reference equality, and that is usually fine. If writing a struct, it is more common to override equality (not least to avoid boxing), but it is very rare you write a struct anyway!

对于一个类,默认值本质上是引用相等,这通常很好。如果编写结构体,更常见的是覆盖相等性(尤其是为了避免装箱),但无论如何编写结构体的情况非常罕见!

When overriding equality, you should always have a matching Equals()and GetHashCode()(i.e. for two values, if Equals()returns true they mustreturn the same hash-code, but the converse is notrequired) - and it is common to also provide ==/!=operators, and often to implement IEquatable<T>too.

当重写等式,你应该始终有一个匹配的Equals()GetHashCode()(即两个值,如果Equals()返回true,他们必须返回相同的哈希码,但反过来不是必需的) -这是常见的也提供==/!=运营商,并经常到也实施IEquatable<T>

For generating the hash code, it is common to use a factored sum, as this avoids collisions on paired values - for example, for a basic 2 field hash:

为了生成哈希码,通常使用因式和,因为这可以避免配对值的冲突 - 例如,对于基本的 2 字段哈希:

unchecked // disable overflow, for the unlikely possibility that you
{         // are compiling with overflow-checking enabled
    int hash = 27;
    hash = (13 * hash) + field1.GetHashCode();
    hash = (13 * hash) + field2.GetHashCode();
    return hash;
}

This has the advantage that:

这样做的好处是:

  • the hash of {1,2} is not the same as the hash of {2,1}
  • the hash of {1,1} is not the same as the hash of {2,2}
  • {1,2} 的散列与 {2,1} 的散列不同
  • {1,1} 的哈希值与 {2,2} 的哈希值不同

etc - which can be common if just using an unweighted sum, or xor (^), etc.

etc - 如果仅使用未加权的总和或 xor ( ^)等,这可能很常见。

回答by Guffa

The documentation for the GetHashCodemethod for Objectsays "the default implementation of this method must not be used as a unique object identifier for hashing purposes."and the one for ValueTypesays "If you call the derived type's GetHashCode method, the return value is not likely to be suitable for use as a key in a hash table.".

ObjectGetHashCode方法的文档说“此方法的默认实现不得用作散列目的的唯一对象标识符。” ValueType 则表示“如果您调用派生类型的 GetHashCode 方法,则返回值不太可能适合用作哈希表中的键。” .

The basic data types like byte, short, int, long, charand stringimplement a good GetHashCode method. Some other classes and structures, like Pointfor example, implement a GetHashCodemethod that may or may not be suitable for your specific needs. You just have to try it out to see if it's good enough.

基本数据类型,例如byteshortintlongchar以及string实现良好的GetHashCode方法。其他一些类和结构,Point例如,实现的GetHashCode方法可能适合也可能不适合您的特定需求。你只需要尝试一下,看看它是否足够好。

The documentation for each class or structure can tell you if it overrides the default implementation or not. If it doesn't override it you should use your own implementation. For any classes or structs that you create yourself where you need to use the GetHashCodemethod, you should make your own implementation that uses the appropriate members to calculate the hash code.

每个类或结构的文档可以告诉您它是否覆盖了默认实现。如果它没有覆盖它,你应该使用你自己的实现。对于您自己创建的任何需要使用该GetHashCode方法的类或结构,您应该创建自己的实现,使用适当的成员来计算哈希码。

回答by geekley

Since I couldn't find an answer that explains whywe should override GetHashCodeand Equalsfor custom structs and whythe default implementation "is not likely to be suitable for use as a key in a hash table", I'll leave a link to this blog post, which explains why with a real-case example of a problem that happened.

由于我找不到解释为什么我们应该覆盖GetHashCodeEquals自定义结构以及为什么默认实现“不太可能适合用作哈希表中的键”的答案,我将留下指向此博客的链接post,通过一个发生问题的真实案例来解释原因。

I recommend reading the whole post, but here is a summary (emphasis and clarifications added).

我建议阅读整篇文章,但这里有一个摘要(添加了强调和说明)。

Reason the default hash for structs is slow and not very good:

结构的默认散列很慢并且不是很好的原因:

The way the CLR is designed, every call to a member defined in System.ValueTypeor System.Enumtypes [may] cause a boxing allocation[...]

An implementer of a hash function faces a dilemma: make a good distribution of the hash function or to make it fast. In some cases, it's possible to achieve them both, but it is hard to do this genericallyin ValueType.GetHashCode.

The canonical hash function of a struct "combines" hash codes of all the fields. But the only way to get a hash code of a field in a ValueTypemethod is to use reflection. So, the CLR authors decided to trade speed over the distribution and the default GetHashCodeversion just returns a hash code of a first non-null fieldand "munges" it with a type id [...] This is a reasonable behavior unless it's not. For instance, if you're unlucky enough and the first field of your struct has the same value for most instances, then a hash function will provide the same resultall the time. And, as you may imagine, this will cause a drastic performance impact if these instances are stored in a hash set or a hash table.

[...] Reflection-based implementation is slow. Very slow.

[...] Both ValueType.Equalsand ValueType.GetHashCodehave a special optimization. If a type does not have "pointers" and is properly packed [...] then more optimal versions are used: GetHashCodeiterates over an instance and XORs blocks of 4 bytes and Equalsmethod compares two instances using memcmp. [...] But the optimization is very tricky. First, it is hard to know when the optimization is enabled [...] Second, a memory comparison will not necessarily give you the right results. Here is a simple example: [...] -0.0and +0.0are equal but have different binary representations.

CLR 的设计方式是,每次调用System.ValueTypeSystem.Enum类型中定义的成员[可能] 会导致装箱分配[...]

散列函数的实现者面临着两难选择:对散列函数进行良好的分布还是使其快速。在某些情况下,有可能实现他们两个,但它是很难笼统做到这一点ValueType.GetHashCode

结构体的规范哈希函数“组合”了所有字段的哈希码。但是在方法中获取字段哈希码的唯一ValueType方法是使用反射。因此,CLR 作者决定在发行GetHashCode版上交换速度,默认版本只返回第一个非空字段的哈希码,并使用类型 id [...] . 例如,如果您不够幸运并且结构的第一个字段对于大多数实例具有相同的值,那么哈希函数将始终提供相同的结果。而且,正如您想象的那样,如果这些实例存储在散列集或散列表中,这将导致严重的性能影响。

[...]基于反射的实现很慢。非常慢。

[...]两者ValueType.EqualsValueType.GetHashCode有专门的优化。如果一个类型没有“指针”并且被正确打包 [...] 然后使用更优化的版本:GetHashCode迭代一个实例和 4 个字节的 XOR 块和Equals方法比较两个实例使用memcmp. [...] 但是优化非常棘手。首先,很难知道何时启用优化 [...] 其次,内存比较不一定会给您正确的结果。这是一个简单的例子: [...]-0.0+0.0是相等的,但有不同的二进制表示。

Real-world issue described in the post:

帖子中描述的现实问题:

private readonly HashSet<(ErrorLocation, int)> _locationsWithHitCount;
readonly struct ErrorLocation
{
    // Empty almost all the time
    public string OptionalDescription { get; }
    public string Path { get; }
    public int Position { get; }
}

We used a tuple that contained a custom struct with default equality implementation. And unfortunately, the struct had an optional first field that was almost always equals to [empty string]. The performance was OK until the number of elements in the set increased significantly causing a real performance issue, taking minutes to initialize a collection with tens of thousands of items.

我们使用了一个包含具有默认相等实现的自定义结构的元组。而不幸的是,该结构有一个可选的第一场,这是几乎总是等于[空字符串]。性能还可以,直到集合中的元素数量显着增加,导致真正的性能问题,初始化包含数万个项目的集合需要几分钟时间。

So, to answer the question "in what cases I should pack my own and in what cases I can safely rely on the default implementation", at least in the case of structs, you should override Equalsand GetHashCodewhenever your custom struct might be used as a key in a hash table or Dictionary.
I would also recommend implementing IEquatable<T>in this case, to avoid boxing.

因此,要回答“在什么情况下我应该打包我自己的以及在什么情况下我可以安全地依赖默认实现”的问题,至少在structs的情况下,您应该覆盖Equals并且GetHashCode每当您的自定义结构可能用作哈希表中的键或Dictionary.
我还建议IEquatable<T>在这种情况下实施,以避免拳击。

As the other answers said, if you're writing a class, the default hash using reference equality is usually fine, so I wouldn't bother in this case, unlessyou need to override Equals(then you would have to override GetHashCodeaccordingly).

正如其他答案所说,如果您正在编写class,则使用引用相等的默认哈希通常很好,因此在这种情况下我不会打扰,除非您需要覆盖Equals(然后您必须相应地覆盖GetHashCode)。

回答by Bennett Dill

Generally speaking, if you're overriding Equals, you want to override GetHashCode. The reason for this is because both are used to compare equality of your class/struct.

一般来说,如果您要覆盖 Equals,则需要覆盖 GetHashCode。这样做的原因是因为两者都用于比较类/结构的相等性。

Equals is used when checking Foo A, B;

检查 Foo A, B 时使用 Equals;

if (A == B)

如果 (A == B)

Since we know the pointer isn't likely to match, we can compare the internal members.

由于我们知道指针不太可能匹配,我们可以比较内部成员。

Equals(obj o)
{
    if (o == null) return false;
    MyType Foo = o as MyType;
    if (Foo == null) return false;
    if (Foo.Prop1 != this.Prop1) return false;

    return Foo.Prop2 == this.Prop2;
}

GetHashCode is generally used by hash tables. The hashcode generated by your class should always be the same for a classes give state.

GetHashCode 一般用于哈希表。对于类给出状态,您的类生成的哈希码应该始终相同。

I typically do,

我通常这样做,

GetHashCode()
{
    int HashCode = this.GetType().ToString().GetHashCode();
    HashCode ^= this.Prop1.GetHashCode();
    etc.

    return HashCode;
}

Some will say that the hashcode should only be calculated once per object lifetime, but I don't agree with that (and I'm probably wrong).

有人会说每个对象生命周期应该只计算一次哈希码,但我不同意这一点(我可能错了)。

Using the default implementation provided by object, unless you have the same reference to one of your classes, they will not be equal to each other. By overriding Equals and GetHashCode, you can report equality based on internal values rather than the objects reference.

使用 object 提供的默认实现,除非您对其中一个类具有相同的引用,否则它们将不相等。通过覆盖 Equals 和 GetHashCode,您可以根据内部值而不是对象引用报告相等性。

回答by Daniel Marshall

If you're just dealing with POCOs you can use this utility to simplify your life somewhat:

如果您只是处理 POCO,则可以使用此实用程序来稍微简化您的生活:

var hash = HashCodeUtil.GetHashCode(
           poco.Field1,
           poco.Field2,
           ...,
           poco.FieldN);

...

...

public static class HashCodeUtil
{
    public static int GetHashCode(params object[] objects)
    {
        int hash = 13;

        foreach (var obj in objects)
        {
            hash = (hash * 7) + (!ReferenceEquals(null, obj) ? obj.GetHashCode() : 0);
        }

        return hash;
    }
}