What is the best algorithm for overriding GetHashCode?
Original question: http://stackoverflow.com/questions/263400/
Notice: this page is a translation of a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow (not to the translator) and distribute it under the same license.
Asked by bitbonk
In .NET, the GetHashCode method is used in a lot of places throughout the .NET base class libraries. Implementing it properly is especially important for finding items quickly in a collection or when determining equality.
Is there a standard algorithm or best practice for implementing GetHashCode for my custom classes so that I don't degrade performance?
Answered by Jon Skeet
I usually go with something like the implementation given in Josh Bloch's fabulous Effective Java. It's fast and creates a pretty good hash which is unlikely to cause collisions. Pick two different prime numbers, e.g. 17 and 23, and do:
public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = 17;
        // Suitable nullity checks etc, of course :)
        hash = hash * 23 + field1.GetHashCode();
        hash = hash * 23 + field2.GetHashCode();
        hash = hash * 23 + field3.GetHashCode();
        return hash;
    }
}
As noted in comments, you may find it's better to pick a large prime to multiply by instead. Apparently 486187739 is good... and although most examples I've seen with small numbers tend to use primes, there are at least similar algorithms where non-prime numbers are often used. In the not-quite-FNV example later, for example, I've used numbers which apparently work well, but the initial value isn't a prime. (The multiplication constant is prime though. I don't know quite how important that is.)
This is better than the common practice of XORing hash codes for two main reasons. Suppose we have a type with two int fields:
XorHash(x, x) == XorHash(y, y) == 0 for all x, y
XorHash(x, y) == XorHash(y, x) for all x, y
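The two weaknesses listed above can be checked directly. The following is only a minimal sketch: the class name CombineDemo and both method names are made up for illustration, and the int arguments stand in for the fields' hash codes (an int's hash code is its own value).

```csharp
using System;

public static class CombineDemo
{
    // XOR combiner: the common practice the answer warns against.
    public static int XorHash(int a, int b) => a ^ b;

    // Multiply-and-add combiner using the answer's constants 17 and 23.
    public static int MulAddHash(int a, int b)
    {
        unchecked // overflow simply wraps around
        {
            int hash = 17;
            hash = hash * 23 + a;
            hash = hash * 23 + b;
            return hash;
        }
    }
}
```

With these definitions, XorHash(5, 5) and XorHash(7, 7) are both 0, and XorHash(1, 2) equals XorHash(2, 1), whereas MulAddHash(1, 2) is 9018 and MulAddHash(2, 1) is 9040, so the order of the fields matters to the result.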
By the way, the earlier algorithm is the one currently used by the C# compiler for anonymous types.
This page gives quite a few options. I think for most cases the above is "good enough" and it's incredibly easy to remember and get right. The FNV alternative is similarly simple, but uses different constants and XOR instead of ADD as the combining operation. It looks something like the code below, but the normal FNV algorithm operates on individual bytes, so this would require modifying to perform one iteration per byte instead of per 32-bit hash value. FNV is also designed for variable lengths of data, whereas the way we're using it here is always for the same number of field values. Comments on this answer suggest that the code here doesn't actually work as well (in the sample case tested) as the addition approach above.
// Note: Not quite FNV!
public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = (int) 2166136261;
        // Suitable nullity checks etc, of course :)
        hash = (hash * 16777619) ^ field1.GetHashCode();
        hash = (hash * 16777619) ^ field2.GetHashCode();
        hash = (hash * 16777619) ^ field3.GetHashCode();
        return hash;
    }
}
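For comparison with the not-quite-FNV code above, here is a rough sketch of a byte-at-a-time variant. It is the FNV-1a flavour (XOR first, then multiply), which is a choice made for this illustration rather than something the answer specifies; the class name Fnv1a is hypothetical, and the two constants are the same ones that appear in the answer's code.

```csharp
using System;
using System.Text;

public static class Fnv1a
{
    private const uint OffsetBasis = 2166136261; // 32-bit FNV offset basis
    private const uint Prime = 16777619;         // 32-bit FNV prime

    // Hashes a byte sequence one byte per iteration.
    public static uint Hash(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));

        uint hash = OffsetBasis;
        unchecked // multiplication is expected to wrap
        {
            foreach (byte b in data)
            {
                hash ^= b;     // FNV-1a: mix the byte in first...
                hash *= Prime; // ...then multiply by the prime
            }
        }
        return hash;
    }
}
```

For example, Fnv1a.Hash(Encoding.ASCII.GetBytes("a")) yields 0xE40C292C, and an empty array yields the offset basis unchanged. Because the result depends only on the bytes, this kind of hash stays stable across processes, unlike string.GetHashCode().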
One thing to be aware of: ideally you should prevent your equality-sensitive (and thus hash-code-sensitive) state from changing after adding the object to a collection that depends on the hash code.
As per the documentation:
You can override GetHashCode for immutable reference types. In general, for mutable reference types, you should override GetHashCode only if:
- You can compute the hash code from fields that are not mutable; or
- You can ensure that the hash code of a mutable object does not change while the object is contained in a collection that relies on its hash code.
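To illustrate why the documentation is so cautious about mutable types, the sketch below mutates an object while it sits in a HashSet. The types MutablePoint and MutationDemo are invented for this example and are not part of the answer.

```csharp
using System;
using System.Collections.Generic;

public sealed class MutablePoint
{
    public int X { get; set; }
    public int Y { get; set; }

    public override bool Equals(object obj) =>
        obj is MutablePoint other && X == other.X && Y == other.Y;

    // Same multiply-and-add scheme as above, computed from mutable state.
    public override int GetHashCode()
    {
        unchecked { return (17 * 23 + X) * 23 + Y; }
    }
}

public static class MutationDemo
{
    // Returns whether the set still finds the point after an in-place mutation.
    public static bool FoundAfterMutation()
    {
        var p = new MutablePoint { X = 1, Y = 2 };
        var set = new HashSet<MutablePoint> { p };

        p.X = 100; // the hash code changes while p is stored in the set

        return set.Contains(p);
    }
}
```

FoundAfterMutation() returns false: the set filed the point under its old hash code, so a lookup with the new hash code misses it even though the very same instance is still inside.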
Answered by Rick Love
Anonymous Type
Microsoft already provides a good generic hash code generator: just copy your property/field values to an anonymous type and hash it:
new { PropA, PropB, PropC, PropD }.GetHashCode();
This will work for any number of properties. It does not use boxing. It just uses the algorithm already implemented in the framework for anonymous types.
ValueTuple - Update for C# 7
As @cactuaroid mentions in the comments, a value tuple can be used. This saves a few keystrokes and, more importantly, executes purely on the stack (no garbage):
(PropA, PropB, PropC, PropD).GetHashCode();
(Note: The original technique using anonymous types seems to create an object on the heap, i.e. garbage, since anonymous types are implemented as classes, though this might be optimized out by the compiler. It would be interesting to benchmark these options, but the tuple option should be superior.)
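As a fuller illustration of the tuple approach, the sketch below uses the same value tuple for both Equals and GetHashCode so the two cannot drift apart. The Part class and its members are hypothetical names chosen for this example, and C# 7 or later is assumed.

```csharp
using System;

public sealed class Part : IEquatable<Part>
{
    public string Manufacturer { get; }
    public string PartN { get; }
    public int Quantity { get; }

    public Part(string manufacturer, string partN, int quantity)
    {
        Manufacturer = manufacturer;
        PartN = partN;
        Quantity = quantity;
    }

    // Compare field by field through value tuples.
    public bool Equals(Part other) =>
        other != null &&
        (Manufacturer, PartN, Quantity).Equals((other.Manufacturer, other.PartN, other.Quantity));

    public override bool Equals(object obj) => Equals(obj as Part);

    // Hash exactly the fields that Equals compares; null elements are tolerated.
    public override int GetHashCode() => (Manufacturer, PartN, Quantity).GetHashCode();
}
```

Two Part instances built from the same values compare equal and return the same hash code within a process, and a Part whose string fields are null can still be hashed without an exception.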
Answered by nightcoder
Here is my hashcode helper.
Its advantage is that it uses generic type arguments and therefore will not cause boxing:
public static class HashHelper
{
    public static int GetHashCode<T1, T2>(T1 arg1, T2 arg2)
    {
        unchecked
        {
            return 31 * arg1.GetHashCode() + arg2.GetHashCode();
        }
    }

    public static int GetHashCode<T1, T2, T3>(T1 arg1, T2 arg2, T3 arg3)
    {
        unchecked
        {
            int hash = arg1.GetHashCode();
            hash = 31 * hash + arg2.GetHashCode();
            return 31 * hash + arg3.GetHashCode();
        }
    }

    public static int GetHashCode<T1, T2, T3, T4>(T1 arg1, T2 arg2, T3 arg3,
        T4 arg4)
    {
        unchecked
        {
            int hash = arg1.GetHashCode();
            hash = 31 * hash + arg2.GetHashCode();
            hash = 31 * hash + arg3.GetHashCode();
            return 31 * hash + arg4.GetHashCode();
        }
    }

    public static int GetHashCode<T>(T[] list)
    {
        unchecked
        {
            int hash = 0;
            foreach (var item in list)
            {
                hash = 31 * hash + item.GetHashCode();
            }
            return hash;
        }
    }

    public static int GetHashCode<T>(IEnumerable<T> list)
    {
        unchecked
        {
            int hash = 0;
            foreach (var item in list)
            {
                hash = 31 * hash + item.GetHashCode();
            }
            return hash;
        }
    }

    /// <summary>
    /// Gets a hashcode for a collection for which the order of items
    /// does not matter.
    /// So {1, 2, 3} and {3, 2, 1} will get the same hash code.
    /// </summary>
    public static int GetHashCodeForOrderNoMatterCollection<T>(
        IEnumerable<T> list)
    {
        unchecked
        {
            int hash = 0;
            int count = 0;
            foreach (var item in list)
            {
                hash += item.GetHashCode();
                count++;
            }
            return 31 * hash + count.GetHashCode();
        }
    }

    /// <summary>
    /// Alternative way to get a hashcode is to use a fluent
    /// interface like this:<br />
    /// return 0.CombineHashCode(field1).CombineHashCode(field2).
    ///     CombineHashCode(field3);
    /// </summary>
    public static int CombineHashCode<T>(this int hashCode, T arg)
    {
        unchecked
        {
            return 31 * hashCode + arg.GetHashCode();
        }
    }
}
It also has an extension method that provides a fluent interface. You can use the helper like this:
public override int GetHashCode()
{
    return HashHelper.GetHashCode(Manufacturer, PartN, Quantity);
}
or like this:
public override int GetHashCode()
{
    return 0.CombineHashCode(Manufacturer)
        .CombineHashCode(PartN)
        .CombineHashCode(Quantity);
}
Answered by Wahid Shalaly
I have a Hashing class in my Helper library that I use for this purpose.
public static class Hashing
{
    /// <summary>
    /// This is a simple hashing function from Robert Sedgewick's Algorithms in C book.
    /// Also, some simple optimizations to the algorithm in order to speed up
    /// its hashing process have been added. From: www.partow.net
    /// </summary>
    /// <param name="input">array of objects, the combination of parameters that you
    /// need a hash code for</param>
    /// <returns>Hash code</returns>
    public static int RSHash(params object[] input)
    {
        const int b = 378551;
        int a = 63689;
        int hash = 0;
        // If it overflows then just wrap around
        unchecked
        {
            for (int i = 0; i < input.Length; i++)
            {
                if (input[i] != null)
                {
                    hash = hash * a + input[i].GetHashCode();
                    a = a * b;
                }
            }
        }
        return hash;
    }
}
Then you can simply use it as:
public override int GetHashCode()
{
    return Hashing.RSHash(_field1, _field2, _field3);
}
I didn't assess its performance, so any feedback is welcome.
Answered by Şafak Gür
Here's my helper class using Jon Skeet's implementation.
public static class HashCode
{
    public const int Start = 17;

    public static int Hash<T>(this int hash, T obj)
    {
        var h = EqualityComparer<T>.Default.GetHashCode(obj);
        return unchecked((hash * 31) + h);
    }
}
Usage:
public override int GetHashCode()
{
    return HashCode.Start
        .Hash(_field1)
        .Hash(_field2)
        .Hash(_field3);
}
If you want to avoid writing an extension method for System.Int32:
public readonly struct HashCode
{
    private readonly int _value;

    public HashCode(int value) => _value = value;

    public static HashCode Start { get; } = new HashCode(17);

    public static implicit operator int(HashCode hash) => hash._value;

    public HashCode Hash<T>(T obj)
    {
        var h = EqualityComparer<T>.Default.GetHashCode(obj);
        return unchecked(new HashCode((_value * 31) + h));
    }

    public override int GetHashCode() => _value;
}
It still avoids any heap allocation and is used exactly the same way:
public override int GetHashCode()
{
    // This time `HashCode.Start` is not an `Int32`, it's a `HashCode` instance.
    // And the result is implicitly converted to `Int32`.
    return HashCode.Start
        .Hash(_field1)
        .Hash(_field2)
        .Hash(_field3);
}
Edit (May 2018): The EqualityComparer<T>.Default getter is now a JIT intrinsic; the pull request is mentioned by Stephen Toub in this blog post.
Answered by Muhammad Rehan Saeed
.NET Standard 2.1 and Above
If you are using .NET Standard 2.1 or above, you can use the System.HashCode struct. There are two ways to use it:
HashCode.Combine
The Combine method creates a hash code from up to eight objects.
public override int GetHashCode() => HashCode.Combine(this.object1, this.object2);
HashCode.Add
The Add method helps you deal with collections:
public override int GetHashCode()
{
    var hashCode = new HashCode();
    hashCode.Add(this.object1);
    foreach (var item in this.collection)
    {
        hashCode.Add(item);
    }
    return hashCode.ToHashCode();
}
GetHashCode Made Easy
You can read the full blog post 'GetHashCode Made Easy' for more details and comments.
Usage Example
public class SuperHero
{
    public int Age { get; set; }
    public string Name { get; set; }
    public List<string> Powers { get; set; }

    public override int GetHashCode() =>
        HashCode.Of(this.Name).And(this.Age).AndEach(this.Powers);
}
Implementation
public struct HashCode : IEquatable<HashCode>
{
    private const int EmptyCollectionPrimeNumber = 19;
    private readonly int value;

    private HashCode(int value) => this.value = value;

    public static implicit operator int(HashCode hashCode) => hashCode.value;

    public static bool operator ==(HashCode left, HashCode right) => left.Equals(right);

    public static bool operator !=(HashCode left, HashCode right) => !(left == right);

    public static HashCode Of<T>(T item) => new HashCode(GetHashCode(item));

    public static HashCode OfEach<T>(IEnumerable<T> items) =>
        items == null ? new HashCode(0) : new HashCode(GetHashCode(items, 0));

    public HashCode And<T>(T item) =>
        new HashCode(CombineHashCodes(this.value, GetHashCode(item)));

    public HashCode AndEach<T>(IEnumerable<T> items)
    {
        if (items == null)
        {
            return new HashCode(this.value);
        }
        return new HashCode(GetHashCode(items, this.value));
    }

    public bool Equals(HashCode other) => this.value.Equals(other.value);

    public override bool Equals(object obj)
    {
        if (obj is HashCode)
        {
            return this.Equals((HashCode)obj);
        }
        return false;
    }

    public override int GetHashCode() => this.value.GetHashCode();

    private static int CombineHashCodes(int h1, int h2)
    {
        unchecked
        {
            // Code copied from System.Tuple, a good way to combine hashes.
            return ((h1 << 5) + h1) ^ h2;
        }
    }

    private static int GetHashCode<T>(T item) => item?.GetHashCode() ?? 0;

    private static int GetHashCode<T>(IEnumerable<T> items, int startHashCode)
    {
        var temp = startHashCode;
        var enumerator = items.GetEnumerator();
        if (enumerator.MoveNext())
        {
            temp = CombineHashCodes(temp, GetHashCode(enumerator.Current));
            while (enumerator.MoveNext())
            {
                temp = CombineHashCodes(temp, GetHashCode(enumerator.Current));
            }
        }
        else
        {
            temp = CombineHashCodes(temp, EmptyCollectionPrimeNumber);
        }
        return temp;
    }
}
What Makes a Good Algorithm?
Speed
The algorithm that calculates a hash code needs to be fast. A simple algorithm is usually going to be a faster one.
Deterministic
The hashing algorithm needs to be deterministic, i.e. given the same input it must always produce the same output.
Reduce Collisions
The algorithm that calculates a hash code needs to keep hash collisions to a minimum. A hash collision occurs when two calls to GetHashCode on two different objects produce identical hash codes. Note that collisions are allowed (some people have the misconception that they are not), but they should be kept to a minimum.
A good hash function should map the expected inputs as evenly as possible over its output range. It should have uniformity.
Prevents DoS
In .NET Core, each time you restart an application you will get different hash codes. This is a security feature to prevent denial-of-service (DoS) attacks. For .NET Framework you should enable this feature by adding the following App.config file:
<?xml version="1.0"?>
<configuration>
  <runtime>
    <UseRandomizedStringHashAlgorithm enabled="1" />
  </runtime>
</configuration>
Because of this feature, hash codes should never be used outside of the application domain in which they were created, should never be used as key fields in a collection, and should never be persisted.
Read more about this here.
Cryptographically Secure?
The algorithm does not have to be a cryptographic hash function, meaning it does not have to satisfy the following conditions:
- It is infeasible to generate a message that yields a given hash value.
- It is infeasible to find two different messages with the same hash value.
- A small change to a message should change the hash value so extensively that the new hash value appears uncorrelated with the old hash value (avalanche effect).
Answered by Bert Huijben
In most cases where Equals() compares multiple fields it doesn't really matter if your GetHashCode() hashes on one field or on many. You just have to make sure that calculating the hash is really cheap (no allocations, please) and fast (no heavy computations and certainly no database connections) and provides a good distribution.
The heavy lifting should be part of the Equals() method; the hash should be a very cheap operation to enable calling Equals() on as few items as possible.
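A minimal sketch of that division of labour might look like the following. The Article type is invented for this example; it assumes Id is cheap to read and reasonably well distributed, while the string comparisons are the expensive part.

```csharp
using System;

public sealed class Article : IEquatable<Article>
{
    public int Id { get; }
    public string Title { get; }
    public string Body { get; }

    public Article(int id, string title, string body)
    {
        Id = id;
        Title = title;
        Body = body;
    }

    // The heavy lifting: compare every field, including the large strings.
    public bool Equals(Article other) =>
        other != null &&
        Id == other.Id &&
        string.Equals(Title, other.Title, StringComparison.Ordinal) &&
        string.Equals(Body, other.Body, StringComparison.Ordinal);

    public override bool Equals(object obj) => Equals(obj as Article);

    // The cheap part: hash a single field, with no allocation and no string scan.
    public override int GetHashCode() => Id;
}
```

Two articles with the same Id but different bodies land in the same bucket and are then told apart by Equals(), which is still correct; the hash only has to narrow down the candidates.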
And one final tip: don't rely on GetHashCode() being stable over multiple application runs. Many .NET types don't guarantee that their hash codes stay the same after a restart, so you should only use the value of GetHashCode() for in-memory data structures.
Answered by Jon Hanna
Up until recently my answer would have been very close to Jon Skeet's here. However, I recently started a project which used power-of-two hash tables, that is, hash tables where the size of the internal table is 8, 16, 32, etc. There's a good reason for favouring prime-number sizes, but there are some advantages to power-of-two sizes too.
And it pretty much sucked. So after a bit of experimentation and research I started re-hashing my hashes with the following:
public static int ReHash(int source)
{
    unchecked
    {
        ulong c = 0xDEADBEEFDEADBEEF + (ulong)source;
        ulong d = 0xE2ADBEEFDEADBEEF ^ c;
        ulong a = d += c = c << 15 | c >> -15;
        ulong b = a += d = d << 52 | d >> -52;
        c ^= b += a = a << 26 | a >> -26;
        d ^= c += b = b << 51 | b >> -51;
        a ^= d += c = c << 28 | c >> -28;
        b ^= a += d = d << 9 | d >> -9;
        c ^= b += a = a << 47 | a >> -47;
        d ^= c += b << 54 | b >> -54;
        a ^= d += c << 32 | c >> 32;
        a += d << 25 | d >> -25;
        return (int)(a >> 1);
    }
}
And then my power-of-two hash table didn't suck any more.
This disturbed me though, because the above shouldn't work. Or more precisely, it shouldn't work unless the original GetHashCode() was poor in a very particular way.
Re-mixing a hash code can't improve a great hash code, because the only possible effect is that we introduce a few more collisions.
Re-mixing a hash code can't improve a terrible hash code, because the only possible effect is that we change e.g. a large number of collisions on value 53 to a large number of collisions on value 183,487,291.
Re-mixing a hash code can only improve a hash code that did at least fairly well in avoiding absolute collisions throughout its range (2^32 possible values) but badly at avoiding collisions when modulo'd down for actual use in a hash table. While the simpler modulo of a power-of-two table made this more apparent, it was also having a negative effect with the more common prime-number tables; that just wasn't as obvious (the extra work in rehashing would outweigh the benefit, but the benefit would still be there).
Edit: I was also using open addressing, which would also have increased the sensitivity to collisions, perhaps more so than the fact it was power-of-two.
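The "poor in a very particular way" case can be reproduced in a few lines. This is only a sketch under stated assumptions: BucketDemo is a made-up name, the table index is taken as hash & (size - 1), and Mix is the 32-bit finalizer from MurmurHash3, swapped in here as a short stand-in for the SpookyHash-based ReHash above.

```csharp
using System;
using System.Collections.Generic;

public static class BucketDemo
{
    // Counts how many distinct buckets the hash codes occupy in a
    // power-of-two table, where the bucket index is hash & (tableSize - 1).
    public static int DistinctBuckets(IEnumerable<int> hashes, int tableSize)
    {
        if (tableSize <= 0 || (tableSize & (tableSize - 1)) != 0)
            throw new ArgumentException("tableSize must be a power of two", nameof(tableSize));

        var used = new HashSet<int>();
        foreach (int h in hashes)
        {
            used.Add(h & (tableSize - 1));
        }
        return used.Count;
    }

    // MurmurHash3's 32-bit finalizer: spreads high-bit differences into the low bits.
    public static int Mix(int h)
    {
        unchecked
        {
            uint x = (uint)h;
            x ^= x >> 16;
            x *= 0x85EBCA6B;
            x ^= x >> 13;
            x *= 0xC2B2AE35;
            x ^= x >> 16;
            return (int)x;
        }
    }
}
```

Sixteen distinct hash codes 0, 16, 32, ..., 240 have no absolute collisions, yet in a 16-slot table they all share bucket 0 because their low four bits are identical. Passing each through Mix first spreads them over more than one bucket, which is the effect described above.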
And well, it was disturbing how much the string.GetHashCode() implementations in .NET (or study here) could be improved this way (on the order of tests running about 20-30 times faster due to fewer collisions) and more disturbing how much my own hash codes could be improved (much more than that).
All the GetHashCode() implementations I'd coded in the past, and indeed used as the basis of answers on this site, were much worse than I'd thought. Much of the time it was "good enough" for much of the uses, but I wanted something better.
So I put that project to one side (it was a pet project anyway) and started looking at how to produce a good, well-distributed hash code in .NET quickly.
In the end I settled on porting SpookyHash to .NET. Indeed the code above is a fast-path version of using SpookyHash to produce a 32-bit output from a 32-bit input.
Now, SpookyHash is not a nice, quick-to-remember piece of code. My port of it is even less so because I hand-inlined a lot of it for better speed*. But that's what code reuse is for.
Then I put that project to one side, because just as the original project had produced the question of how to produce a better hash code, so that project produced the question of how to produce a better .NET memcpy.
Then I came back, and produced a lot of overloads to easily feed just about all of the native types (except decimal†) into a hash code.
It's fast, for which Bob Jenkins deserves most of the credit because his original code I ported from is faster still, especially on 64-bit machines which the algorithm is optimised for‡.
The full code can be seen at https://bitbucket.org/JonHanna/spookilysharp/src but consider that the code above is a simplified version of it.
However, since it's now already written, one can make use of it more easily:
public override int GetHashCode()
{
    var hash = new SpookyHash();
    hash.Update(field1);
    hash.Update(field2);
    hash.Update(field3);
    return hash.Final().GetHashCode();
}
It also takes seed values, so if you need to deal with untrusted input and want to protect against hash DoS attacks you can set a seed based on uptime or similar, and make the results unpredictable to attackers:
private static long hashSeed0 = Environment.TickCount;
private static long hashSeed1 = DateTime.Now.Ticks;

public override int GetHashCode()
{
    // Produce different hashes every time this application is restarted
    // but remain consistent in each run, so attackers have a harder time
    // DoSing the hash tables.
    var hash = new SpookyHash(hashSeed0, hashSeed1);
    hash.Update(field1);
    hash.Update(field2);
    hash.Update(field3);
    return hash.Final().GetHashCode();
}
*A big surprise in this is that hand-inlining a rotation method that returned (x << n) | (x >> -n) improved things. I would have been sure that the jitter would have inlined that for me, but profiling showed otherwise.
†decimal isn't native from the .NET perspective, though it is from the C# one. The problem with it is that its own GetHashCode() treats precision as significant while its own Equals() does not. Both are valid choices, but not mixed like that. In implementing your own version, you need to choose to do one or the other, but I can't know which you'd want.
‡By way of comparison: if used on a string, SpookyHash on 64 bits is considerably faster than string.GetHashCode() on 32 bits, which is slightly faster than string.GetHashCode() on 64 bits, which is considerably faster than SpookyHash on 32 bits, though still fast enough to be a reasonable choice.
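The rotation expression in the first footnote may look odd because of the negative shift count. In C#, the shift count for a 64-bit operand is masked to its low six bits, so x >> -n behaves like x >> (64 - n). The sketch below compares the hand-written form with System.Numerics.BitOperations.RotateLeft; the class name RotateDemo is hypothetical, and .NET Core 3.0 or later is assumed for BitOperations.

```csharp
using System;
using System.Numerics;

public static class RotateDemo
{
    // Hand-written rotate-left, as in the footnote: -n is masked to (64 - n).
    public static ulong RotateLeftManual(ulong x, int n) => (x << n) | (x >> -n);

    // Library rotate-left, for comparison.
    public static ulong RotateLeftBuiltIn(ulong x, int n) => BitOperations.RotateLeft(x, n);
}
```

For example, RotateLeftManual(0x8000000000000000, 1) is 1: the top bit wraps around to the bottom, and the built-in method gives the same result.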
Answered by Magnus
This is a good one:
/// <summary>
/// Helper class for generating hash codes suitable
/// for use in hashing algorithms and data structures like a hash table.
/// </summary>
public static class HashCodeHelper
{
    private static int GetHashCodeInternal(int key1, int key2)
    {
        unchecked
        {
            var num = 0x7e53a269;
            num = (-1521134295 * num) + key1;
            num += (num << 10);
            num ^= (num >> 6);
            num = ((-1521134295 * num) + key2);
            num += (num << 10);
            num ^= (num >> 6);
            return num;
        }
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="arr">An array of objects used for generating the
    /// hash code.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data
    /// structures like a hash table.
    /// </returns>
    public static int GetHashCode(params object[] arr)
    {
        int hash = 0;
        foreach (var item in arr)
            hash = GetHashCodeInternal(hash, item.GetHashCode());
        return hash;
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <param name="obj3">The third object.</param>
    /// <param name="obj4">The fourth object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and
    /// data structures like a hash table.
    /// </returns>
    public static int GetHashCode<T1, T2, T3, T4>(T1 obj1, T2 obj2, T3 obj3,
        T4 obj4)
    {
        return GetHashCode(obj1, GetHashCode(obj2, obj3, obj4));
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <param name="obj3">The third object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data
    /// structures like a hash table.
    /// </returns>
    public static int GetHashCode<T1, T2, T3>(T1 obj1, T2 obj2, T3 obj3)
    {
        return GetHashCode(obj1, GetHashCode(obj2, obj3));
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data
    /// structures like a hash table.
    /// </returns>
    public static int GetHashCode<T1, T2>(T1 obj1, T2 obj2)
    {
        return GetHashCodeInternal(obj1.GetHashCode(), obj2.GetHashCode());
    }
}
And here is how to use it:
private struct Key
{
    private Type _type;
    private string _field;

    public Type Type { get { return _type; } }
    public string Field { get { return _field; } }

    public Key(Type type, string field)
    {
        _type = type;
        _field = field;
    }

    public override int GetHashCode()
    {
        return HashCodeHelper.GetHashCode(_field, _type);
    }

    public override bool Equals(object obj)
    {
        if (!(obj is Key))
            return false;
        var tf = (Key)obj;
        return tf._field.Equals(_field) && tf._type.Equals(_type);
    }
}
Answered by James Ko
As of https://github.com/dotnet/coreclr/pull/14863, there is a new way to generate hash codes that is super simple! Just write
public override int GetHashCode()
=> HashCode.Combine(field1, field2, field3);
This will generate a quality hash code without you having to worry about the implementation details.
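One implementation detail still needs care: GetHashCode must agree with Equals. As a rough sketch, if a type compares a string case-insensitively, the hash has to ignore case as well, which System.HashCode supports through the Add overload that takes a comparer. The Tag type below is invented for this example, and a runtime that ships System.HashCode (.NET Core 2.1 or later) is assumed.

```csharp
using System;

public sealed class Tag : IEquatable<Tag>
{
    public string Name { get; }
    public int Weight { get; }

    public Tag(string name, int weight)
    {
        Name = name;
        Weight = weight;
    }

    // Names are compared without regard to case.
    public bool Equals(Tag other) =>
        other != null &&
        Weight == other.Weight &&
        StringComparer.OrdinalIgnoreCase.Equals(Name, other.Name);

    public override bool Equals(object obj) => Equals(obj as Tag);

    // Hash the name with the same comparer that Equals uses.
    public override int GetHashCode()
    {
        var hash = new HashCode();
        hash.Add(Name, StringComparer.OrdinalIgnoreCase);
        hash.Add(Weight);
        return hash.ToHashCode();
    }
}
```

With this, new Tag("Red", 1) and new Tag("RED", 1) are equal and produce the same hash code. A plain HashCode.Combine(Name, Weight) would hash the two spellings differently and break lookups in a dictionary or set.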

