C++ std::unordered_map 中使用的默认哈希函数是什么?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/19411742/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
What is the default hash function used in C++ std::unordered_map?
提问by Medicine
I am using
我在用
unordered_map<string, int>
and
和
unordered_map<int, int>
What hash function is used in each case and what is chance of collision in each case? I will be inserting unique string and unique int as keys in each case respectively.
每种情况下使用什么散列函数,每种情况下发生碰撞的几率是多少?我将分别在每种情况下插入唯一字符串和唯一 int 作为键。
I am interested in knowing the algorithm of hash function in case of string and int keys and their collision stats.
我有兴趣了解字符串和 int 键及其碰撞统计信息的哈希函数算法。
回答by Avidan Borisov
The function object std::hash<>
is used.
使用函数对象std::hash<>
。
Standard specializations exist for all built-in types, and some other standard library types
such as std::string
and std::thread
. See the link for the full list.
所有内置类型以及一些其他标准库类型(例如std::string
和 )都存在标准特化std::thread
。查看完整列表的链接。
For other types to be used in a std::unordered_map
, you will have to specialize std::hash<>
or create your own function object.
对于要在 a 中使用的其他类型std::unordered_map
,您必须专门化std::hash<>
或创建自己的函数对象。
The chance of collision is completely implementation-dependent, but considering the fact that integers are limited between a defined range, while strings are theoretically infinitely long, I'd say there is a much better chance for collision with strings.
碰撞的机会完全取决于实现,但考虑到整数限制在定义的范围内这一事实,而字符串理论上是无限长的,我会说与字符串碰撞的机会要大得多。
As for the implementation in GCC, the specialization for builtin-types just returns the bit pattern. Here's how they are defined in bits/functional_hash.h
:
至于在 GCC 中的实现,内置类型的特化只返回位模式。以下是它们的定义方式bits/functional_hash.h
:
/// Partial specializations for pointer types.
template<typename _Tp>
struct hash<_Tp*> : public __hash_base<size_t, _Tp*>
{
size_t
operator()(_Tp* __p) const noexcept
{ return reinterpret_cast<size_t>(__p); }
};
// Explicit specializations for integer types.
#define _Cxx_hashtable_define_trivial_hash(_Tp) \
template<> \
struct hash<_Tp> : public __hash_base<size_t, _Tp> \
{ \
size_t \
operator()(_Tp __val) const noexcept \
{ return static_cast<size_t>(__val); } \
};
/// Explicit specialization for bool.
_Cxx_hashtable_define_trivial_hash(bool)
/// Explicit specialization for char.
_Cxx_hashtable_define_trivial_hash(char)
/// ...
The specialization for std::string
is defined as:
专业化std::string
定义为:
#ifndef _GLIBCXX_COMPATIBILITY_CXX0X
/// std::hash specialization for string.
template<>
struct hash<string>
: public __hash_base<size_t, string>
{
size_t
operator()(const string& __s) const noexcept
{ return std::_Hash_impl::hash(__s.data(), __s.length()); }
};
Some further search leads us to:
一些进一步的搜索导致我们:
struct _Hash_impl
{
static size_t
hash(const void* __ptr, size_t __clength,
size_t __seed = static_cast<size_t>(0xc70f6907UL))
{ return _Hash_bytes(__ptr, __clength, __seed); }
...
};
...
// Hash function implementation for the nontrivial specialization.
// All of them are based on a primitive that hashes a pointer to a
// byte array. The actual hash algorithm is not guaranteed to stay
// the same from release to release -- it may be updated or tuned to
// improve hash quality or speed.
size_t
_Hash_bytes(const void* __ptr, size_t __len, size_t __seed);
_Hash_bytes
is an external function from libstdc++
. A bit more searching led me to this file, which states:
_Hash_bytes
是来自 的外部函数libstdc++
。更多的搜索使我找到了这个文件,其中指出:
// This file defines Hash_bytes, a primitive used for defining hash
// functions. Based on public domain MurmurHashUnaligned2, by Austin
// Appleby. http://murmurhash.googlepages.com/
So the default hashing algorithm GCC uses for strings is MurmurHashUnaligned2.
因此,GCC 用于字符串的默认散列算法是 MurmurHashUnaligned2。
回答by Gabriel Staples
Though the hashing algorithms are compiler-dependent, I'll present it for GCC C++11. @Avidan Borisov astutely discoveredthat the GCC hashing algorithm used for strings is "MurmurHashUnaligned2," by Austin Appleby. I did some searching and found a mirrored copy of GCC on Github. Therefore:
尽管散列算法依赖于编译器,但我将在 GCC C++11 中展示它。@Avidan Borisov 敏锐地发现用于字符串的 GCC 哈希算法是 Austin Appleby 的“MurmurHashUnaligned2”。我做了一些搜索,并在 Github 上找到了 GCC 的镜像副本。所以:
The GCC C++11 hashing functions used for unordered_map
(a hash table template) and unordered_set
(a hash set template) appear to be as follows.
用于unordered_map
(哈希表模板)和unordered_set
(哈希集模板)的 GCC C++11 哈希函数如下所示。
- Thanks to Avidan Borisovfor his background research which on the question of what are the GCC C++11 hash functions used, stating that GCC uses an implementation of "MurmurHashUnaligned2", by Austin Appleby (http://murmurhash.googlepages.com/).
- In the file "gcc/libstdc++-v3/libsupc++/hash_bytes.cc", here (https://github.com/gcc-mirror/gcc/blob/master/libstdc++-v3/libsupc++/hash_bytes.cc), I found the implementations. Here's the one for the "32-bit size_t" return value, for example (pulled 11 Aug 2017)
- 感谢 Avidan Borisov对GCC C++11 哈希函数是什么问题的背景研究,指出 GCC 使用了 Austin Appleby ( http://murmurhash.googlepages.com/)的“MurmurHashUnaligned2”的实现)。
- 在文件“gcc/libstdc++-v3/libsupc++/hash_bytes.cc”中,这里(https://github.com/gcc-mirror/gcc/blob/master/libstdc++-v3/libsupc++/hash_bytes.cc),我发现实现。例如,这是“32 位 size_t”返回值的一个(拉 2017 年 8 月 11 日)
Code:
代码:
// Implementation of Murmur hash for 32-bit size_t.
size_t _Hash_bytes(const void* ptr, size_t len, size_t seed)
{
const size_t m = 0x5bd1e995;
size_t hash = seed ^ len;
const char* buf = static_cast<const char*>(ptr);
// Mix 4 bytes at a time into the hash.
while (len >= 4)
{
size_t k = unaligned_load(buf);
k *= m;
k ^= k >> 24;
k *= m;
hash *= m;
hash ^= k;
buf += 4;
len -= 4;
}
// Handle the last few bytes of the input array.
switch (len)
{
case 3:
hash ^= static_cast<unsigned char>(buf[2]) << 16;
[[gnu::fallthrough]];
case 2:
hash ^= static_cast<unsigned char>(buf[1]) << 8;
[[gnu::fallthrough]];
case 1:
hash ^= static_cast<unsigned char>(buf[0]);
hash *= m;
};
// Do a few final mixes of the hash.
hash ^= hash >> 13;
hash *= m;
hash ^= hash >> 15;
return hash;
}
For additional hashing functions, including djb2
, and the 2 versions of the K&R hashing functions(one apparently terrible, one pretty good), see my other answer here: https://stackoverflow.com/a/45641002/4561887.
有关其他散列函数,包括djb2
和 K&R 散列函数的 2 个版本(一个显然很糟糕,一个非常好),请参阅我的另一个答案:https: //stackoverflow.com/a/45641002/4561887。