C++ 编译时字符串散列
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/2111667/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Compile time string hashing
提问by deft_code
I have read in few different places that using C++11's new string literals it might be possible to compute a string's hash at compile time. However, no one seems to be ready to come out and say that it will be possible or how it would be done.
我在几个不同的地方读到,使用 C++11 的新字符串文字可能可以在编译时计算字符串的哈希值。然而,似乎没有人准备出来说这将是可能的或将如何完成。
- Is this possible?
- What would the operator look like?
- 这可能吗?
- 操作员会是什么样子?
I'm particularly interested use cases like this.
我对这样的用例特别感兴趣。
void foo( const std::string& value )
{
switch( std::hash(value) )
{
case "one"_hash: one(); break;
case "two"_hash: two(); break;
/*many more cases*/
default: other(); break;
}
}
Note: the compile time hash function doesn't have to look exactly as I've written it. I did my best to guess what the final solution would look like, but meta_hash<"string"_meta>::value
could also be a viable solution.
注意:编译时散列函数不必与我编写的完全一样。我尽力猜测最终的解决方案会是什么样子,但meta_hash<"string"_meta>::value
也可能是一个可行的解决方案。
采纳答案by Clement JACOB
This is a little bit late, but I succeeded in implementing a compile-time CRC32 function with the use of constexpr
. The problem with it is that at the time of writing, it only works with GCC and not MSVC nor Intel compiler.
这有点晚了,但我成功地使用constexpr
. 它的问题在于,在撰写本文时,它仅适用于 GCC,不适用于 MSVC 或 Intel 编译器。
Here is the code snippet:
这是代码片段:
// CRC32 Table (zlib polynomial)
static constexpr uint32_t crc_table[256] = {
0x00000000L, 0x77073096L, 0xee0e612cL, 0x990951baL, 0x076dc419L,
0x706af48fL, 0xe963a535L, 0x9e6495a3L, 0x0edb8832L, 0x79dcb8a4L,
0xe0d5e91eL, 0x97d2d988L, 0x09b64c2bL, 0x7eb17cbdL, 0xe7b82d07L,
...
};
template<size_t idx>
constexpr uint32_t crc32(const char * str)
{
return (crc32<idx-1>(str) >> 8) ^ crc_table[(crc32<idx-1>(str) ^ str[idx]) & 0x000000FF];
}
// This is the stop-recursion function
template<>
constexpr uint32_t crc32<size_t(-1)>(const char * str)
{
return 0xFFFFFFFF;
}
// This doesn't take into account the nul char
#define COMPILE_TIME_CRC32_STR(x) (crc32<sizeof(x) - 2>(x) ^ 0xFFFFFFFF)
enum TestEnum
{
CrcVal01 = COMPILE_TIME_CRC32_STR("stack-overflow"),
};
CrcVal01
is equal to 0x335CC04A
CrcVal01
等于 0x335CC04A
Hope this will help you!
希望能帮到你!
回答by Jerry Coffin
At least by my reading of §7.1.5/3 and §5.19, the following might be legitimate:
至少通过我对 §7.1.5/3 和 §5.19 的阅读,以下内容可能是合法的:
unsigned constexpr const_hash(char const *input) {
return *input ?
static_cast<unsigned int>(*input) + 33 * const_hash(input + 1) :
5381;
}
This does seem to follow the basic rules in §7.1.5/3:
这似乎确实遵循 §7.1.5/3 中的基本规则:
- The form is: "return expression;"
- Its only parameter is a pointer, which is a scalar type, and therefore a literal type.
- Its return is unsigned int, which is also scalar (and therefore literal).
- There is no implicit conversion to the return type.
- 形式为:“返回表达式;”
- 它唯一的参数是一个指针,它是一个标量类型,因此是一个文字类型。
- 它的返回是 unsigned int,它也是标量(因此是文字)。
- 没有对返回类型的隐式转换。
There is some question whether the *input
s involve an illegal lvalue to rvalue conversion, and I'm not sure I understand the rules in §5.19/2/6/21and §4.1 well enough to be sure about that.
有一些问题*input
s是否涉及非法的左值到右值转换,我不确定我是否充分理解 §5.19/2/6/2 1和§4.1 中的规则以确保这一点。
From a practical viewpoint, this code is accepted by (for one example) g++, at least as far back as g++ 4.7.1.
从实用的角度来看,此代码已被(例如)g++ 接受,至少可以追溯到 g++ 4.7.1。
Usage would be something like:
用法类似于:
switch(std::hash(value)) {
case const_hash("one"): one(); break;
case const_hash("two"): two(); break;
// ...
default: other(); break;
}
To comply with the requirements of §5.19/2/6/2 you might have to do something like this though:
为了符合 §5.19/2/6/2 的要求,您可能必须执行以下操作:
// one of the `constexpr`s is probably redundant, but I haven't figure out which.
char constexpr * constexpr v_one = "one";
// ....
case const_hash(v_one): one(); break;
- I'm using the extra 'slash' numbers to refer to unnumbered bullet points, so this is the second bullet point inside if the sixth bullet point under §5.19/2. I think I might have to talk to Pete Becker about whether it's possible to add some sort of numbers/letters/roman numerals down the hierarchy to identify pieces like this...
- 我使用额外的“斜线”数字来指代未编号的项目符号,因此这是第 5.19/2 节下的第六个项目符号点的第二个项目符号点。我想我可能得和皮特贝克尔谈谈是否有可能在层次结构中添加某种数字/字母/罗马数字来识别这样的作品......
回答by tower120
This snippet based on Clement JACOB's one. But works with clang too. And it should be faster on compilation (it have only one recursive call, not two like in original post).
这个片段基于 Clement JACOB 的片段。但也适用于叮当声。并且编译速度应该更快(它只有一个递归调用,而不是像原始帖子中的两个)。
#include <iostream>
#include <string>
#include <vector>
static constexpr unsigned int crc_table[256] = {
0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419, 0x706af48f,
0xe963a535, 0x9e6495a3, 0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988,
0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91, 0x1db71064, 0x6ab020f2,
0xf3b97148, 0x84be41de, 0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7,
0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec, 0x14015c4f, 0x63066cd9,
0xfa0f3d63, 0x8d080df5, 0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172,
0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b, 0x35b5a8fa, 0x42b2986c,
0xdbbbc9d6, 0xacbcf940, 0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59,
0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116, 0x21b4f4b5, 0x56b3c423,
0xcfba9599, 0xb8bda50f, 0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924,
0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d, 0x76dc4190, 0x01db7106,
0x98d220bc, 0xefd5102a, 0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433,
0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818, 0x7f6a0dbb, 0x086d3d2d,
0x91646c97, 0xe6635c01, 0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e,
0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457, 0x65b0d9c6, 0x12b7e950,
0x8bbeb8ea, 0xfcb9887c, 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65,
0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2, 0x4adfa541, 0x3dd895d7,
0xa4d1c46d, 0xd3d6f4fb, 0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0,
0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9, 0x5005713c, 0x270241aa,
0xbe0b1010, 0xc90c2086, 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f,
0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4, 0x59b33d17, 0x2eb40d81,
0xb7bd5c3b, 0xc0ba6cad, 0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a,
0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683, 0xe3630b12, 0x94643b84,
0x0d6d6a3e, 0x7a6a5aa8, 0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1,
0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe, 0xf762575d, 0x806567cb,
0x196c3671, 0x6e6b06e7, 0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc,
0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5, 0xd6d6a3e8, 0xa1d1937e,
0x38d8c2c4, 0x4fdff252, 0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b,
0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60, 0xdf60efc3, 0xa867df55,
0x316e8eef, 0x4669be79, 0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236,
0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f, 0xc5ba3bbe, 0xb2bd0b28,
0x2bb45a92, 0x5cb36a04, 0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d,
0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a, 0x9c0906a9, 0xeb0e363f,
0x72076785, 0x05005713, 0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38,
0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21, 0x86d3d2d4, 0xf1d4e242,
0x68ddb3f8, 0x1fda836e, 0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777,
0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c, 0x8f659eff, 0xf862ae69,
0x616bffd3, 0x166ccf45, 0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2,
0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db, 0xaed16a4a, 0xd9d65adc,
0x40df0b66, 0x37d83bf0, 0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9,
0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6, 0xbad03605, 0xcdd70693,
0x54de5729, 0x23d967bf, 0xb3667a2e, 0xc4614ab8, 0x5d681b02, 0x2a6f2b94,
0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d
};
template<int size, int idx = 0, class dummy = void>
struct MM{
static constexpr unsigned int crc32(const char * str, unsigned int prev_crc = 0xFFFFFFFF)
{
return MM<size, idx+1>::crc32(str, (prev_crc >> 8) ^ crc_table[(prev_crc ^ str[idx]) & 0xFF] );
}
};
// This is the stop-recursion function
template<int size, class dummy>
struct MM<size, size, dummy>{
static constexpr unsigned int crc32(const char * str, unsigned int prev_crc = 0xFFFFFFFF)
{
return prev_crc^ 0xFFFFFFFF;
}
};
// This don't take into account the nul char
#define COMPILE_TIME_CRC32_STR(x) (MM<sizeof(x)-1>::crc32(x))
template<unsigned int crc>
void PrintCrc()
{
std::cout << crc << std::endl;
}
int main()
{
PrintCrc<COMPILE_TIME_CRC32_STR("HAH")>();
}
See proof of concept here
在此处查看概念证明
回答by Yakk - Adam Nevraumont
This is an attempt to solve the OP's problem as exactly as possible.
这是尝试尽可能准确地解决 OP 的问题。
namespace my_hash {
template<class>struct hasher;
template<>
struct hasher<std::string> {
std::size_t constexpr operator()(char const *input)const {
return *input ?
static_cast<unsigned int>(*input) + 33 * (*this)(input + 1) :
5381;
}
std::size_t operator()( const std::string& str ) const {
return (*this)(str.c_str());
}
};
template<typename T>
std::size_t constexpr hash(T&& t) {
return hasher< typename std::decay<T>::type >()(std::forward<T>(t));
}
inline namespace literals {
std::size_t constexpr operator "" _hash(const char* s,size_t) {
return hasher<std::string>()(s);
}
}
}
using namespace my_hash::literals;
void one() {} void two() {} void other() {}
void foo( const std::string& value )
{
switch( my_hash::hash(value) )
{
case "one"_hash: one(); break;
case "two"_hash: two(); break;
/*many more cases*/
default: other(); break;
}
}
Note the main difference -- std::hash
cannot be used, as we do not have control over std::hash
's algorithm, and we mustreimplement it as a constexpr
in order to evaluate it at compile time. In addition, there are no "transparent" hashes in std
, so you cannot (without creating a std::string
) hash a raw character buffer as a std::string
.
请注意主要区别 -std::hash
不能使用,因为我们无法控制std::hash
的算法,我们必须将其重新实现为 aconstexpr
以便在编译时对其进行评估。此外, 中没有“透明”散列std
,因此您不能(不创建std::string
)将原始字符缓冲区散列为std::string
.
I stuck the std::string
custom hasher (with transparent const char*
support) into a my_hash
namespace, so you can store it in a std::unordered_map
if you need consistency.
我将std::string
自定义散列器(具有透明const char*
支持)粘贴到my_hash
命名空间中,因此std::unordered_map
如果需要一致性,您可以将其存储在 a 中。
Based off of @JerryCoffin's excellent answer and the comment thread below it, but with an attempt to write it with current C++11 best practices (as opposed to anticipating them!).
基于@JerryCoffin 的优秀答案及其下方的评论线程,但尝试使用当前的 C++11 最佳实践(而不是预期它们!)来编写它。
Note that using a "raw hash" for a switch
statement case
is dangerous. You'll want to do an ==
comparison afterwards to confirm it worked.
请注意,对switch
语句使用“原始哈希”case
是危险的。之后您需要进行==
比较以确认它有效。
回答by Georg Fritzsche
Note that the form shown here wasn't accepted into the standard, as noted below.
请注意,此处显示的表格并未被标准所接受,如下所述。
Compile time string processing is guessed to become possible through user-defined literalsproposed in N2765.
As i already mentioned, i don't know of any compiler that currently implements it and without compiler support there can be only guess work.
通过N2765 中提出的用户定义文字,编译时字符串处理被猜测成为可能。
正如我已经提到的,我不知道当前实现它的任何编译器,如果没有编译器支持,只能猜测工作。
In §2.13.7.3 and 4 of the draftwe have the following:
在草案的§2.13.7.3 和 4 中,我们有以下内容:
Otherwise (S contains a literal operator template), L is treated as a call of the form
operator "" X<'c1', 'c2', ... , 'ck'>() where n is the source character sequence c1c2...ck. [Note: The sequence c1c2...ck can only contain characters from the basic source character set. —end note]
否则(S 包含文字运算符模板),L 被视为形式
运算符 "" X<'c1', 'c2', ... , 'ck'>()的调用,其中 n 是源字符序列 c1c2 ……咳。[注意:序列 c1c2...ck 只能包含来自基本源字符集的字符。——尾注]
Combine that with constexpr
and we should have compile time string processing.
结合它,constexpr
我们应该有编译时字符串处理。
update: i overlooked that i was reading the wrong paragraph, this form is allowed for user-defined-integer-literals and -floating-literals, but apparently not for -string-literals (§2.13.7.5).
This part of the proposal seems to have not been accepted.
更新:我忽略了我正在阅读错误的段落,这种形式允许用于用户定义的整数文字和 -floating-literals,但显然不适用于 -string-literals(第 2.13.7.5 节)。
这部分提案似乎没有被接受。
That being said, with my limited glimpse at C++0x, it mightlook something like this (i most likely got something wrong):
话虽如此,以我对 C++0x 的有限一瞥,它可能看起来像这样(我很可能出错了):
template<char c, char... str>
struct hash {
static const unsigned result = c + hash<str...>::result;
};
template<char c>
struct hash {
static const unsigned result = c;
};
template<char... str>
constexpr unsigned
operator "" _hash() {
return hash<str>::result;
}
// update: probably wrong, because the above
// form is not allowed for string-literals:
const unsigned h = "abcd"_hash;
If Jerrys approachworks, then the following should work however:
如果Jerrys 方法有效,那么以下应该有效:
constexpr unsigned operator "" _hash(const char* s, size_t) {
return const_hash(s);
}
回答by CygnusX1
Another solution based on Clement JACOB's one, using C++11 constexpr (not the extended C++14) but having only one recursion.
另一种基于 Clement JACOB 的解决方案,使用 C++11 constexpr(不是扩展的 C++14)但只有一个递归。
namespace detail {
// CRC32 Table (zlib polynomial)
static constexpr uint32_t crc_table[256] = { 0x00000000L, 0x77073096L, ... }
template<size_t idx>
constexpr uint32_t combine_crc32(const char * str, uint32_t part) {
return (part >> 8) ^ crc_table[(part ^ str[idx]) & 0x000000FF];
}
template<size_t idx>
constexpr uint32_t crc32(const char * str) {
return combine_crc32<idx>(str, crc32<idx - 1>(str));
}
// This is the stop-recursion function
template<>
constexpr uint32_t crc32<size_t(-1)>(const char * str) {
return 0xFFFFFFFF;
}
} //namespace detail
template <size_t len>
constexpr uint32_t ctcrc32(const char (&str)[len]) {
return detail::crc32<len - 2>(str) ^ 0xFFFFFFFF;
}
Some explanation
一些解释
- We are using a single recursion, so that the function works well even for longer strings.
- The extra function
combine_crc32
allows us to store the result of a recursion under a variablepart
and using it twice. This function is a walkaround for C++11 limitaton disallowing local variable declarations. - The
ctcrc32
function expects a string literal, which is passed as aconst char (&)[len]
. This way we can get the string length as a template parameter and do not have to rely on macros.
- 我们使用的是单递归,因此该函数即使对于更长的字符串也能很好地工作。
- 额外的函数
combine_crc32
允许我们将递归的结果存储在一个变量下part
并使用它两次。此函数是针对 C++11 不允许局部变量声明的限制的解决方法。 - 该
ctcrc32
函数需要一个字符串文字,它作为const char (&)[len]
. 这样我们就可以获取字符串长度作为模板参数,而不必依赖宏。
回答by u0b34a0f6ae
The following works in GCC 4.6.1, and you can use either hash
or pack
in switch blocks.
以下在 GCC 4.6.1 中有效,您可以使用hash
或pack
在 switch 块中。
/* Fast simple string hash (Bernstein?) */
constexpr unsigned int hash(const char *s, int off = 0) {
return !s[off] ? 5381 : (hash(s, off+1)*33) ^ s[off];
}
/* Pack the string into an unsigned int
* Using 7 bits (ascii) it packs 9 chars into a uint64_t
*/
template <class T = uint64_t, unsigned int Bits = 7>
constexpr T pack(const char *s, unsigned int off = 0) {
return (Bits*off >= CHAR_BIT*sizeof(T) || !s[off]) ? 0 :
(((T)s[off] << (Bits*off)) | pack(s,off+1));
}
GCC seemingly(?) does not allow recursive calls where we pass on s+1
with s
a pointer, which is why I use the off
variable.
GCC 表面上(?)不允许递归调用我们s+1
用s
指针传递的地方,这就是我使用off
变量的原因。
回答by Guillaume Gris
If you have a c++17 compiler and string_view, this becomes trivial, just write the normal version:
如果你有一个 c++17 编译器和 string_view,这就变得微不足道了,只需编写普通版本:
constexpr uint32_t crc32(std::string_view str)
{
uint32_t crc = 0xffffffff;
for (auto c : str)
crc = (crc >> 8) ^ crc_table[(crc ^ c) & 0xff];
return crc ^ 0xffffffff;
}
回答by Holt
Here is another C++11 implementation (based on @CygnusX1 answer), which works with both constexpr char arrays and runtime strings:
这是另一个 C++11 实现(基于 @CygnusX1 答案),它适用于 constexpr 字符数组和运行时字符串:
namespace detail {
// CRC32 Table (zlib polynomial)
static constexpr uint32_t crc_table[256] = { 0x00000000L, 0x77073096L, ... };
constexpr uint32_t combine_crc32(size_t idx, const char * str, uint32_t part) {
return (part >> 8) ^ crc_table[(part ^ str[idx]) & 0x000000FF];
}
constexpr uint32_t crc32(size_t idx, const char * str) {
return idx == size_t(-1) ?
0xFFFFFFFF : combine_crc32(idx, str, crc32(idx - 1, str));
}
}
uint32_t ctcrc32(std::string const& str) {
size_t len = str.size() + 1;
return detail::crc32(len - 2, str.c_str()) ^ 0xFFFFFFFF;
}
template <size_t len>
constexpr uint32_t ctcrc32(const char (&str)[len]) {
return detail::crc32(len - 2, str) ^ 0xFFFFFFFF;
}
You need str.size() + 1
because len
in the second overload is strlen(str) + 1
due to the null-character at the end.
您需要,str.size() + 1
因为len
在第二个重载中是strlen(str) + 1
由于末尾的空字符造成的。
I did not add an overload for const char *
because it messes up with the second overload — You may easily add overloads for const char *, size_t
or std::string_view
.
我没有为 for 添加重载,const char *
因为它会与第二个重载混淆——您可以轻松地为const char *, size_t
or添加重载std::string_view
。
回答by manuel saraiva
This is a nice question.
这是一个很好的问题。
Based on Jerry Coffin's answer, I've created another one that's compatible with Visual Studio 2017's std::hash.
根据 Jerry Coffin 的回答,我创建了另一个与 Visual Studio 2017 的 std::hash 兼容的。
#include <functional>
#include <cassert>
using namespace std;
constexpr size_t cx_hash(const char* input) {
size_t hash = sizeof(size_t) == 8 ? 0xcbf29ce484222325 : 0x811c9dc5;
const size_t prime = sizeof(size_t) == 8 ? 0x00000100000001b3 : 0x01000193;
while (*input) {
hash ^= static_cast<size_t>(*input);
hash *= prime;
++input;
}
return hash;
}
int main() {
/* Enter your code here. Read input from STDIN. Print output to STDOUT */
auto a = cx_hash("test");
hash<string> func;
auto b = func("test");
assert(a == b);
return 0;
}