C语言 如何在C中编写哈希函数?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/2238107/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to write a hash function in C?
提问by aks
Hash Tables are said to be the fastest/best way of Storing/Retrieving data.
哈希表被认为是存储/检索数据的最快/最佳方式。
My understanding of a hash table, hashing is as follows (Please correct me if I am wrong or Please add If there is anything more):
我对一个hash表的理解,hash如下(如有不对请指正或补充):
- A Hash Tableis nothing but an array (single or multi-dimensional) to store values.
- Hashingis the process to find the index/location in the array to insert/retrieve the data. You take a data item(s) and pass it as a key(s) to a hash function and you would get the index/location where to insert/retrieve the data.
- 甲哈希表只不过是一个数组(单个或多维的)来存储值。
- 散列是在数组中查找索引/位置以插入/检索数据的过程。您获取一个数据项并将其作为键传递给散列函数,您将获得插入/检索数据的索引/位置。
I have a question:
我有个问题:
Is the hash function used to store/retrieve the data DIFFERENT from a cryptographic hash function used in security applications for authentication like MD5, HMAC, SHA-1 etc...?
用于存储/检索数据的哈希函数是否与安全应用程序中用于身份验证(如 MD5、HMAC、SHA-1 等...)的加密哈希函数不同?
In what way(s) are they different?
它们在哪些方面不同?
- How to write a hash function in C?
- Is there some standard or guidelines to it?
- How do we ensure that the output of a hash function i.e, index is not out of range?
- 如何在C中编写哈希函数?
- 是否有一些标准或指南?
- 我们如何确保哈希函数的输出,即索引不超出范围?
It would be great if you could mention some good links to understand these better.
如果您能提及一些好的链接以更好地理解这些链接,那就太好了。
回答by Jerry Coffin
A cryptographic hash emphasizes making it difficult for anybody to intentionally create a collision. For a hash table, the emphasis is normally on producing a reasonable spread of results quickly. As such, the two are usually quite different (in particular, a cryptographic hash is normally a lotslower).
加密散列强调让任何人都难以故意制造冲突。对于哈希表,重点通常是快速生成合理分布的结果。因此,两者通常完全不同(特别是,加密哈希通常要慢得多)。
For a typical hash function, the result is limited only by the type -- e.g. if it returns a size_t, it's perfectly fine for it to return anypossible size_t. It's up to you to reduce that output range to the size of your table (e.g. using the remainder of dividing by the size of your table, which should often be a prime number).
对于典型的散列函数,结果仅受类型限制——例如,如果它返回一个 size_t,那么它返回任何可能的 size_t 就完全没问题了。由您决定将输出范围缩小到您的表格大小(例如,使用除以表格大小的余数,这通常应该是质数)。
As an example, a fairly typical normal hash function might look something like:
例如,一个相当典型的普通哈希函数可能如下所示:
// warning: untested code.
size_t hash(char const *input) {
const int ret_size = 32;
size_t ret = 0x555555;
const int per_char = 7;
while (*input) {
ret ^= *input++;
ret = ((ret << per_char) | (ret >> (ret_size - per_char));
}
return ret;
}
The basic idea here is to have every bit of the input string affect the result, and to (as quickly as possible) have every bit of the result affected by at least part of the input. Note that I'm not particularly recommending this as a great hash function -- only trying to illustrate some of the basics of what you're trying to accomplish.
这里的基本思想是让输入字符串的每一位都影响结果,并且(尽可能快地)让结果的每一位至少受到部分输入的影响。请注意,我并不是特别推荐它作为一个很好的散列函数——只是试图说明您正在尝试完成的一些基础知识。
回答by RossFabricant
Bob Jenkins wrote an in-depth description of his good, if slightly outdated, hash function. The article has links to newer, better hash functions, but the writeup addresses the concerns of building a good one.
鲍勃·詹金斯 (Bob Jenkins) 深入描述了他的优秀散列函数,尽管有些过时。这篇文章有更新、更好的散列函数的链接,但这篇文章解决了构建一个好的散列函数的问题。
Also, most hash table implementations actually use an array of linked lists to resolve collisions. If you want to just use an array then the hash function needs to check for collisions and create a new hash index.
此外,大多数哈希表实现实际上使用链表数组来解决冲突。如果您只想使用数组,则哈希函数需要检查冲突并创建新的哈希索引。
The cryptographic hash functions you mention could be used as hash functions for a hash table, but they are much slower than hash functions designed for a hash table. Speed makes brute force attacks easier.
您提到的加密哈希函数可以用作哈希表的哈希函数,但它们比为哈希表设计的哈希函数慢得多。速度使蛮力攻击更容易。
回答by Anssi
The design goals are different.
设计目标不同。
With cryptographic hash functionsyou want, for example, that the hash and the hash function cannot be used to determine the original data or any other data that would produce the same hash.
例如,对于加密散列函数,您希望散列和散列函数不能用于确定原始数据或任何其他会产生相同散列的数据。
Hash functions used with hash tables & other data structures do not need such security properties. It's often enough if the hash function is fast and it will distribute the input set evenly into the set of possible hashes (to avoid unnecessary clustering / collisions).
与哈希表和其他数据结构一起使用的哈希函数不需要这样的安全属性。如果散列函数很快,并且它将输入集均匀地分布到可能的散列集(以避免不必要的聚类/冲突)中,通常就足够了。

