C++ 字符串的哈希函数
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/8317508/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Hash function for a string
提问by Nick
We are currently dealing with hash function in my class. Our instructor asked us to a hash function on the internet to compare to the two we have used in our code.
我们目前正在我的班级中处理哈希函数。我们的讲师要求我们使用 Internet 上的一个哈希函数与我们在代码中使用的两个函数进行比较。
The first one:
第一个:
int HashTable::hash (string word)
// POST: the index of entry is returned
{ int sum = 0;
for (int k = 0; k < word.length(); k++)
sum = sum + int(word[k]);
return sum % SIZE;
}
Second:
第二:
int HashTable::hash (string word)
{
int seed = 131;
unsigned long hash = 0;
for(int i = 0; i < word.length(); i++)
{
hash = (hash * seed) + word[i];
}
return hash % SIZE;
}
Where SIZE is 501 (The size of the hash table) and the input is coming from a text file of 20,000+ words.
其中 SIZE 是 501(哈希表的大小),输入来自 20,000 多个单词的文本文件。
I saw thisquestion with a few code examples but wasn't exactly sure what to be looking for in a hash function. If I understand correctly, in my case, a hash takes an input (string) and does a math calculation to assign the string a number and inserts it in a table. This process is done to increase the speed of searching the list?
我在一些代码示例中看到了这个问题,但不确定要在散列函数中寻找什么。如果我理解正确,在我的情况下,散列接受一个输入(字符串)并进行数学计算来为字符串分配一个数字并将其插入表中。做这个过程是为了提高搜索列表的速度?
If my logic is sound, does anyone have a good example or a resource showing a different hash function that involves a string? Or even the process of writing my own efficient hash function.
如果我的逻辑是合理的,有没有人有一个很好的例子或资源来展示涉及字符串的不同散列函数?甚至是编写我自己的高效哈希函数的过程。
回答by Basile Starynkevitch
First, it usually does not matter that much in practice. Most hash functions are "good enough".
首先,在实践中通常没有那么重要。大多数哈希函数“足够好”。
But if you really care, you should know that it is a research subject by itself. There are thousand of papers about that. You can still get a PhD today by studying & designing hashing algorithms.
但如果你真的关心,你应该知道它本身就是一个研究课题。关于这一点的论文有数千篇。今天你仍然可以通过研究和设计散列算法来获得博士学位。
Your second hash function might be slightly better, because it probably should separate the string "ab"
from the string "ba"
. On the other hand, it is probably less quick than the first hash function. It may, or may not, be relevant for your application.
您的第二个哈希函数可能会稍微好一些,因为它可能应该将 string"ab"
与 string分开"ba"
。另一方面,它可能不如第一个哈希函数快。它可能与您的应用程序相关,也可能不相关。
I'll guess that hash functions used for genome strings are quite different than those used to hash family names in telephone databases. Perhaps even some string hash functions are better suited for German, than for English or French words.
我猜想用于基因组字符串的散列函数与用于散列电话数据库中姓氏的散列函数大不相同。也许甚至一些字符串哈希函数更适合德语,而不是英语或法语单词。
Many software libraries give you good enough hash functions, e.g. Qt has qhash, and C++11 has std::hashin <functional>
, Glib has several hash functionsin C, and POCOhas some hashfunction.
许多软件库,给你足够好的散列函数,例如Qt拥有qhash,和C ++ 11的std ::散在<functional>
,油嘴有几个哈希函数的C和POCO有一定的散列函数。
I quite often have hashing functions involving primes (see Bézout's identity) and xor, like e.g.
我经常有涉及素数(参见Bézout 的身份)和异或的散列函数,例如
#define A 54059 /* a prime */
#define B 76963 /* another prime */
#define C 86969 /* yet another prime */
#define FIRSTH 37 /* also prime */
unsigned hash_str(const char* s)
{
unsigned h = FIRSTH;
while (*s) {
h = (h * A) ^ (s[0] * B);
s++;
}
return h; // or return h % C;
}
But I don't claim to be an hash expert. Of course, the values of A
, B
, C
, FIRSTH
should preferably be primes, but you could have chosen other prime numbers.
但我并不声称自己是哈希专家。当然,值A
,B
,C
,FIRSTH
最好是素数,但你可以选择其他的素数。
Look at some MD5implementation to get a feeling of what hash functions can be.
查看一些MD5实现以了解散列函数可以是什么。
Most good books on algorithmics have at least a whole chapter dedicated to hashing. Start with wikipages on hash function& hash table.
回答by esskar
-- The way to go these days --
——这些天要走的路——
Use SipHash. For your own protection.
使用SipHash。为了你自己的保护。
-- Old and Dangerous --
-- 古老而危险 --
unsigned int RSHash(const std::string& str)
{
unsigned int b = 378551;
unsigned int a = 63689;
unsigned int hash = 0;
for(std::size_t i = 0; i < str.length(); i++)
{
hash = hash * a + str[i];
a = a * b;
}
return (hash & 0x7FFFFFFF);
}
unsigned int JSHash(const std::string& str)
{
unsigned int hash = 1315423911;
for(std::size_t i = 0; i < str.length(); i++)
{
hash ^= ((hash << 5) + str[i] + (hash >> 2));
}
return (hash & 0x7FFFFFFF);
}
Ask google for "general purpose hash function"
向谷歌询问“通用哈希函数”
回答by Evan Dark
Hash functions for algorithmic use have usually 2 goals, first they have to be fast, second they have to evenly distibute the values across the possible numbers. The hash function also required to give the all same number for the same input value.
用于算法的散列函数通常有两个目标,首先它们必须很快,其次它们必须在可能的数字之间均匀分布值。哈希函数还需要为相同的输入值提供所有相同的数字。
if your values are strings, here are some examples for bad hash functions:
如果您的值是字符串,以下是一些错误哈希函数的示例:
string[0]
- the ASCII characters a-Z are way more often then othersstring.lengh()
- the most probable value is 1
string[0]
- ASCII 字符 aZ 比其他字符更常见string.lengh()
- 最可能的值为 1
Good hash functions tries to use every bit of the input while keeping the calculation time minimal. If you only need some hash code, try to multiply the bytes with prime numbers, and sum them.
好的散列函数会尝试使用输入的每一位,同时保持最小的计算时间。如果您只需要一些哈希码,请尝试将字节与素数相乘,然后将它们相加。
回答by Denise Skidmore
Use boost::hash
#include <boost\functional\hash.hpp>
...
...
std::string a = "ABCDE";
size_t b = boost::hash_value(a);
回答by Brendan Long
Java's String
implements hashCode like this:
Java 的String
hashCode 是这样实现的:
public int hashCode()
Returns a hash code for this string. The hash code for a String object is computed as
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)
So something like this:
所以像这样:
int HashTable::hash (string word) {
int result = 0;
for(size_t i = 0; i < word.length(); ++i) {
result += word[i] * pow(31, i);
}
return result;
}