A fast hash function for strings in C#

Source: Stack Overflow, http://stackoverflow.com/questions/9545619/ (licensed under CC BY-SA 4.0; attribution belongs to the original authors).

Tags: c#, string, performance, hash

Asked by P basak

I want to hash strings of length up to 30. What is the best way to do that if speed is my main concern? The function will be called over 100 million times. Currently I am using the following code:

static UInt64 CalculateHash(string read, bool lowTolerance)
{
    UInt64 hashedValue = 0;
    int i = 0;
    while (i < read.Length)
    {
        hashedValue += read.ElementAt(i) * (UInt64)Math.Pow(31, i);
        if (lowTolerance) i += 2;
        else i++;
    }
    return hashedValue;
}

Accepted answer by David Schwartz

static UInt64 CalculateHash(string read)
{
    UInt64 hashedValue = 3074457345618258791ul;
    for(int i=0; i<read.Length; i++)
    {
        hashedValue += read[i];
        hashedValue *= 3074457345618258799ul;
    }
    return hashedValue;
}

This is a Knuth hash. You can also use a Jenkins hash.
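As an illustration of the Jenkins alternative, here is a minimal sketch of the Jenkins one-at-a-time hash. It is not part of the original answer: the class name `JenkinsSketch` is made up for this example, and the function mixes UTF-16 code units where the original algorithm mixes bytes, so results only match byte-based implementations for ASCII input. It also returns a 32-bit value rather than the 64-bit value used elsewhere in this thread.

```csharp
using System;

public static class JenkinsSketch
{
    // Jenkins one-at-a-time hash over the UTF-16 code units of a string.
    // Assumption: feeding chars instead of bytes is an adaptation for this sketch.
    public static uint OneAtATime(string s)
    {
        uint hash = 0;
        unchecked // wrap-around on overflow is intended
        {
            foreach (char c in s)
            {
                hash += c;
                hash += hash << 10;
                hash ^= hash >> 6;
            }
            // Final avalanche steps
            hash += hash << 3;
            hash ^= hash >> 11;
            hash += hash << 15;
        }
        return hash;
    }
}
```

Calling `JenkinsSketch.OneAtATime("a")` should give `0xCA2E9442`, the commonly published test value for the single byte `a`.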

Answer by skub

I have played with Paul Hsieh's implementations, and they seem to be fast with few collisions (for my scenarios, anyway).

Answer by dasblinkenlight

To speed up your implementation, the (UInt64)Math.Pow(31, i) call should be replaced by a lookup: pre-calculate a table of the first 30 powers of 31, and use it at runtime. Since the limit on length is 30, you need only 31 elements:

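// C# has no "unsigned long"; the 64-bit unsigned type is ulong (UInt64).
private static readonly ulong[] Pow31 = new ulong[31];

// Static constructor of the enclosing class (named HashCalc here); runs once.
static HashCalc() {
    Pow31[0] = 1;
    for (int i = 1 ; i != Pow31.Length ; i++) {
        // Wraps around modulo 2^64 for the larger powers (default unchecked context).
        Pow31[i] = 31*Pow31[i-1];
    }
}

// In your hash function...
hashedValue += read.ElementAt(i) * Pow31[i];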
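Put together, the lookup-table idea might look like the sketch below. The class layout, the `BuildTable` helper and the length check are assumptions added for this example; it also uses the `read[i]` indexer instead of `ElementAt(i)` and drops the `lowTolerance` parameter to keep the sketch short.

```csharp
using System;

public static class HashCalc
{
    // Pow31[i] holds 31^i modulo 2^64.
    private static readonly ulong[] Pow31 = BuildTable(31);

    // Hypothetical helper: fills the table once, at type initialization.
    private static ulong[] BuildTable(int size)
    {
        var table = new ulong[size];
        table[0] = 1;
        unchecked
        {
            for (int i = 1; i < table.Length; i++)
            {
                table[i] = 31 * table[i - 1];
            }
        }
        return table;
    }

    public static ulong CalculateHash(string read)
    {
        if (read.Length > Pow31.Length)
            throw new ArgumentException("String is longer than the precomputed table.", nameof(read));

        ulong hashedValue = 0;
        unchecked
        {
            for (int i = 0; i < read.Length; i++)
            {
                hashedValue += read[i] * Pow31[i];
            }
        }
        return hashedValue;
    }
}
```

For example, `HashCalc.CalculateHash("ab")` is 97 * 1 + 98 * 31 = 3135.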

Answer by CodesInChaos

First of all, consider using GetHashCode().
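A minimal sketch of what relying on the built-in hash could look like. The class and method names are made up for this example. Two caveats: `GetHashCode()` returns a 32-bit `int`, not a `UInt64`, and on .NET Core and later, string hash codes are randomized per process, so they should not be stored or compared across runs.

```csharp
using System;
using System.Collections.Generic;

public static class GetHashCodeSketch
{
    // Direct use of the built-in string hash (32-bit, only stable within one process).
    public static int Hash(string read) => read.GetHashCode();

    // Dictionary<string, T> calls GetHashCode() internally, so counting
    // occurrences needs no hand-written hash function at all.
    public static Dictionary<string, int> CountReads(IEnumerable<string> reads)
    {
        var counts = new Dictionary<string, int>();
        foreach (string read in reads)
        {
            counts.TryGetValue(read, out int current);
            counts[read] = current + 1;
        }
        return counts;
    }
}
```

If the hash is only ever used as a key for an in-memory lookup, the second method is often all that is needed.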

A simple improvement on your existing implementation:

static UInt64 CalculateHash(string read, bool lowTolerance)
{
    UInt64 hashedValue = 0;
    int i = 0;
    ulong multiplier = 1;
    while (i < read.Length)
    {
        hashedValue += read[i] * multiplier;
        multiplier *= 37;
        if (lowTolerance) i += 2;
        else i++;
    }
    return hashedValue;
}

It avoids the expensive floating point calculation and the overhead of ElementAt.
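One property of the `lowTolerance` flag is worth seeing in action: when it is set, only every other character is read, so strings that differ only at odd positions collide. The sketch below repeats the function from this answer so that it is self-contained; the wrapper class `LowToleranceSketch` and the sample strings are made up for this example.

```csharp
using System;

public static class LowToleranceSketch
{
    // Same logic as the improved implementation in this answer.
    public static ulong CalculateHash(string read, bool lowTolerance)
    {
        ulong hashedValue = 0;
        ulong multiplier = 1;
        int i = 0;
        unchecked
        {
            while (i < read.Length)
            {
                hashedValue += read[i] * multiplier;
                multiplier *= 37;
                i += lowTolerance ? 2 : 1; // lowTolerance skips every other character
            }
        }
        return hashedValue;
    }

    // True when the two strings hash to the same value in the given mode.
    public static bool Collide(string x, string y, bool lowTolerance) =>
        CalculateHash(x, lowTolerance) == CalculateHash(y, lowTolerance);
}
```

Here `Collide("abcd", "axcz", true)` is true, because only the characters at indices 0 and 2 are hashed, while `Collide("abcd", "axcz", false)` is false.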

Btw, (UInt64)Math.Pow(31, i) doesn't work well for longer strings. A double cannot represent the larger powers exactly, and once 31^i exceeds the range of UInt64 (from i = 13 on) the cast is out of range. In practice this can lead to a multiplier of 0 for characters beyond the first dozen or so.
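To see where the cast stops being meaningful, the following sketch finds the first exponent at which 31^i no longer fits in a UInt64, and prints the cast result next to the value that wrap-around integer multiplication gives. The class and method names are made up for this example. The out-of-range cast results are deliberately not listed here, because they depend on the runtime and platform.

```csharp
using System;
using System.Numerics;

public static class PowOverflowSketch
{
    // Smallest exponent i for which 31^i is larger than UInt64.MaxValue.
    public static int FirstOverflowingExponent()
    {
        BigInteger power = BigInteger.One;
        int i = 0;
        while (power <= ulong.MaxValue)
        {
            power *= 31;
            i++;
        }
        return i;
    }

    // Prints (UInt64)Math.Pow(31, i) next to 31^i computed modulo 2^64.
    public static void PrintComparison(int maxExponent)
    {
        ulong wrapped = 1;
        for (int i = 0; i <= maxExponent; i++)
        {
            ulong viaPow = unchecked((ulong)Math.Pow(31, i));
            Console.WriteLine($"i={i}: cast={viaPow}, integer={wrapped}");
            wrapped = unchecked(wrapped * 31);
        }
    }
}
```

`FirstOverflowingExponent()` returns 13, so for strings longer than 13 characters the original code relies on an out-of-range conversion.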