Javascript 在Javascript中从字符串生成哈希

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/7616461/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-24 02:55:57  来源:igfitidea点击:

Generate a Hash from string in Javascript

javascripthash

提问by Freesn?w

I need to convert strings to some form of hash. Is this possible in JavaScript?

我需要将字符串转换为某种形式的哈希。这在 JavaScript 中可能吗?

I'm not utilizing a server-side language so I can't do it that way.

我没有使用服务器端语言,所以我不能那样做。

回答by esmiralha

Object.defineProperty(String.prototype, 'hashCode', {
  value: function() {
    var hash = 0, i, chr;
    for (i = 0; i < this.length; i++) {
      chr   = this.charCodeAt(i);
      hash  = ((hash << 5) - hash) + chr;
      hash |= 0; // Convert to 32bit integer
    }
    return hash;
  }
});

Source: http://werxltd.com/wp/2010/05/13/javascript-implementation-of-javas-string-hashcode-method/

来源:http: //werxltd.com/wp/2010/05/13/javascript-implementation-of-javas-string-hashcode-method/

回答by lordvlad

EDIT

编辑

based on my jsperf tests, the accepted answer is actually faster: http://jsperf.com/hashcodelordvlad

根据我的 jsperf 测试,接受的答案实际上更快:http://jsperf.com/hashcodelordvlad

ORIGINAL

原来的

if anyone is interested, here is an improved ( faster ) version, which will fail on older browsers who lack the reducearray function.

如果有人感兴趣,这里有一个改进的(更快)版本,它会在缺少reduce数组功能的旧浏览器上失败。

hashCode = function(s){
  return s.split("").reduce(function(a,b){a=((a<<5)-a)+b.charCodeAt(0);return a&a},0);              
}

one-liner arrow function version :

单线箭头函数版本:

hashCode = s => s.split('').reduce((a,b)=>{a=((a<<5)-a)+b.charCodeAt(0);return a&a},0)

回答by mar10

Note:Even with the best 32-bit hash, collisions willoccur sooner or later.

The hash collision probablility can be calculated as 1 - e ^ (-k(k-1) / 2N, aproximated as k^2 / 2N(see here). This may be higher than intuition suggests:
Assuming a 32-bit hash and k=10,000 items, a collision will occur with a probablility of 1.2%. For 77,163 samples the probability becomes 50%! (calculator).
I suggest a workaround at the bottom.

注意:即使使用最好的 32 位哈希,冲突迟早发生。

散列冲突概率可以计算为 1 - e ^ (-k(k-1) / 2N,近似为 k^2 / 2N见这里)。这可能比直觉所暗示的要高:
假设一个 32 位哈希和 k=10,000 个项目,发生碰撞的概率为 1.2%。对于 77,163 个样本,概率变为 50%!(计算器)。
我建议底部的解决方法。

In an answer to this question Which hashing algorithm is best for uniqueness and speed?, Ian Boyd posted a good in depth analysis. In short (as I interpret it), he comes to the conclusion that Murmur is best, followed by FNV-1a.
Java's String.hashCode() algorithm that esmiralha proposed seems to be a variant of DJB2.

在回答这个问题时 哪种哈希算法最适合唯一性和速度?, Ian Boyd 发表了一篇很好的深入分析。简而言之(按照我的解释),他得出的结论是 Murmur 最好,其次是 FNV-1a。
esmiralha 提出的 Java 的 String.hashCode() 算法似乎是 DJB2 的变体。

  • FNV-1a has a a better distribution than DJB2, but is slower
  • DJB2 is faster than FNV-1a, but tends to yield more collisions
  • MurmurHash3 is better and faster than DJB2 and FNV-1a (but the optimized implementation requires more lines of code than FNV and DJB2)
  • FNV-1a 具有比 DJB2 更好的分布,但速度较慢
  • DJB2 比 FNV-1a 快,但往往会产生更多的碰撞
  • MurmurHash3 比 DJB2 和 FNV-1a 更好更快(但优化的实现需要比 FNV 和 DJB2 多行代码)

Some benchmarks with large input strings here: http://jsperf.com/32-bit-hash
When shortinput strings are hashed, murmur's performance drops, relative to DJ2B and FNV-1a: http://jsperf.com/32-bit-hash/3

一些带有大输入字符串的基准测试:http: //jsperf.com/32-bit-hash
当对输入字符串进行哈希处理时,murmur 的性能相对于 DJ2B 和 FNV-1a 会下降:http://jsperf.com/32-位哈希/3

So in general I would recommend murmur3.
See here for a JavaScript implementation: https://github.com/garycourt/murmurhash-js

所以总的来说,我会推荐 murmur3。
有关 JavaScript 实现,请参见此处:https: //github.com/garycourt/murmurhash-js

If input strings are short and performance is more important than distribution quality, use DJB2 (as proposed by the accepted answer by esmiralha).

如果输入字符串很短并且性能比分发质量更重要,请使用 DJB2(如 esmiralha 接受的答案所建议的那样)。

If quality and small code size are more important than speed, I use this implementation of FNV-1a (based on this code).

如果质量和小代码量比速度更重要,我使用 FNV-1a 的这个实现(基于这个代码)。

/**
 * Calculate a 32 bit FNV-1a hash
 * Found here: https://gist.github.com/vaiorabbit/5657561
 * Ref.: http://isthe.com/chongo/tech/comp/fnv/
 *
 * @param {string} str the input value
 * @param {boolean} [asString=false] set to true to return the hash value as 
 *     8-digit hex string instead of an integer
 * @param {integer} [seed] optionally pass the hash of the previous chunk
 * @returns {integer | string}
 */
function hashFnv32a(str, asString, seed) {
    /*jshint bitwise:false */
    var i, l,
        hval = (seed === undefined) ? 0x811c9dc5 : seed;

    for (i = 0, l = str.length; i < l; i++) {
        hval ^= str.charCodeAt(i);
        hval += (hval << 1) + (hval << 4) + (hval << 7) + (hval << 8) + (hval << 24);
    }
    if( asString ){
        // Convert to 8 digit hex string
        return ("0000000" + (hval >>> 0).toString(16)).substr(-8);
    }
    return hval >>> 0;
}

Improve Collision Probability

提高碰撞概率

As explained here, we can extend the hash bit size using this trick:

如此处所述,我们可以使用以下技巧扩展哈希位大小:

function hash64(str) {
    var h1 = hash32(str);  // returns 32 bit (as 8 byte hex string)
    return h1 + hash32(h1 + str);  // 64 bit (as 16 byte hex string)
}

Use it with care and don't expect too much though.

小心使用它,但不要期望太多。

回答by deekshith

Based on accepted answerin ES6. Smaller, maintainable and works in modern browsers.

基于ES6 中接受的答案。更小、可维护且适用于现代浏览器。

function hashCode(str) {
  return str.split('').reduce((prevHash, currVal) =>
    (((prevHash << 5) - prevHash) + currVal.charCodeAt(0))|0, 0);
}

// Test
console.log("hashCode(\"Hello!\"): ", hashCode('Hello!'));

EDIT (2019-11-04):

编辑(2019-11-04)

one-liner arrow function version :

单线箭头函数版本:

const hashCode = s => s.split('').reduce((a,b) => (((a << 5) - a) + b.charCodeAt(0))|0, 0)

// test
console.log(hashCode('Hello!'))

回答by bryc

Almost half the answers are implementations of Java's String.hashCode, which is neither high quality nor super fast. It's nothing too special, it just multiples by 31 for each character. It can be implemented simply and efficiently in one line, and is much faster with Math.imul:

几乎一半的答案是 Java 的实现String.hashCode,它既不高质量也不超快。没什么特别的,只是每个字符乘以 31。它可以在一行中简单有效地实现,并且速度更快Math.imul

hashCode=s=>{for(var i=0,h;i<s.length;i++)h=Math.imul(31,h)+s.charCodeAt(i)|0;return h}


With that out of the way, here's something better—cyrb53, a simple but high quality 53-bit hash. It's quite fast, provides very good hash distribution, and has significantly lower collision rates compared to any32-bit hash.

顺便说一下,这里有更好的东西—— cyrb53,一个简单但高质量的 53 位哈希。它非常快,提供了非常好的散列分布,并且与任何32 位散列相比具有明显更低的冲突率。

const cyrb53 = function(str, seed = 0) {
    let h1 = 0xdeadbeef ^ seed, h2 = 0x41c6ce57 ^ seed;
    for (let i = 0, ch; i < str.length; i++) {
        ch = str.charCodeAt(i);
        h1 = Math.imul(h1 ^ ch, 2654435761);
        h2 = Math.imul(h2 ^ ch, 1597334677);
    }
    h1 = Math.imul(h1 ^ h1>>>16, 2246822507) ^ Math.imul(h2 ^ h2>>>13, 3266489909);
    h2 = Math.imul(h2 ^ h2>>>16, 2246822507) ^ Math.imul(h1 ^ h1>>>13, 3266489909);
    return 4294967296 * (2097151 & h2) + (h1>>>0);
};

Similar to the well-known MurmurHash/xxHash algorithms, it uses a combination of multiplication and Xorshiftto generate the hash, but not as thorough. As a result it's faster than either in JavaScript and significantly simpler to implement.

类似于众所周知的 MurmurHash/xxHash 算法,它使用乘法和Xorshift的组合来生成哈希,但没有那么彻底。因此,它比 JavaScript 中的任何一个都快,并且实现起来要简单得多。

It achieves avalanche (non-strict), which basically means small changes in the input have big changes in the output, making the resulting hash appear random:

它实现了雪崩(非严格),这基本上意味着输入的微小变化在输出中会有很大的变化,使得得到的散列看起来是随机的:

0xc2ba782c97901 = cyrb53("a")
0xeda5bc254d2bf = cyrb53("b")
0xe64cc3b748385 = cyrb53("revenge")
0xd85148d13f93a = cyrb53("revenue")

You can also supply a seed for alternate streams of the same input:

您还可以为相同输入的备用流提供种子:

0xee5e6598ccd5c = cyrb53("revenue", 1)
0x72e2831253862 = cyrb53("revenue", 2)
0x0de31708e6ab7 = cyrb53("revenue", 3)


Technically it's a 64-bit hash (two uncorrelated 32-bit hashes in parallel), but JavaScript is limited to 53-bit integers. If required, The full 64-bit output can still be usedby altering the return line for a hex string or array.

从技术上讲,它是一个 64 位哈希(两个不相关的 32 位哈希并行),但 JavaScript 仅限于 53 位整数。如果需要,仍然可以通过更改十六进制字符串或数组的返回行来使用完整的 64 位输出

Be aware that constructing hex strings can drastically slow down batch processing in performance-critical situations.

请注意,在性能关键的情况下,构造十六进制字符串会大大减慢批处理速度。

return (h2>>>0).toString(16).padStart(8,0)+(h1>>>0).toString(16).padStart(8,0);
// or
return [h2>>>0, h1>>>0];


And just for fun, here's a minimal 32-bit hash in 89 charswith higher quality than even FNV or DJB2:

只是为了好玩,这里有一个89 个字符的最小 32 位散列,其质量甚至比 FNV 或 DJB2 还要高:

TSH=s=>{for(var i=0,h=9;i<s.length;)h=Math.imul(h^s.charCodeAt(i++),9**9);return h^h>>>9}

回答by Kyle Falconer

If it helps anyone, I combined the top two answers into an older-browser-tolerant version, which uses the fast version if reduceis available and falls back to esmiralha's solution if it's not.

如果它对任何人有帮助,我将前两个答案合并到一个旧的浏览器容忍版本中,如果reduce可用,则使用快速版本,如果不可用,则回退到 esmiralha 的解决方案。

/**
 * @see http://stackoverflow.com/q/7616461/940217
 * @return {number}
 */
String.prototype.hashCode = function(){
    if (Array.prototype.reduce){
        return this.split("").reduce(function(a,b){a=((a<<5)-a)+b.charCodeAt(0);return a&a},0);              
    } 
    var hash = 0;
    if (this.length === 0) return hash;
    for (var i = 0; i < this.length; i++) {
        var character  = this.charCodeAt(i);
        hash  = ((hash<<5)-hash)+character;
        hash = hash & hash; // Convert to 32bit integer
    }
    return hash;
}

Usage is like:

用法如下:

var hash = "some string to be hashed".hashCode();

回答by mmm

This is a refined and better performing variant:

这是一个经过改进且性能更好的变体:

String.prototype.hashCode = function() {
    var hash = 0, i = 0, len = this.length;
    while ( i < len ) {
        hash  = ((hash << 5) - hash + this.charCodeAt(i++)) << 0;
    }
    return hash;
};

This matches Java's implementation of the standard object.hashCode()

这与 Java 对标准的实现相匹配 object.hashCode()

Here is also one that returns only positive hashcodes:

这也是一个只返回正哈希码的方法:

String.prototype.hashcode = function() {
    return (this.hashCode() + 2147483647) + 1;
};

And here is a matching one for Java that only returns positive hashcodes:

这是一个只返回正哈希码的 Java 匹配项:

public static long hashcode(Object obj) {
    return ((long) obj.hashCode()) + Integer.MAX_VALUE + 1l;
}

Enjoy!

享受!

回答by Kaiido

I'm a bit surprised nobody has talked about the new SubtleCrypto APIyet.

我有点惊讶还没有人谈论新的SubtleCrypto API

To get an hash from a string, you can use the subtle.digestmethod :

要从字符串中获取哈希,您可以使用以下subtle.digest方法:

function getHash(str, algo = "SHA-256") {
  let strBuf = new TextEncoder('utf-8').encode(str);
  return crypto.subtle.digest(algo, strBuf)
    .then(hash => {
      window.hash = hash;
      // here hash is an arrayBuffer, 
      // so we'll connvert it to its hex version
      let result = '';
      const view = new DataView(hash);
      for (let i = 0; i < hash.byteLength; i += 4) {
        result += ('00000000' + view.getUint32(i).toString(16)).slice(-8);
      }
      return result;
    });
}

getHash('hello world')
  .then(hash => {
    console.log(hash);
  });

回答by djabraham

Thanks to the example by mar10, I found a way to get the same results in C# AND Javascript for an FNV-1a. If unicode chars are present, the upper portion is discarded for the sake of performance. Don't know why it would be helpful to maintain those when hashing, as am only hashing url paths for now.

多亏了 mar10 的例子,我找到了一种在 C# 和 Javascript 中为 FNV-1a 获得相同结果的方法。如果存在 unicode 字符,则出于性能考虑将丢弃上部。不知道为什么在散列时维护它们会有所帮助,因为我现在只对 url 路径进行散列。

C# Version

C# 版本

private static readonly UInt32 FNV_OFFSET_32 = 0x811c9dc5;   // 2166136261
private static readonly UInt32 FNV_PRIME_32 = 0x1000193;     // 16777619

// Unsigned 32bit integer FNV-1a
public static UInt32 HashFnv32u(this string s)
{
    // byte[] arr = Encoding.UTF8.GetBytes(s);      // 8 bit expanded unicode array
    char[] arr = s.ToCharArray();                   // 16 bit unicode is native .net 

    UInt32 hash = FNV_OFFSET_32;
    for (var i = 0; i < s.Length; i++)
    {
        // Strips unicode bits, only the lower 8 bits of the values are used
        hash = hash ^ unchecked((byte)(arr[i] & 0xFF));
        hash = hash * FNV_PRIME_32;
    }
    return hash;
}

// Signed hash for storing in SQL Server
public static Int32 HashFnv32s(this string s)
{
    return unchecked((int)s.HashFnv32u());
}

JavaScript Version

JavaScript 版本

var utils = utils || {};

utils.FNV_OFFSET_32 = 0x811c9dc5;

utils.hashFnv32a = function (input) {
    var hval = utils.FNV_OFFSET_32;

    // Strips unicode bits, only the lower 8 bits of the values are used
    for (var i = 0; i < input.length; i++) {
        hval = hval ^ (input.charCodeAt(i) & 0xFF);
        hval += (hval << 1) + (hval << 4) + (hval << 7) + (hval << 8) + (hval << 24);
    }

    return hval >>> 0;
}

utils.toHex = function (val) {
    return ("0000000" + (val >>> 0).toString(16)).substr(-8);
}

回答by jcollum

I needed a similar function (but different) to generate a unique-ish ID based on the username and current time. So:

我需要一个类似的函数(但不同)来根据用户名和当前时间生成一个唯一的 ID。所以:

window.newId = ->
  # create a number based on the username
  unless window.userNumber?
    window.userNumber = 0
  for c,i in window.MyNamespace.userName
    char = window.MyNamespace.userName.charCodeAt(i)
    window.MyNamespace.userNumber+=char
  ((window.MyNamespace.userNumber + Math.floor(Math.random() * 1e15) + new Date().getMilliseconds()).toString(36)).toUpperCase()

Produces:

产生:

2DVFXJGEKL
6IZPAKFQFL
ORGOENVMG
... etc 

edit Jun 2015: For new code I use shortid: https://www.npmjs.com/package/shortid

2015 年 6 月编辑:对于新代码,我使用 shortid:https://www.npmjs.com/package/shortid