Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/16008670/

Date: 2020-08-18 21:35:09  Source: igfitidea

How to hash a string into 8 digits?

Tags: python, arrays, algorithm, random, hash

Asked by dorafmon

Is there any way that I can hash a random string into an 8-digit number without implementing any algorithms myself?

Accepted answer by Raymond Hettinger

Yes, you can use the hashlib module from the standard library or the built-in hash() function. Then keep only the last eight digits by applying a modulo operation or string slicing to the integer form of the hash:

>>> s = 'she sells sea shells by the sea shore'

>>> # Use hashlib
>>> import hashlib
>>> int(hashlib.sha1(s).hexdigest(), 16) % (10 ** 8)
58097614L

>>> # Use hash()
>>> abs(hash(s)) % (10 ** 8)
82148974
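
The answer mentions string slicing as an alternative to the modulo operation but only shows the modulo form. A minimal sketch of both, assuming Python 3 (where the string has to be encoded to bytes before hashing); the variable names are illustrative and not part of the original answer:

```python
import hashlib

s = 'she sells sea shells by the sea shore'

# Integer form of the SHA-1 digest (Python 3 needs bytes, hence .encode()).
n = int(hashlib.sha1(s.encode('utf-8')).hexdigest(), 16)

# Modulo keeps the last eight decimal digits as an int (leading zeros vanish).
by_modulo = n % 10**8

# Slicing keeps the last eight characters of the decimal string
# (leading zeros are preserved because the result is a str).
by_slicing = str(n)[-8:]

print(by_modulo, by_slicing)
```

Both forms select the same digits; they differ only in the result type, so pick slicing when a fixed-width string is what you need.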

Answer by JJC

Raymond's answer is great for Python 2 (though you don't need the abs(), nor the parentheses around 10 ** 8). For Python 3, however, there are important caveats. First, you need to make sure you pass an encoded string (bytes) to hashlib. Second, these days it is usually better to shy away from SHA-1 and use something like SHA-256 instead. So the hashlib approach would be:

>>> import hashlib
>>> s = 'your string'
>>> int(hashlib.sha256(s.encode('utf-8')).hexdigest(), 16) % 10**8
80262417
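
If you need this in more than one place, the one-liner can be wrapped in a small helper. This is only a sketch: the function name hash_to_digits and its digits parameter are made up for illustration, and it assumes UTF-8 is an acceptable encoding for your strings.

```python
import hashlib

def hash_to_digits(text, digits=8):
    """Map text to an int in [0, 10**digits) using SHA-256 (hypothetical helper)."""
    # hashlib works on bytes, so encode the str first.
    digest = hashlib.sha256(text.encode('utf-8')).hexdigest()
    # Interpret the hex digest as an integer and keep the last `digits` digits.
    return int(digest, 16) % 10**digits

print(hash_to_digits('your string'))
```

Because the result depends only on the input bytes, the same string gives the same number in every run and on every machine.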

If you want to use the hash() function instead, the important caveat is that in Python 3.x, unlike in Python 2.x, the result of hash() for a string is only consistent within a single process, not across Python invocations (string hash randomization has been enabled by default since Python 3.3). See here:

$ python -V
Python 2.7.5
$ python -c 'print(hash("foo"))'
-4177197833195190597
$ python -c 'print(hash("foo"))'
-4177197833195190597

$ python3 -V
Python 3.4.2
$ python3 -c 'print(hash("foo"))'
5790391865899772265
$ python3 -c 'print(hash("foo"))'
-8152690834165248934

This means the hash()-based solution suggested, which can be shortened to just:

hash(s) % 10**8

will only return the same value within a given script run:

#Python 2:
$ python2 -c 's="your string"; print(hash(s) % 10**8)'
52304543
$ python2 -c 's="your string"; print(hash(s) % 10**8)'
52304543

#Python 3:
$ python3 -c 's="your string"; print(hash(s) % 10**8)'
12954124
$ python3 -c 's="your string"; print(hash(s) % 10**8)'
32065451

So, depending on whether this matters in your application (it did in mine), you'll probably want to stick to the hashlib-based approach.
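
If you still want hash() to be reproducible across runs, CPython lets you fix the seed through the PYTHONHASHSEED environment variable. The sketch below starts two fresh interpreters with the same seed and compares their results; the helper name run_hash is made up for illustration, and the example assumes CPython 3.3 or later.

```python
import os
import subprocess
import sys

def run_hash(seed):
    """Run hash("foo") % 10**8 in a fresh interpreter with a fixed hash seed."""
    # Copy the current environment and pin the seed for the child process.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, '-c', 'print(hash("foo") % 10**8)'], env=env)
    return int(out)

first = run_hash(0)
second = run_hash(0)
print(first == second)
```

Fixing the seed gives up the protection that randomization offers against crafted colliding inputs, and the values can still differ between Python versions and platforms, so hashlib remains the more portable choice.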

Answer by user8948052

Just to complete JJC's answer: in Python 3.5.3 the result is the same across invocations if you use hashlib this way:

$ python3 -c '
import hashlib
hash_object = hashlib.sha256(b"Caroline")
hex_dig = hash_object.hexdigest()
print(hex_dig)
'
739061d73d65dcdeb755aa28da4fea16a02b9c99b4c2735f2ebfa016f3e7fded
$ python3 -c '
import hashlib
hash_object = hashlib.sha256(b"Caroline")
hex_dig = hash_object.hexdigest()
print(hex_dig)
'
739061d73d65dcdeb755aa28da4fea16a02b9c99b4c2735f2ebfa016f3e7fded

$ python3 -V
Python 3.5.3

Answer by user 923227

I am sharing our Node.js implementation of the solution given by @Raymond Hettinger.

var crypto = require('crypto');
var s = 'she sells sea shells by the sea shore';
console.log(BigInt('0x' + crypto.createHash('sha1').update(s).digest('hex'))%(10n ** 8n));