Original question: http://stackoverflow.com/questions/28844492/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Bash, how to hash value of a string?
Asked by Zombies
I simply want to convert a string of any length to an integer value. Each string will map to an integer, which may or may not be unique. Is there any existing open-source command that does this?
Bonus points if it is unique, such as computing the lexicographical order via a bash command.
Answered by rici
You need to be careful about using hash functions from common programming languages. It has become common to introduce randomized seeds into hash functions, so that hash values are only stable within a single program execution. This mitigates a denial-of-service attack noted in oCert advisory 2011-3. (As that advisory notes, the problem was described in 2003 in a paper presented to Usenix.)
For example, the Python hash function has been randomized by default since v3.3:
$ python3 -c 'from sys import argv;print(hash(argv[1]))' abc
-2595772619214671013
$ python3 -c 'from sys import argv;print(hash(argv[1]))' abc
-6001956461950650533
$ python3 -c 'from sys import argv;print(hash(argv[1]))' abc
-7414807274805087300
$ python3 -c 'from sys import argv;print(hash(argv[1]))' abc
-327608370992723225
# Python2 generates consistent hash values
$ python -c 'from sys import argv;print(hash(argv[1]))' abc
1453079729188098211
$ python -c 'from sys import argv;print(hash(argv[1]))' abc
1453079729188098211
$ python -c 'from sys import argv;print(hash(argv[1]))' abc
1453079729188098211
You can control hash randomization in Python by setting the PYTHONHASHSEED environment variable.
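As a minimal sketch of that setting, the function below pins the seed for a single invocation. The name py_hash is made up for illustration; setting PYTHONHASHSEED to 0 disables randomization, and other integer values select a fixed seed. The resulting numbers should repeat on a given Python build, but they are not guaranteed to match across Python versions or platforms.

```shell
#!/usr/bin/env bash
# py_hash is a hypothetical helper name, not part of Python or bash.
# PYTHONHASHSEED=0 disables hash randomization for this one invocation only.
py_hash() {
    PYTHONHASHSEED=0 python3 -c 'from sys import argv; print(hash(argv[1]))' "$1"
}

# Usage: repeated calls print the same integer on a given Python build.
#   py_hash abc
#   py_hash abc
```

Because the variable is set only on the command line of that one call, other Python programs on the system keep their randomized hashes.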
Or you can use a standardized cryptographic hash like SHA-1. The commonly available sha1sum utility outputs its result in hexadecimal, but you can convert that to decimal with bash (truncated to 64 bits):
$ echo $((0x$(sha1sum <<<"string to hash")0))
-7037254581539467098
Or in its full 160-bit glory, using bc (which requires hexadecimal digits to be written in upper case):
$ bc <<<ibase=16\;$(sha1sum <<<"string to hash"|tr a-z A-Z)0
861191872165666513280590001082621748432296579238
If you only need the hash value modulo some power of 16, you can use the first few hexadecimal digits of the SHA-1 sum. (You could use any selection of digits, since they're all equally well distributed, but the first few are easier to extract.) Two digits, as here, give a value from 0 to 255:
$ echo $((0x$(sha1sum <<<"string to hash"|cut -c1-2)))
150
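If the range you need is not a power of 16, one option is to take a longer prefix and reduce it with the % operator. The following is only a sketch: hash_bucket is a hypothetical helper name, and the choice of 15 hex digits is an assumption made to keep the value non-negative in bash's signed 64-bit arithmetic.

```shell
#!/usr/bin/env bash
# hash_bucket is a hypothetical helper: hash_bucket STRING BUCKETS
# prints an integer in the range 0 .. BUCKETS-1.
hash_bucket() {
    local hex
    # Take the first 15 hex digits (60 bits) so the value stays
    # non-negative in bash's signed 64-bit arithmetic.
    hex=$(sha1sum <<<"$1" | cut -c1-15)
    echo $(( 0x$hex % $2 ))
}

hash_bucket "string to hash" 10
```

When the bucket count does not divide 2^60 the result is very slightly biased toward low values, which rarely matters for simple partitioning.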
Note: As @gniourf_gniourf points out in a comment, the above doesn't really compute the SHA-1 checksum of the given string, because the bash here-string syntax (<<<word) appends a newline to word. Since the checksum of the string with a newline appended is just as good a hash as the checksum of the string itself, there is no problem as long as you always use the same mechanism to produce the hash.
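If you do need the checksum of exactly the given bytes, for instance to compare against a digest computed by another tool, one way is to feed the string through printf instead of a here-string. This is a sketch; sha1_exact is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# sha1_exact is a hypothetical helper name.
# printf '%s' writes the string with no trailing newline, so sha1sum
# sees exactly the bytes of the argument.
sha1_exact() {
    printf '%s' "$1" | sha1sum | cut -d ' ' -f 1
}

sha1_exact abc                      # a9993e364706816aba3e25717850c26c9cd0d89d
sha1sum <<<abc | cut -d ' ' -f 1    # a different digest: the input was "abc\n"
```

The first digest is the standard SHA-1 test vector for "abc", which makes it easy to check that no newline slipped in.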
Answered by bishop
You could use the sum or cksum command (the latter being preferred) to generate a base-10 integer:
$ cksum <<< 'hello world' | cut -f 1 -d ' '
3733384285
$ cksum <<< 'goodbye world' | cut -f 1 -d ' '
2600070097
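The same idea can be wrapped in a small function for reuse in scripts. This is a sketch rather than a canonical recipe: str_to_int is a hypothetical name, and it uses printf instead of a here-string, so no trailing newline is hashed and its results will differ from the values shown above.

```shell
#!/usr/bin/env bash
# str_to_int is a hypothetical helper name.
# cksum prints "<crc> <byte count>"; keep only the CRC, which is an
# unsigned 32-bit value printed in decimal.
str_to_int() {
    printf '%s' "$1" | cksum | cut -d ' ' -f 1
}

str_to_int 'hello world'
str_to_int 'goodbye world'
```

Since the CRC is only 32 bits wide, distinct strings can collide, so treat the result as a hash and not as a unique identifier.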
If you're interested in the math behind these simple hashes, check out the source implementations:
- cksum calculates the AUTODIN II polynomial used by Ethernet (the POSIX 1003.2 CRC)
- sum calculates a 16-bit checksum using either the BSD or the System V algorithm, depending upon the -r and -s command-line arguments.