Fastest hash for non-cryptographic uses? (PHP)

Source: Stack Overflow, http://stackoverflow.com/questions/3665247/. This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and keep the same license.

Tags: php, database, security, hash

Asked by John

I'm essentially preparing phrases to be put into the database. They may be malformed, so I want to store a short hash of them instead (I will simply be checking whether they exist or not, so a hash is ideal).

I assume MD5 is fairly slow on 100,000+ requests, so I wanted to know what the best method to hash the phrases would be. Maybe rolling my own hash function or using hash('md4', '...') would be faster in the end?

I know MySQL has MD5(), so that would add a bit of speed on the query end, but maybe there's an even faster hashing function in MySQL that I don't know about and that would work with PHP.

Accepted answer by joschi

CRC32 is pretty fast and there's a function for it: http://www.php.net/manual/en/function.crc32.php

But you should be aware that CRC32 will have more collisions than MD5 or even SHA-1 hashes, simply because of the reduced length (32 bits, compared to 128 bits and 160 bits respectively). But if you just want to check whether a stored string is corrupted, you'll be fine with CRC32.
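
To put the collision risk in rough numbers, here is a small sketch using the standard birthday approximation p ≈ 1 − exp(−n² / 2^(b+1)) for n items and a b-bit hash. The function name `collisionProbability` is made up for this illustration, and the formula assumes uniformly distributed hash values, which CRC32 only approximates.

```php
<?php
// Birthday-bound estimate: chance that at least two of $n items share a hash
// of $bits bits, assuming hash values are uniformly distributed (an assumption).
function collisionProbability(int $n, int $bits): float
{
    // p ~= 1 - exp(-n^2 / 2^(bits+1)); -expm1(-x) keeps precision for tiny x.
    return -expm1(-($n * $n) / (2.0 ** ($bits + 1)));
}

printf("32-bit hash,  100,000 phrases: %.3f\n", collisionProbability(100000, 32));
printf("128-bit hash, 100,000 phrases: %.3e\n", collisionProbability(100000, 128));
```

For the 100,000 phrases mentioned in the question, the 32-bit estimate comes out at roughly 0.69, so at least one collision is more likely than not, while the 128-bit estimate is negligible.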

Answer by Quamis

fcn     time (s) generated hash
crc32:  0.03163  798740135
md5:    0.0731   0dbab6d0c841278d33be207f14eeab8b
sha1:   0.07331  417a9e5c9ac7c52e32727cfd25da99eca9339a80
xor:    0.65218  119
xor2:   0.29301  134217728
add:    0.57841  1105

And the code used to generate this is:

 $loops = 100000;
 $str = "ana are mere";

 echo "<pre>";

 $tss = microtime(true);
 for($i=0; $i<$loops; $i++){
  $x = crc32($str);
 }
 $tse = microtime(true);
 echo "\ncrc32: \t" . round($tse-$tss, 5) . " \t" . $x;

 $tss = microtime(true);
 for($i=0; $i<$loops; $i++){
  $x = md5($str);
 }
 $tse = microtime(true);
 echo "\nmd5: \t".round($tse-$tss, 5) . " \t" . $x;

 $tss = microtime(true);
 for($i=0; $i<$loops; $i++){
  $x = sha1($str);
 }
 $tse = microtime(true);
 echo "\nsha1: \t".round($tse-$tss, 5) . " \t" . $x;

 $tss = microtime(true);
 for($i=0; $i<$loops; $i++){
  $l = strlen($str);
  $x = 0x77;
  for($j=0;$j<$l;$j++){
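   $x = $x ^ ord($str[$j]); // was "$x xor ...": "xor" binds looser than "=", so $x never changed (hence 119 = 0x77 in the results)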
  }
 }
 $tse = microtime(true);
 echo "\nxor: \t".round($tse-$tss, 5) . " \t" . $x;

 $tss = microtime(true);
 for($i=0; $i<$loops; $i++){
  $l = strlen($str);
  $x = 0x08;
  for($j=0;$j<$l;$j++){
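   $x = ($x << 2) ^ ord($str[$j]); // same precedence fix: with "xor" this only shifted 0x08 left 24 bits (134217728 in the results)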
  }
 }
 $tse = microtime(true);
 echo "\nxor2: \t".round($tse-$tss, 5) . " \t" . $x;

 $tss = microtime(true);
 for($i=0; $i<$loops; $i++){
  $l = strlen($str);
  $x = 0;
  for($j=0;$j<$l;$j++){
   $x = $x + ord($str[$j]);
  }
 }
 $tse = microtime(true);
 echo "\nadd: \t".round($tse-$tss, 5) . " \t" . $x;
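
The same measurement can be written more compactly with a small helper. This is only a sketch: `timeHash` is a hypothetical name, and it assumes PHP 7.3+ for `hrtime()`. Absolute numbers depend on the machine and PHP version, so no output is shown.

```php
<?php
// Time $loops calls of $fn($str) and return the elapsed time in seconds.
function timeHash(callable $fn, string $str, int $loops = 100000): float
{
    $start = hrtime(true);            // monotonic clock, in nanoseconds
    for ($i = 0; $i < $loops; $i++) {
        $fn($str);
    }
    return (hrtime(true) - $start) / 1e9;
}

$str = "ana are mere";
foreach (['crc32', 'md5', 'sha1'] as $name) {
    printf("%-6s %.5f\n", $name . ':', timeHash($name, $str));
}
```

Using a monotonic clock avoids jumps in wall-clock time, and passing the function as a callable keeps every measurement loop identical.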

Answer by Pez Cuckow

A ranked list, where each loop iteration hashes the same input with every algorithm.

<?php

set_time_limit(720);

$begin = startTime();
$scores = array();


foreach(hash_algos() as $algo) {
    $scores[$algo] = 0;
}

for($i=0;$i<10000;$i++) {
    $number = rand()*100000000000000;
    $string = randomString(500);

    foreach(hash_algos() as $algo) {
        $start = startTime();

        hash($algo, $number); //Number
        hash($algo, $string); //String

        $end = endTime($start);

        $scores[$algo] += $end;
    }   
}


asort($scores);

$i=1;
foreach($scores as $alg => $time) {
    print $i.' - '.$alg.' '.$time.'<br />';
    $i++;
}

echo "Entire page took ".endTime($begin).' seconds<br />';

echo "<br /><br /><h2>Hashes Compared</h2>";

$i = 1; // restart the rank counter so this list is numbered from 1 again
foreach($scores as $alg => $time) {
    print $i.' - '.$alg.' '.hash($alg,$string).'<br />';
    $i++;
}

function startTime() {
   $mtime = microtime(); 
   $mtime = explode(" ",$mtime); 
   $mtime = $mtime[1] + $mtime[0]; 
   return $mtime;   
}

function endTime($starttime) {
   $mtime = microtime(); 
   $mtime = explode(" ",$mtime); 
   $mtime = $mtime[1] + $mtime[0]; 
   $endtime = $mtime; 
   return $totaltime = ($endtime - $starttime); 
}

function randomString($length) {
    $characters = '0123456789abcdefghijklmnopqrstuvwxyz';
    $string = '';    
    for ($p = 0; $p < $length; $p++) {
        $string .= $characters[mt_rand(0, strlen($characters) - 1)];
    }
    return $string;
}

?>

And the output:

1 - crc32b 0.111036300659
2 - crc32 0.112048864365
3 - md4 0.120795726776
4 - md5 0.138875722885
5 - sha1 0.146368741989
6 - adler32 0.15501332283
7 - tiger192,3 0.177447080612
8 - tiger160,3 0.179498195648
9 - tiger128,3 0.184012889862
10 - ripemd128 0.184052705765
11 - ripemd256 0.185411214828
12 - salsa20 0.198500156403
13 - salsa10 0.204956293106
14 - haval160,3 0.206098556519
15 - haval256,3 0.206891775131
16 - haval224,3 0.206954240799
17 - ripemd160 0.207638263702
18 - tiger192,4 0.208125829697
19 - tiger160,4 0.208438634872
20 - tiger128,4 0.209359407425
21 - haval128,3 0.210256814957
22 - sha256 0.212738037109
23 - ripemd320 0.215386390686
24 - haval192,3 0.215610980988
25 - sha224 0.218329429626
26 - haval192,4 0.256464719772
27 - haval160,4 0.256565093994
28 - haval128,4 0.257113456726
29 - haval224,4 0.258928537369
30 - haval256,4 0.259262084961
31 - haval192,5 0.288433790207
32 - haval160,5 0.290239810944
33 - haval256,5 0.291721343994
34 - haval224,5 0.294484138489
35 - haval128,5 0.300224781036
36 - sha384 0.352449893951
37 - sha512 0.354603528976
38 - gost 0.392376661301
39 - whirlpool 0.629067659378
40 - snefru256 0.829529047012
41 - snefru 0.833986997604
42 - md2 1.80192279816
Entire page took 22.755341053 seconds


Hashes Compared

1 - crc32b 761331d7
2 - crc32 7e8c6d34
3 - md4 1bc8785de173e77ef28a24bd525beb68
4 - md5 9f9cfa3b5b339773b8d6dd77bbe931dd
5 - sha1 ca2bd798e47eab85655f0ce03fa46b2e6e20a31f
6 - adler32 f5f2aefc
7 - tiger192,3 d11b7615af06779259b29446948389c31d896dee25edfc50
8 - tiger160,3 d11b7615af06779259b29446948389c31d896dee
9 - tiger128,3 d11b7615af06779259b29446948389c3
10 - ripemd128 5f221a4574a072bc71518d150ae907c8
11 - ripemd256 bc89cd79f4e70b73fbb4faaf47a3caf263baa07e72dd435a0f62afe840f5c71c
12 - salsa20 91d9b963e172988a8fc2c5ff1a8d67073b2c5a09573cb03e901615dc1ea5162640f607e0d7134c981eedb761934cd8200fe90642a4608eacb82143e6e7b822c4
13 - salsa10 320b8cb8498d590ca2ec552008f1e55486116257a1e933d10d35c85a967f4a89c52158f755f775cd0b147ec64cde8934bae1e13bea81b8a4a55ac2c08efff4ce
14 - haval160,3 27ad6dd290161b883e614015b574b109233c7c0e
15 - haval256,3 03706dd2be7b1888bf9f3b151145b009859a720e3fe921a575e11be801c54c9a
16 - haval224,3 16706dd2c77b1888c29f3b151745b009879a720e4fe921a576e11be8
17 - ripemd160 f419c7c997a10aaf2d83a5fa03c58350d9f9d2e4
18 - tiger192,4 112f486d3a9000f822c050a204d284d52473f267b1247dbd
19 - tiger160,4 112f486d3a9000f822c050a204d284d52473f267
20 - tiger128,4 112f486d3a9000f822c050a204d284d5
21 - haval128,3 9d9155d430218e4dcdde1c62962ecca3
22 - sha256 6027f87b4dd4c732758aa52049257f9e9db7244f78c132d36d47f9033b5c3b09
23 - ripemd320 9ac00db553b51662826267daced37abfccca6433844f67d8f8cfd243cf78bbbf86839daf0961b61d
24 - haval192,3 7d706dd2d37c1888eaa53b154948b009e09c720effed21a5
25 - sha224 b6395266d8c7e40edde77969359e6a5d725f322e2ea4bd73d3d25768
26 - haval192,4 d87cd76e4c8006d401d7068dce5dec3d02dfa037d196ea14
27 - haval160,4 f2ddd76e156d0cd40eec0b8d09c8f23d0f47a437
28 - haval128,4 f066e6312b91e7ef69f26b2adbeba875
29 - haval224,4 1b7cd76ea97c06d439d6068d7d56ec3d73dba0373895ea14e465bc0e
30 - haval256,4 157cd76e8b7c06d432d6068d7556ec3d66dba0371c95ea14e165bc0ec31b9d37
31 - haval192,5 05f9ea219ae1b98ba33bac6b37ccfe2f248511046c80c2f0
32 - haval160,5 e054ec218637bc8b4bf1b26b2fb40230e0161904
33 - haval256,5 48f6ea210ee1b98be835ac6b7dc4fe2f39841104a37cc2f06ceb2bf58ab4fe78
34 - haval224,5 57f6ea2111e1b98bf735ac6b92c4fe2f43841104ab7cc2f076eb2bf5
35 - haval128,5 ccb8e0ac1fd12640ecd8976ab6402aa8
36 - sha384 bcf0eeaa1479bf6bef7ece0f5d7111c3aeee177aa7990926c633891464534cd8a6c69d905c36e882b3350ef40816ed02
37 - sha512 8def9a1e6e31423ef73c94251d7553f6fe3ed262c44e852bdb43e3e2a2b76254b4da5ef25aefb32aae260bb386cd133045adfa2024b067c2990b60d6f014e039
38 - gost ef6cb990b754b1d6a428f6bb5c113ee22cc9533558d203161441933d86e3b6f8
39 - whirlpool 54eb1d0667b6fdf97c01e005ac1febfacf8704da55c70f10f812b34cd9d45528b60d20f08765ced0ab3086d2bde312259aebf15d105318ae76995c4cf9a1e981
40 - snefru256 20849cbeda5ddec5043c09d36b2de4ba0ea9296b6c9efaa7c7257f30f351aea4
41 - snefru 20849cbeda5ddec5043c09d36b2de4ba0ea9296b6c9efaa7c7257f30f351aea4
42 - md2 d4864c8c95786480d1cf821f690753dc

Answer by hdante

There's a speed comparison on the xxHash site. Copying it here:

 Name            Speed       Q.Score   Author
 xxHash          5.4 GB/s     10
 MurmurHash 3a   2.7 GB/s     10       Austin Appleby
 SpookyHash      2.0 GB/s     10       Bob Jenkins
 SBox            1.4 GB/s      9       Bret Mulvey
 Lookup3         1.2 GB/s      9       Bob Jenkins
 CityHash64      1.05 GB/s    10       Pike & Alakuijala
 FNV             0.55 GB/s     5       Fowler, Noll, Vo
 CRC32           0.43 GB/s     9
 MD5-32          0.33 GB/s    10       Ronald L. Rivest
 SHA1-32         0.28 GB/s    10

So it seems xxHash is by far the fastest one, while many others beat older hashes, like CRC32, MD5 and SHA.

https://code.google.com/p/xxhash/

Note that this is the ordering on a 32-bit compilation. On a 64-bit compilation the performance order is likely very different. Some of the hashes are heavily based on 64-bit multiplications and fetches.

Answer by Aalex Gabi

+-------------------+---------+------+--------------+
|       NAME        |  LOOPS  | TIME |     OP/S     |
+-------------------+---------+------+--------------+
| sha1ShortString   | 1638400 | 2.85 | 574,877.19   |
| md5ShortString    | 2777680 | 4.11 | 675,834.55   |
| crc32ShortString  | 3847980 | 3.61 | 1,065,922.44 |
| sha1MediumString  | 602620  | 4.75 | 126,867.37   |
| md5MediumString   | 884860  | 4.69 | 188,669.51   |
| crc32MediumString | 819200  | 4.85 | 168,907.22   |
| sha1LongString    | 181800  | 4.95 | 36,727.27    |
| md5LongString     | 281680  | 4.93 | 57,135.90    |
| crc32LongString   | 226220  | 4.95 | 45,701.01    |
+-------------------+---------+------+--------------+

It seems that crc32 is faster for small messages (in this case 26 characters), while md5 is faster for longer messages (in this case >852 characters).

Answer by user5994461

2019 update: This answer is the most up to date. Libraries supporting murmur are widely available for all languages.

The current recommendation is to use the Murmur hash family (see specifically the murmur2 or murmur3 variants).

Murmur hashes were designed for fast hashing with minimal collisions (much faster than CRC, MDx and SHAx). They're perfect for finding duplicates and very appropriate for hash table indexes.
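
As a hedged illustration for PHP: to the best of my knowledge, PHP 8.1 added `murmur3a` (along with `murmur3c` and `murmur3f`) to the built-in `hash()` function. The sketch below checks `hash_algos()` at runtime and falls back to `crc32b` on older versions. `fastHash` is a made-up helper name, not part of PHP.

```php
<?php
// Use murmur3a when this PHP build offers it, otherwise fall back to crc32b.
function fastHash(string $data): string
{
    static $algo = null;
    if ($algo === null) {
        $algo = in_array('murmur3a', hash_algos(), true) ? 'murmur3a' : 'crc32b';
    }
    return hash($algo, $data); // both candidates yield 8 hex characters (32 bits)
}

// Duplicate detection: remember the hashes already seen.
$seen = [];
foreach (['ana are mere', 'another phrase', 'ana are mere'] as $phrase) {
    $key = fastHash($phrase);
    echo $phrase, ' => ', isset($seen[$key]) ? 'duplicate' : 'new', "\n";
    $seen[$key] = true;
}
```

Because both candidates are only 32 bits wide, two different phrases can share a key, so a match should be treated as "probably a duplicate" unless the phrases themselves are compared.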

In fact, it's used by many modern databases (Redis, Elasticsearch, Cassandra) to compute all sorts of hashes for various purposes. This specific algorithm was the root source of many performance improvements in the current decade.

It's also used in implementations of Bloom filters. You should be aware that if you're searching for "fast hashes", you're probably facing a typical problem that Bloom filters solve. ;-)
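
For readers who have not met Bloom filters, here is a minimal sketch of the idea, assuming 64-bit PHP 7.4+. The class and method names are invented for this example, and it derives its four bit positions from one MD5 digest purely for convenience; real implementations typically use a faster hash such as murmur.

```php
<?php
// Minimal Bloom filter sketch: a bit array plus several hash positions per item.
final class BloomFilter
{
    private string $bits;   // bit array packed into a binary string
    private int $numBits;

    public function __construct(int $numBits = 1048576)
    {
        $this->numBits = $numBits;
        $this->bits = str_repeat("\0", intdiv($numBits + 7, 8));
    }

    // Split one 128-bit MD5 digest into four 32-bit words, one position each.
    private function positions(string $item): array
    {
        $words = unpack('N4', md5($item, true));
        return array_map(fn (int $w): int => $w % $this->numBits, array_values($words));
    }

    public function add(string $item): void
    {
        foreach ($this->positions($item) as $pos) {
            $byte = intdiv($pos, 8);
            $this->bits[$byte] = chr(ord($this->bits[$byte]) | (1 << ($pos % 8)));
        }
    }

    // false means "definitely never added"; true means "probably added".
    public function mightContain(string $item): bool
    {
        foreach ($this->positions($item) as $pos) {
            if ((ord($this->bits[intdiv($pos, 8)]) & (1 << ($pos % 8))) === 0) {
                return false;
            }
        }
        return true;
    }
}

$filter = new BloomFilter();
$filter->add('ana are mere');
var_dump($filter->mightContain('ana are mere')); // bool(true)
```

The filter never stores the phrases themselves, only bits, which is why it can report false positives but never false negatives.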

Note: murmur is a general-purpose hash, meaning NON-cryptographic. It does not prevent someone from finding the source "text" that generated a hash. It's NOT appropriate for hashing passwords.

Some more details: MurmurHash - what is it?

Answer by Thomas Pornin

Instead of assuming that MD5 is "fairly slow", try it. A simple C-based implementation of MD5 on a simple PC (mine, a 2.4 GHz Core2, using a single core) can hash 6 million small messages per second. A small message here is anything up to 55 bytes. For longer messages, MD5 hashing speed is linear in the message size, i.e. it crunches data at about 400 megabytes per second. You may note that this is four times the maximum speed of a good hard disk or a gigabit Ethernet network card.

Since my PC has four cores, this means that hashing data as fast as my hard disk can provide or receive it uses at most 6% of the available computing power. It takes a very special situation for hashing speed to become a bottleneck, or even to induce a noticeable cost, on a PC.
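
The arithmetic behind those figures can be spelled out as follows. This is only a worked restatement of the numbers quoted in this answer; the disk speed is inferred from the "four times" remark rather than measured.

```php
<?php
// Figures quoted in the answer; the disk speed is implied by "four times".
$md5BytesPerSec  = 400e6;               // ~400 MB/s on one core
$diskBytesPerSec = $md5BytesPerSec / 4; // ~100 MB/s
$cores           = 4;

$shareOfOneCore = $diskBytesPerSec / $md5BytesPerSec; // 0.25
$shareOfMachine = $shareOfOneCore / $cores;           // 0.0625

printf("Share of total CPU: %.2f%%\n", $shareOfMachine * 100); // prints "Share of total CPU: 6.25%"

// At 6 million small messages per second, the 100,000 phrases from the question:
$seconds = 100000 / 6e6;
printf("100,000 small phrases: about %.1f ms\n", $seconds * 1000); // prints "100,000 small phrases: about 16.7 ms"
```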

On much smaller architectures where hashing speed may become somewhat relevant, you may want to use MD4. MD4 is fine for non-cryptographic purposes (and for cryptographic purposes, you should not be using MD5 anyway). It has been reported that MD4 is even faster than CRC32 on ARM-based platforms.

Answer by Anachronist

I suggest urlencode() or base64_encode() for these reasons:

  • You don't need cryptography
  • You want speed
  • You want a way to identify unique strings while cleaning up 'malformed' strings

Adapting the benchmark code elsewhere in these replies, I've demonstrated that either of these is way faster than any hash algorithm. Depending on your application, you might be able to use urlencode() or base64_encode() to clean up any 'malformed' strings you want to store.
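
For illustration, here is what the two functions produce for a sample phrase. Unlike a hash, both are reversible encodings whose output grows with the input, so the stored value is not fixed-length.

```php
<?php
$phrase = "ana are mere";

$url = urlencode($phrase);      // "ana+are+mere"
$b64 = base64_encode($phrase);  // 12 input bytes become 16 output characters

echo $url, "\n", $b64, "\n";

// Both encodings can be reversed, so the original phrase is recoverable.
var_dump(urldecode($url) === $phrase, base64_decode($b64) === $phrase);
```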

Answer by Scott Arciszewski

Step One: Install libsodium (or make sure you're using PHP 7.2+)

Step Two: Use one of the following:

  1. sodium_crypto_generichash(), which is BLAKE2b, a hash function more secure than MD5 but faster than SHA256. (Link has benchmarks, etc.)
  2. sodium_crypto_shorthash(), which is SipHash-2-4, which is appropriate for hash tables but should not be relied on for collision resistance.

_shorthash is about 3x as fast as _generichash, but you need a key and you have a small-but-realistic risk of collisions. With _generichash, you probably don't need to worry about collisions, and don't need to use a key (but may want to anyway).
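
A minimal sketch of both calls, assuming the sodium extension is loaded. The hex encoding and the variable names are choices made for this example, not something the answer prescribes.

```php
<?php
// Requires the sodium extension (bundled with PHP 7.2+); stop early without it.
if (!extension_loaded('sodium')) {
    exit("sodium extension not available\n");
}

$phrase = "ana are mere";

// BLAKE2b, unkeyed: 32 raw bytes by default, hex-encoded here for storage.
$generic = bin2hex(sodium_crypto_generichash($phrase));

// SipHash-2-4 needs a 16-byte key; reuse the same key to get comparable hashes.
$key   = sodium_crypto_shorthash_keygen();
$short = bin2hex(sodium_crypto_shorthash($phrase, $key));

echo strlen($generic), " hex chars: ", $generic, "\n";
echo strlen($short), " hex chars: ", $short, "\n";
```

The short hash is only useful for lookups if the key is stored and reused, since a new key produces different hashes for the same phrase.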

Answer by rogerdpack

If you're looking for fast and unique, I recommend xxHash or something that uses the crc32c instruction built into newer CPUs; see https://stackoverflow.com/a/11422479/32453. That answer also links to possibly even faster hashes, if you don't care as much about the possibility of collisions.