Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/28159071/

Time: 2020-09-09 02:51:07  Source: igfitidea

Why doesn't Git use more modern SHA?

Tags: git, cryptography, sha

Asked by qazwsx

I read that Git uses a SHA-1 digest as the ID for a revision. Why does it not use a more modern version of SHA?

Accepted answer by VonC

Why does it not use a more modern version of SHA?

Dec. 2017: It will. And Git 2.16 (Q1 2018) is the first release to illustrate and implement that intent.

Note: see Git 2.19 below: it will be SHA-256.

Git 2.16 will propose an infrastructure to define what hash function is used in Git, and will start an effort to plumb that throughout various codepaths.

See commit c250e02 (28 Nov 2017) by Ramsay Jones.
See commit eb0ccfd, commit 78a6766, commit f50e766, commit abade65 (12 Nov 2017) by brian m. carlson (bk2204).
(Merged by Junio C Hamano -- gitster -- in commit 721cc43, 13 Dec 2017)



Add structure representing hash algorithm

Since in the future we want to support an additional hash algorithm, add a structure that represents a hash algorithm and all the data that must go along with it.
Add a constant to allow easy enumeration of hash algorithms.
Implement function typedefs to create an abstract API that can be used by any hash algorithm, and wrappers for the existing SHA1 functions that conform to this API.

Expose a value for hex size as well as binary size.
While one will always be twice the other, the two values are both used extremely commonly throughout the codebase and providing both leads to improved readability.

Don't include an entry in the hash algorithm structure for the null object ID.
As this value is all zeros, any suitably sized all-zero object ID can be used, and there's no need to store a given one on a per-hash basis.

The current hash function transition plan envisions a time when we will accept input from the user that might be in SHA-1 or in the NewHash format.
Since we cannot know which the user has provided, add a constant representing the unknown algorithm to allow us to indicate that we must look the correct value up.
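
The structure this commit message describes can be sketched roughly as follows. This is a Python illustration, not Git's actual C code: the names `HashAlgo`, `HASH_ALGOS`, `HASH_ALGO_*`, `null_oid_hex` and `hash_hex` are hypothetical, and SHA-256 is used as a stand-in for NewHash, which had not been chosen when the commit was written.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical identifiers for enumerating algorithms (names are illustrative).
HASH_ALGO_UNKNOWN = 0   # the caller must look the correct algorithm up
HASH_ALGO_SHA1 = 1
HASH_ALGO_NEWHASH = 2   # assumption: SHA-256 stands in for the then-unnamed NewHash


@dataclass(frozen=True)
class HashAlgo:
    name: str
    rawsz: int                      # binary size in bytes
    hexsz: int                      # hex size in characters, always 2 * rawsz
    new: Optional[Callable] = None  # factory for a hashing context


# Indexed by the constants above, which makes enumeration trivial.
HASH_ALGOS = (
    HashAlgo("unknown", 0, 0),
    HashAlgo("sha1", 20, 40, hashlib.sha1),
    HashAlgo("sha256", 32, 64, hashlib.sha256),
)


def null_oid_hex(algo: HashAlgo) -> str:
    """Any suitably sized all-zero ID works, so none is stored per algorithm."""
    return "0" * algo.hexsz


def hash_hex(algo: HashAlgo, data: bytes) -> str:
    """Hash data through the abstract API, whatever the algorithm is."""
    if algo.new is None:
        raise ValueError("algorithm is unknown; look it up first")
    ctx = algo.new()
    ctx.update(data)
    return ctx.hexdigest()
```

Storing both `rawsz` and `hexsz` is redundant on purpose, mirroring the readability argument made in the commit message.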



Integrate hash algorithm support with repo setup

In future versions of Git, we plan to support an additional hash algorithm.
Integrate the enumeration of hash algorithms with repository setup, and store a pointer to the enumerated data in struct repository.
Of course, we currently only support SHA-1, so hard-code this value in read_repository_format.
In the future, we'll enumerate this value from the configuration.

Add a constant, the_hash_algo, which points to the hash_algo structure pointer in the repository global.
Note that this is the hash which is used to serialize data to disk, not the hash which is used to display items to the user.
The transition plan anticipates that these may be different.
We can add an additional element in the future (say, ui_hash_algo) to provide for this case.
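
A minimal Python sketch of the wiring described above, under stated assumptions: `Repository`, `read_repository_format` and `the_hash_algo` borrow the names from the commit message, but their shapes here are hypothetical and much simpler than Git's C code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HashAlgo:
    name: str
    rawsz: int
    hexsz: int


SHA1 = HashAlgo("sha1", 20, 40)


@dataclass
class Repository:
    # The hash used to serialize data to disk, not necessarily the one
    # used to display object IDs to the user.
    hash_algo: Optional[HashAlgo] = None


the_repository = Repository()  # stand-in for the global repository


def read_repository_format(repo: Repository) -> None:
    """Only SHA-1 is supported at this stage, so the value is hard-coded.

    Later versions would read the algorithm from the configuration instead.
    """
    repo.hash_algo = SHA1


def the_hash_algo() -> Optional[HashAlgo]:
    """Convenience accessor for the global repository's on-disk hash."""
    return the_repository.hash_algo
```

Keeping the algorithm on the repository object rather than in a free-standing global is what later allows code to handle several repositories with different hashes.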



Update August 2018: with Git 2.19 (Q3 2018), Git seems to pick SHA-256 as NewHash.

See commit 0ed8d8d (04 Aug 2018) by Jonathan Nieder (artagnon).
See commit 13f5e09 (25 Jul 2018) by Ævar Arnfjörð Bjarmason (avar).
(Merged by Junio C Hamano -- gitster -- in commit 34f2297, 20 Aug 2018)

doc hash-function-transition: pick SHA-256 as NewHash

From a security perspective, it seems that SHA-256, BLAKE2, SHA3-256, K12, and so on are all believed to have similar security properties.
All are good options from a security point of view.

SHA-256 has a number of advantages:

  • It has been around for a while, is widely used, and is supported by just about every single crypto library (OpenSSL, mbedTLS, CryptoNG, SecureTransport, etc).

  • When you compare against SHA1DC, most vectorized SHA-256 implementations are indeed faster, even without acceleration.

  • If we're doing signatures with OpenPGP (or even, I suppose, CMS), we're going to be using SHA-2, so it doesn't make sense to have our security depend on two separate algorithms when either one of them alone could break the security when we could just depend on one.

So SHA-256 it is.
Update the hash-function-transition design doc to say so.

After this patch, there are no remaining instances of the string "NewHash", except for an unrelated use from 2008 as a variable name in t/t9700/test.pl.



You can see this transition to SHA-256 in progress with Git 2.20 (Q4 2018):

See commit 0d7c419, commit dda6346, commit eccb5a5, commit 93eb00f, commit d8a3a69, commit fbd0e37, commit f690b6b, commit 49d1660, commit 268babd, commit fa13080, commit 7b5e614, commit 58ce21b, commit 2f0c9e9, commit 825544a (15 Oct 2018) by brian m. carlson (bk2204).
See commit 6afedba (15 Oct 2018) by SZEDER Gábor (szeder).
(Merged by Junio C Hamano -- gitster -- in commit d829d49, 30 Oct 2018)

replace hard-coded constants

Replace several 40-based constants with references to GIT_MAX_HEXSZ or the_hash_algo, as appropriate.
Convert all uses of the GIT_SHA1_HEXSZ to use the_hash_algo so that they are appropriate for any given hash length.
Instead of using a hard-coded constant for the size of a hex object ID, switch to use the computed pointer from parse_oid_hex that points after the parsed object ID.
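
To illustrate the idea of letting the parser report where the object ID ends, here is a hedged Python sketch. The function name echoes `parse_oid_hex`, but the signature, the `HEXSZ` table and the error handling are assumptions made for this example, not Git's C API.

```python
import string

# Hex lengths per algorithm (hypothetical lookup table for this sketch).
HEXSZ = {"sha1": 40, "sha256": 64}


def parse_oid_hex(line: str, algo: str = "sha1"):
    """Return (oid, rest), where rest is the text right after the object ID.

    Callers use `rest` instead of slicing at a hard-coded offset of 40.
    """
    hexsz = HEXSZ[algo]
    oid = line[:hexsz]
    if len(oid) != hexsz or not all(c in string.hexdigits for c in oid):
        raise ValueError("not a valid object ID for " + algo)
    return oid, line[hexsz:]


if __name__ == "__main__":
    line = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 refs/heads/master"
    oid, rest = parse_oid_hex(line)
    print(oid)
    print(rest.strip())
```

Because the caller never mentions 40, the same call site keeps working when the algorithm, and therefore the ID length, changes.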

GIT_SHA1_HEXSZ is further removed/replaced with Git 2.22 (Q2 2019) and commit d4e568b.



That transition continues with Git 2.21 (Q1 2019), which adds a SHA-256 hash and plugs it through the code to allow building Git with the "NewHash".

See commit 4b4e291, commit 27dc04c, commit 13eeedb, commit c166599, commit 37649b7, commit a2ce0a7, commit 50c817e, commit 9a3a0ff, commit 0dab712, commit 47edb64 (14 Nov 2018), and commit 2f90b9d, commit 1ccf07c (22 Oct 2018) by brian m. carlson (bk2204).
(Merged by Junio C Hamano -- gitster -- in commit 33e4ae9, 29 Jan 2019)

Add a base implementation of SHA-256 support (Feb. 2019)

SHA-1 is weak and we need to transition to a new hash function.
For some time, we have referred to this new function as NewHash.
Recently, we decided to pick SHA-256 as NewHash.
The reasons behind the choice of SHA-256 are outlined in this thread and in the commit history for the hash function transition document.

Add a basic implementation of SHA-256 based off libtomcrypt, which is in the public domain.
Optimize it and restructure it to meet our coding standards.
Pull in the update and final functions from the SHA-1 block implementation, as we know these function correctly with all compilers. This implementation is slower than SHA-1, but more performant implementations will be introduced in future commits.

Wire up SHA-256 in the list of hash algorithms, and add a test that the algorithm works correctly.

Note that with this patch, it is still not possible to switch to using SHA-256 in Git.
Additional patches are needed to prepare the code to handle a larger hash algorithm and further test fixes are needed.

hash: add an SHA-256 implementation using OpenSSL

We already have OpenSSL routines available for SHA-1, so add routines for SHA-256 as well.

On a Core i7-6600U, this SHA-256 implementation compares favorably to the SHA1DC SHA-1 implementation:

SHA-1: 157 MiB/s (64 byte chunks); 337 MiB/s (16 KiB chunks)
SHA-256: 165 MiB/s (64 byte chunks); 408 MiB/s (16 KiB chunks)
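
A comparison of this kind can be approximated with a small script. The sketch below uses Python's `hashlib` rather than Git's own code, so it is only an assumption-laden stand-in: it does not exercise SHA1DC, interpreter overhead dominates for 64-byte chunks, and the figures it prints will not match the ones quoted above. The function name `throughput_mib_s` is made up for this example.

```python
import hashlib
import time


def throughput_mib_s(algo: str, chunk_size: int,
                     total_bytes: int = 8 * 1024 * 1024) -> float:
    """Feed total_bytes to the hash in chunk_size pieces and return MiB/s."""
    chunk = b"\0" * chunk_size
    rounds = total_bytes // chunk_size
    ctx = hashlib.new(algo)
    start = time.perf_counter()
    for _ in range(rounds):
        ctx.update(chunk)
    ctx.digest()
    elapsed = time.perf_counter() - start
    return (rounds * chunk_size) / (1024 * 1024) / elapsed


if __name__ == "__main__":
    for algo in ("sha1", "sha256"):
        small = throughput_mib_s(algo, 64)
        large = throughput_mib_s(algo, 16 * 1024)
        print(f"{algo}: {small:.0f} MiB/s (64 byte chunks); "
              f"{large:.0f} MiB/s (16 KiB chunks)")
```

Measuring both chunk sizes matters because per-call overhead weighs much more heavily on small inputs than on large ones.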

sha256: add an SHA-256 implementation using libgcrypt

Generally, one gets better performance out of cryptographic routines written in assembly than C, and this is also true for SHA-256.
In addition, most Linux distributions cannot distribute Git linked against OpenSSL for licensing reasons.

Most systems with GnuPG will also have libgcrypt, since it is a dependency of GnuPG.
libgcrypt is also faster than the SHA1DC implementation for messages of a few KiB and larger.

For comparison, on a Core i7-6600U, this implementation processes 16 KiB chunks at 355 MiB/s while SHA1DC processes equivalent chunks at 337 MiB/s.

In addition, libgcrypt is licensed under the LGPL 2.1, which is compatible with the GPL. Add an implementation of SHA-256 that uses libgcrypt.



The upgrade effort goes on with Git 2.24 (Q4 2019)

See commit aaa95df, commit be8e172, commit 3f34d70, commit fc06be3, commit 69fa337, commit 3a4d7aa, commit e0cb7cd, commit 8d4d86b, commit f6ca67d, commit dd336a5, commit 894c0f6, commit 4439c7a, commit 95518fa, commit e84f357, commit fe9fec4, commit 976ff7e, commit 703d2d4, commit 9d958cc, commit 7962e04, commit fee4930 (18 Aug 2019) by brian m. carlson (bk2204).
(Merged by Junio C Hamano -- gitster -- in commit 676278f, 11 Oct 2019)

Instead of using GIT_SHA1_HEXSZ and hard-coded constants, switch to using the_hash_algo.



With Git 2.26 (Q1 2020), the test scripts are ready for the day when the object names will use SHA-256.

See commit 277eb5a, commit 44b6c05, commit 7a868c5, commit 1b8f39f, commit a8c17e3, commit 8320722, commit 74ad99b, commit ba1be1a, commit cba472d, commit 82d5aeb, commit 3c5e65c, commit 235d3cd, commit 1d86c8f, commit 525a7f1, commit 7a1bcb2, commit cb78f4f, commit 717c939, commit 08a9dd8, commit 215b60b, commit 194264c (21 Dec 2019) by brian m. carlson (bk2204).
(Merged by Junio C Hamano -- gitster -- in commit f52ab33, 05 Feb 2020)

Example:

t4204: make hash size independent

Signed-off-by: brian m. carlson

Use $OID_REGEX instead of a hard-coded regular expression.

So, instead of using:

grep "^[a-f0-9]\{40\} $(git rev-parse HEAD)$" output

the tests now use:

grep "^$OID_REGEX $(git rev-parse HEAD)$" output

And OID_REGEX comes from commit bdee9cd (13 May 2018) by brian m. carlson (bk2204).
(Merged by Junio C Hamano -- gitster -- in commit 9472b13, 30 May 2018, Git v2.18.0-rc0)

t/test-lib: introduce OID_REGEX

Signed-off-by: brian m. carlson

Currently we have a variable, $_x40, which contains a regex that matches a full 40-character hex constant.

However, with NewHash, we'll have object IDs that are longer than 40 characters.

In such a case, $_x40 will be a confusing name.

Create a $OID_REGEX variable which will always reflect a regex matching the appropriate object ID, regardless of the length of the current hash.
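
The same idea, deriving the pattern from the hash length instead of spelling out 40, can be sketched in Python. The helper `oid_regex` and the `HEXSZ` table are hypothetical names for this illustration; Git's test library builds `$OID_REGEX` in shell.

```python
import re

# Hex lengths per algorithm (hypothetical lookup table for this sketch).
HEXSZ = {"sha1": 40, "sha256": 64}


def oid_regex(algo: str) -> str:
    """Return a regex fragment matching one full object ID for algo."""
    return "[0-9a-f]{%d}" % HEXSZ[algo]


def line_matches(line: str, head: str, algo: str) -> bool:
    """Rough analogue of: grep "^$OID_REGEX $(git rev-parse HEAD)$" output"""
    pattern = "^" + oid_regex(algo) + " " + re.escape(head) + "$"
    return re.match(pattern, line) is not None
```

A check written this way passes unchanged whether the repository under test uses 40-character or 64-character object IDs.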

And, still for tests:

See commit f303765, commit edf0424, commit 5db24dc, commit d341e08, commit 88ed241, commit 48c10cc, commit f7ae8e6, commit e70649b, commit a30f93b, commit a79eec2, commit 796d138, commit 417e45e, commit dfa5f53, commit f743e8f, commit 72f936b, commit 5df0f11, commit 07877f3, commit 6025e89, commit 7b1a182, commit 94db7e3, commit db12505 (07 Feb 2020) by brian m. carlson (bk2204).
(Merged by Junio C Hamano -- gitster -- in commit 5af345a, 17 Feb 2020)

t5703: make test work with SHA-256

Signed-off-by: brian m. carlson

This test used an object ID which was 40 hex characters in length, causing the test not only not to pass, but to hang, when run with SHA-256 as the hash.

Change this value to a fixed dummy object ID using test_oid_init and test_oid.

Furthermore, ensure we extract an object ID of the appropriate length using cut with fields instead of a fixed length.
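
The difference between extracting by field and extracting by fixed length is easy to show in a few lines. This is a Python analogue of the shell `cut` usage mentioned above; the helper name `first_field` and the sample line are made up for the example.

```python
def first_field(line: str, delimiter: str = " ") -> str:
    """Return the text before the first delimiter, like `cut -d' ' -f1`."""
    return line.split(delimiter, 1)[0]


if __name__ == "__main__":
    sha256_line = "a" * 64 + " refs/heads/main"
    # Fixed-length extraction silently truncates a 64-character ID.
    print(len(sha256_line[:40]))
    # Field-based extraction adapts to whatever length the ID has.
    print(len(first_field(sha256_line)))
```

The fixed-length slice yields a plausible-looking but wrong ID under SHA-256, which is the kind of mismatch that can make a test misbehave rather than fail cleanly.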



Some codepaths were given a repository instance as a parameter to work in the repository, but passed the_repository instance to their callees; this has been cleaned up (somewhat) with Git 2.26 (Q1 2020).

See commit b98d188, commit 2dcde20, commit 7ad5c44, commit c8123e7, commit 5ec9b8a, commit a651946, commit eb999b3 (30 Jan 2020) by Matheus Tavares (matheustavares).
(Merged by Junio C Hamano -- gitster -- in commit 78e67cd, 14 Feb 2020)

sha1-file: allow check_object_signature() to handle any repo

Signed-off-by: Matheus Tavares

Some callers of check_object_signature() can work on arbitrary repositories, but the repo does not get passed to this function. Instead, the_repository is always used internally.
To fix possible inconsistencies, allow the function to receive a struct repository and make those callers pass on the repo being handled.

Based on:

sha1-file: pass git_hash_algo to hash_object_file()

Signed-off-by: Matheus Tavares

Allow hash_object_file() to work on arbitrary repos by introducing a git_hash_algo parameter. Change callers which have a struct repository pointer in their scope to pass on the git_hash_algo from the said repo.
For all other callers, pass on the_hash_algo, which was already being used internally at hash_object_file().
This functionality will be used in the following patch to make check_object_signature() be able to work on arbitrary repos (which, in turn, will be used to fix an inconsistency at object.c:parse_object()).
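
As a rough illustration of passing the algorithm explicitly instead of reading a global, here is a Python sketch. It relies on the fact that Git hashes an object as a `<type> <size>` header, a NUL byte, and then the content; the Python signatures of `hash_object_file` and `check_object_signature` are assumptions for this example and do not mirror the C prototypes.

```python
import hashlib


def hash_object_file(algo: str, data: bytes, obj_type: str = "blob") -> str:
    """Compute an object ID with the algorithm the caller passes in."""
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    ctx = hashlib.new(algo)
    ctx.update(header)
    ctx.update(data)
    return ctx.hexdigest()


def check_object_signature(algo: str, expected_oid: str, data: bytes,
                           obj_type: str = "blob") -> bool:
    """Verify content against an ID using the repo's algorithm, not a global."""
    return hash_object_file(algo, data, obj_type) == expected_oid


if __name__ == "__main__":
    # The empty blob under SHA-1; Git reports the same ID for an empty file.
    print(hash_object_file("sha1", b""))
    print(len(hash_object_file("sha256", b"")))
```

With the algorithm as a parameter, one process can verify objects from a SHA-1 repository and a SHA-256 repository without touching shared state.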

Answer by softwariness

UPDATE: The above question and this answer are from 2015. Since then Google have announced the first SHA-1 collision: https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html



Obviously I can only speculate from the outside looking in about why Git continues to use SHA-1, but these may be among the reasons:

  1. Git was Linus Torvalds's creation, and Linus apparently does not want to substitute SHA-1 with another hashing algorithm at this time.
  2. He makes plausible claims that successful SHA-1 collision-based attacks against Git are a good deal harder than achieving the collisions themselves. Considering that SHA-1 is weaker than it should be rather than completely broken, a workable attack is still a long way off, at least today. Moreover, he notes that a "successful" attack would achieve very little if the colliding object arrives later than the existing one, as the later one would just be assumed to be the same as the valid one and ignored (though others have pointed out that the reverse could occur).
  3. Changing software is time-consuming and error-prone, especially when there is existing infrastructure and data based around the existing protocols that will have to be migrated. Even those who produce software and hardware products where cryptographic security is the sole point of the system are still in the process of migrating away from SHA-1 and other weak algorithms in places. Just imagine all those hardcoded unsigned char[20] buffers all over the place ;-). It's a lot easier to program for cryptographic agility at the start than to retrofit it later.
  4. Performance of SHA-1 is better than the various SHA-2 hashes (probably not by so much as to be a deal-breaker now, but maybe it was a sticking point 10 years ago), and the storage size of SHA-2 is larger.

Some links:

My personal view is that practical attacks are probably some time off, and that even when they do occur, people will probably first mitigate them by means other than changing the hash algorithm itself. Still, if you do care about security, you should err on the side of caution in your choice of algorithms and keep revising your security strength upwards, because the capabilities of attackers only go in one direction. It would therefore be unwise to take Git as a role model, especially as its use of SHA-1 does not purport to be for cryptographic security.

Answer by Arne Babenhauserheide

This is a discussion of the urgency of migrating away from SHA1 for Mercurial, but it applies to Git as well: https://www.mercurial-scm.org/wiki/mpm/SHA1

In short: if you're not extremely diligent today, you have much worse vulnerabilities than sha1. Despite that, Mercurial started preparing to migrate away from sha1 over 10 years ago.

Work has been underway for years to retrofit Mercurial's data structures and protocols for SHA1's successors. Storage space was allocated for larger hashes in our revlog structure over 10 years ago in Mercurial 0.9 with the introduction of RevlogNG. The bundle2 format introduced more recently supports the exchange of different hash types over the network. The only remaining pieces are the choice of a replacement function and the choice of a backwards-compatibility strategy.

If git does not migrate away from sha1 before Mercurial does, you could always add another level of security by keeping a local Mercurial mirror with hg-git.

Answer by Paul Wagland

There is now a transition plan to a stronger hash, so it looks like in future it will use a more modern hash than SHA-1. From the current transition plan:

Some hashes under consideration are SHA-256, SHA-512/256, SHA-256x16, K12, and BLAKE2bp-256
