Original source: http://stackoverflow.com/questions/28792784/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Why does Git use a cryptographic hash function?
Asked by Praxeolitic
Why does Git use SHA-1, a cryptographic hash function, instead of a faster non-cryptographic hash function?
Related question:
The Stack Overflow question "Why does Git use SHA-1 as version numbers?" asks why Git identifies commits with SHA-1 rather than with sequential numbers.
Answered by VonC
TL;DR:
- From 2005 up to 2018 (Git 2.18): SHA-1 (see below)
- From 2019: Git will switch at some point to SHA-256
You can hear this from Linus Torvalds himself, from when he presented Git at Google back in 2007 (emphasis mine):
We check checksums that is considered cryptographically secure. Nobody has been able to break SHA-1, but the point is, SHA-1 as far as git is concerned, isn't even a security feature. It's purely a consistency check.
The security parts are elsewhere. A lot of people assume since git uses SHA-1 and SHA-1 is used for cryptographically secure stuff, they think that it's a huge security feature. It has nothing at all to do with security, it's just the best hash you can get. Having a good hash is good for being able to trust your data, it happens to have some other good features, too, it means when we hash objects, we know the hash is well distributed and we do not have to worry about certain distribution issues.
Internally it means from the implementation standpoint, we can trust that the hash is so good that we can use hashing algorithms and know there are no bad cases.
So there are some reasons to like the cryptographic side too, but it's really about the ability to trust your data.
I guarantee you, if you put your data in git, you can trust the fact that five years later, after it is converted from your harddisc to DVD to whatever new technology and you copied it along, five years later you can verify the data you get back out is the exact same data you put in. And that is something you really should look for in a source code management system.
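To make the "consistency check" concrete, here is a minimal Python sketch of how Git names a blob: it hashes a short header (`blob <size>` followed by a NUL byte) plus the file content with SHA-1. The helper names `git_blob_id` and `verify` are mine, chosen for illustration; they are not part of Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes "blob <size>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def verify(content: bytes, expected_id: str) -> bool:
    """Check that data read back still matches the ID recorded earlier."""
    return git_blob_id(content) == expected_id


original = b"hello world\n"
object_id = git_blob_id(original)

print(object_id)
print(verify(original, object_id))           # an unchanged copy
print(verify(b"hello w0rld\n", object_id))   # a copy with one byte altered
```

The first ID should match what `git hash-object` reports for a file containing `hello world` and a newline. Any copy that has been damaged along the way fails the check, which is the "trust your data" property Linus describes.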
Update Dec. 2017 with Git 2.16 (Q1 2018): this effort to support an alternative SHA is underway: see "Why doesn't Git use more modern SHA?".
I mentioned in "How would git handle a SHA-1 collision on a blob?" that you could engineer a commit with a particular SHA1 prefix (still an extremely costly endeavor).
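As a rough illustration of why a chosen prefix is costly, the sketch below brute-forces a nonce until an object ID starts with a given hex prefix. For simplicity it hashes a blob-style object rather than a real commit, and `find_prefix` and the nonce layout are my own assumptions, not anything Git provides. Each extra hex digit multiplies the expected number of attempts by 16.

```python
import hashlib
from itertools import count
from typing import Tuple


def find_prefix(base: bytes, prefix: str) -> Tuple[bytes, str, int]:
    """Append an increasing nonce to `base` until the SHA-1 ID starts with `prefix`."""
    for nonce in count():
        content = base + b"\nnonce: " + str(nonce).encode("ascii") + b"\n"
        # Same "<type> <size>\0" framing Git uses, with a blob type for simplicity.
        header = b"blob " + str(len(content)).encode("ascii") + b"\0"
        digest = hashlib.sha1(header + content).hexdigest()
        if digest.startswith(prefix):
            return content, digest, nonce + 1


content, digest, attempts = find_prefix(b"demo", "abc")
print(digest, "found after", attempts, "attempts")
print("expected attempts for a 3-digit prefix:", 16 ** 3)
print("expected attempts for the full 40 digits:", 16 ** 40)
```

A three-digit prefix takes a few thousand attempts on average, while matching all 40 digits would take on the order of 16^40 = 2^160 attempts.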
But the point remains, as Eric Sink mentions in "Git: Cryptographic Hashes" (from his 2011 book Version Control by Example):
It is rather important that the DVCS never encounter two different pieces of data which have the same digest. Fortunately, good cryptographic hash functions are designed to make such collisions extremely unlikely.
It is harder to find a good non-cryptographic hash with a low collision rate, unless you consider research like "Finding State-of-the-Art Non-cryptographic Hashes with Genetic Programming".
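To see how quickly collisions appear with a small non-cryptographic checksum, here is a sketch that uses CRC32 from Python's `zlib` as a stand-in (my choice for illustration; it is not one of the hashes discussed in the linked material). Most of the effect comes from the 32-bit output: by the birthday bound, a collision is expected after roughly 2^16 random inputs.

```python
import random
import zlib


def find_crc32_collision(seed: int = 0, max_tries: int = 2_000_000):
    """Hash random 16-byte inputs until two different inputs share a CRC32."""
    rng = random.Random(seed)   # seeded so the run is repeatable
    seen = {}                   # checksum -> first input that produced it
    for tries in range(1, max_tries + 1):
        data = rng.getrandbits(128).to_bytes(16, "big")
        checksum = zlib.crc32(data)
        other = seen.get(checksum)
        if other is not None and other != data:
            return other, data, checksum, tries
        seen[checksum] = data
    return None


result = find_crc32_collision()
if result is not None:
    first, second, checksum, tries = result
    print("collision after", tries, "inputs")
    print(first.hex(), second.hex(), format(checksum, "08x"))
```

With a 160-bit digest, the same birthday argument puts an accidental collision at around 2^80 objects, which is why a repository never runs into one by chance.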
You can also read "Consider use of non-cryptographic hash algorithm for hashing speed-up", which mentions, for instance, "xxhash", an extremely fast non-cryptographic hash algorithm that works at speeds close to RAM limits.
Discussions around changing the hash in Git are not new:
- either to optimize it (August 2009), though you then have to deal with licensing issues:
(Linus Torvalds)
There's not really anything remaining of the mozilla code, but hey, I started from it. In retrospect I probably should have started from the PPC asm code that already did the blocking sanely - but that's a "20/20 hindsight" kind of thing.
Plus hey, the mozilla code being a horrid pile of crud was why I was so convinced that I could improve on things. So that's a kind of source for it, even if it's more about the motivational side than any actual remaining code ;)
And you need to be careful about how you measure the actual optimization gain:
(Linus Torvalds)
I pretty much can guarantee you that it improves things only because it makes gcc generate crap code, which then hides some of the P4 issues.
- or to change it altogether (January 2010)
(for instance to SHA-3, but that would apply to any other hash):
(John Tapsell - johnflux)
The engineering cost for upgrading git from SHA-1 to a new algorithm is much higher. I'm not sure how it can be done well.
First of all we probably need to deploy a version of git (let's call it version 2 for this conversation) which allows there to be a slot for a new hash value even though it doesn't read or use that space -- it just uses the SHA-1 hash value which is in the other slot.
That way once we eventually deploy yet a newer version of git, let's call it version 3, which produces SHA-3 hashes in addition to SHA-1 hashes, people using git version 2 will be able to continue to inter-operate.
(Although, per this discussion, they may be vulnerable and people who rely on their SHA-1-only patches may be vulnerable.)
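The two-slot idea in that quote can be sketched as a small data structure. This is only an illustration of the proposal as quoted, not how Git implements any hash transition; `ObjectName`, `name_v2`, `name_v3` and the comparison helpers are hypothetical names, and SHA3-256 is picked only because the quote mentions SHA-3.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectName:
    sha1: str                        # slot every version understands
    new_hash: Optional[str] = None   # reserved slot, ignored by "version 2"


def name_v2(data: bytes) -> ObjectName:
    """Hypothetical version 2: fills only the SHA-1 slot."""
    return ObjectName(sha1=hashlib.sha1(data).hexdigest())


def name_v3(data: bytes) -> ObjectName:
    """Hypothetical version 3: fills both slots."""
    return ObjectName(
        sha1=hashlib.sha1(data).hexdigest(),
        new_hash=hashlib.sha3_256(data).hexdigest(),
    )


def same_object_v2(a: ObjectName, b: ObjectName) -> bool:
    """A version 2 reader compares SHA-1 only and never looks at the new slot."""
    return a.sha1 == b.sha1


def same_object_v3(a: ObjectName, b: ObjectName) -> bool:
    """A version 3 reader prefers the new hash and falls back to SHA-1."""
    if a.new_hash is not None and b.new_hash is not None:
        return a.new_hash == b.new_hash
    return a.sha1 == b.sha1


data = b"some object content"
old, new = name_v2(data), name_v3(data)
print(same_object_v2(old, new))   # version 2 still recognises version 3 names
print(same_object_v3(old, new))   # version 3 falls back when a slot is empty
```

The fallback path is exactly where the caveat in the quote applies: whenever one side only has SHA-1, the comparison is no stronger than SHA-1.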
In short, switching to any hash is not easy.
Update February 2017: yes, it is in theory possible to compute a colliding SHA1: shattered.io
How is GIT affected?
GIT strongly relies on SHA-1 for the identification and integrity checking of all file objects and commits.
It is essentially possible to create two GIT repositories with the same head commit hash and different contents, say a benign source code and a backdoored one.
An attacker could potentially selectively serve either repository to targeted users. This will require attackers to compute their own collision.
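The reason a single collision could carry through to the head commit is that Git's object IDs are chained: a commit ID covers the tree ID, and the tree ID covers the blob IDs. Here is a minimal sketch of that chain for a one-file repository; the file name, author identity and message are made up for illustration.

```python
import hashlib


def object_id(kind: bytes, body: bytes) -> str:
    """SHA-1 over Git's "<type> <size>\\0" header plus the object body."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries) -> bytes:
    """Build a flat tree body: "<mode> <name>\\0" plus the raw 20-byte ID per entry."""
    body = b""
    for mode, name, hex_id in sorted(entries, key=lambda entry: entry[1]):
        body += mode + b" " + name + b"\0" + bytes.fromhex(hex_id)
    return body


def commit_body(tree_id: str, message: bytes) -> bytes:
    ident = b"Example Author <author@example.com> 0 +0000"   # made-up identity
    return (b"tree " + tree_id.encode("ascii") + b"\n"
            + b"author " + ident + b"\n"
            + b"committer " + ident + b"\n\n"
            + message)


def head_commit_id(file_content: bytes) -> str:
    """Commit ID of a one-file repository; the content enters only via the blob ID."""
    blob_id = object_id(b"blob", file_content)
    tree_id = object_id(b"tree", tree_body([(b"100644", b"main.c", blob_id)]))
    return object_id(b"commit", commit_body(tree_id, b"Initial commit\n"))


print(head_commit_id(b"int main(void) { return 0; }\n"))
print(head_commit_id(b"int main(void) { return 1; }\n"))
```

The file content influences the commit ID only through the blob ID, so two different files that shared a blob ID would produce identical tree and commit IDs. That is the scenario the quoted passage describes.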
But:
This attack required over 9,223,372,036,854,775,808 SHA1 computations. This took the equivalent processing power as 6,500 years of single-CPU computations and 110 years of single-GPU computations.
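A quick arithmetic check puts that figure in context. The number quoted is exactly 2^63, and a generic birthday search against a 160-bit hash needs about 2^80 computations. The per-GPU rate below is simply derived from the quoted 110 GPU-years, assuming 365.25-day years.

```python
shattered_computations = 9_223_372_036_854_775_808
print(shattered_computations == 2 ** 63)

# Generic birthday-bound cost of finding a collision in a 160-bit hash.
brute_force_computations = 2 ** 80
speedup = brute_force_computations // shattered_computations
print("speedup over brute force:", speedup)   # 2**17

# Rough per-GPU rate implied by "110 years of single-GPU computations".
seconds_per_year = 365.25 * 24 * 3600
gpu_rate = shattered_computations / (110 * seconds_per_year)
print("implied SHA-1 computations per GPU-second: %.2e" % gpu_rate)
```

So the attack is about 131,072 times cheaper than brute force, yet still far beyond what a casual attacker can afford.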
So let's not panic just yet.
See more at "How would Git handle a SHA-1 collision on a blob?".