How would Git handle a SHA-1 collision on a blob?

Question

Asked by Gnurou

This has probably never happened in the real world yet, and may never happen, but let's consider this: say you have a git repository, make a commit, and get very very unlucky: one of the blobs ends up having the same SHA-1 as another that is already in your repository. The question is, how would Git handle this? Simply fail? Find a way to link the two blobs and check which one is needed according to the context?

More a brain-teaser than an actual problem, but I found the issue interesting.

Answer 1

Answered by Ruben

I did an experiment to find out exactly how Git would behave in this case. This is with version 2.7.0~rc0+next.20151210 (Debian version). I basically just reduced the hash size from 160 bits to 4 bits by applying the following diff and rebuilding git:

--- git-2.7.0~rc0+next.20151210.orig/block-sha1/sha1.c
+++ git-2.7.0~rc0+next.20151210/block-sha1/sha1.c
@@ -246,6 +246,8 @@ void blk_SHA1_Final(unsigned char hashou
    blk_SHA1_Update(ctx, padlen, 8);

    /* Output hash */
-   for (i = 0; i < 5; i++)
-       put_be32(hashout + i * 4, ctx->H[i]);
+   for (i = 0; i < 1; i++)
+       put_be32(hashout + i * 4, (ctx->H[i] & 0xf000000));
+   for (i = 1; i < 5; i++)
+       put_be32(hashout + i * 4, 0);
 }
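
To get a feel for why a 4-bit hash collides almost immediately, here is a minimal Python sketch. It does not reproduce the patched C code: `truncated_id` and `find_collision` are hypothetical helper names, and the truncation (keeping the low 4 bits of the first SHA-1 byte, which is what the 0xf000000 mask appears to do) is an assumption about the patch's effect.

```python
import hashlib

def truncated_id(content: bytes) -> str:
    """Git-style blob id reduced to 4 significant bits (hypothetical helper)."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    digest = hashlib.sha1(header + content).digest()
    # Keep only the low nibble of the first byte and zero the other 19 bytes.
    kept = bytes([digest[0] & 0x0F]) + bytes(19)
    return kept.hex()

def find_collision():
    """Return two different contents that share the same truncated id."""
    seen = {}
    # With only 16 possible ids, 17 distinct contents must contain a collision.
    for i in range(17):
        content = f"file version {i}\n".encode()
        oid = truncated_id(content)
        if oid in seen:
            return seen[oid], content, oid
        seen[oid] = content
    raise AssertionError("unreachable: pigeonhole principle")

first, second, oid = find_collision()
print(oid, first, second)
```

With a full 160-bit hash the same loop would effectively never return, which is why the experiment needs the patched build.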

Then I did a few commits and noticed the following.

1. If a blob already exists with the same hash, you will not get any warnings at all. Everything seems to be ok, but when you push, someone clones, or you revert, you will lose the latest version (in line with what is explained above).
2. If a tree object already exists and you make a blob with the same hash: everything will seem normal, until you either try to push or someone clones your repository. Then you will see that the repo is corrupt.
3. If a commit object already exists and you make a blob with the same hash: same as #2 - corrupt.
4. If a blob already exists and you make a commit object with the same hash, it will fail when updating the "ref".
5. If a blob already exists and you make a tree object with the same hash, it will fail when creating the commit.
6. If a tree object already exists and you make a commit object with the same hash, it will fail when updating the "ref".
7. If a tree object already exists and you make a tree object with the same hash, everything will seem ok. But when you commit, all of the repository will reference the wrong tree.
8. If a commit object already exists and you make a commit object with the same hash, everything will seem ok. But when you commit, the commit will never be created, and the HEAD pointer will be moved to an old commit.
9. If a commit object already exists and you make a tree object with the same hash, it will fail when creating the commit.

For #2 you will typically get an error like this when you run "git push":

error: object 0400000000000000000000000000000000000000 is a tree, not a blob
fatal: bad blob object
error: failed to push some refs to origin

or:

error: unable to read sha1 file of file.txt (0400000000000000000000000000000000000000)

if you delete the file and then run "git checkout file.txt".

For #4 and #6, you will typically get an error like this:

error: Trying to write non-commit object
f000000000000000000000000000000000000000 to branch refs/heads/master
fatal: cannot update HEAD ref

when running "git commit". In this case you can typically just type "git commit" again since this will create a new hash (because of the changed timestamp)

For #5 and #9, you will typically get an error like this:

fatal: 1000000000000000000000000000000000000000 is not a valid 'tree' object

when running "git commit"

If someone tries to clone your corrupt repository, they will typically see something like:

git clone (one repo with collided blob,
d000000000000000000000000000000000000000 is commit,
f000000000000000000000000000000000000000 is tree)

Cloning into 'clonedversion'...
done.
error: unable to read sha1 file of s (d000000000000000000000000000000000000000)
error: unable to read sha1 file of tullebukk
(f000000000000000000000000000000000000000)
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'

What "worries" me is that in two cases (2,3) the repository becomes corrupt without any warnings, and in 3 cases (1,7,8), everything seems ok, but the repository content is different than what you expect it to be. People cloning or pulling will have a different content than what you have. The cases 4,5,6 and 9 are ok, since it will stop with an error. I suppose it would be better if it failed with an error at least in all cases.

Answer 2

Answered by VonC

Original answer (2012) (see the shattered.io 2017 SHA1 collision below)

That old (2006) answer from Linus might still be relevant:

Nope. If it has the same SHA1, it means that when we receive the object from the other end, we will not overwrite the object we already have.
So what happens is that if we ever see a collision, the "earlier" object in any particular repository will always end up overriding. But note that "earlier" is obviously per-repository, in the sense that the git object network generates a DAG that is not fully ordered, so while different repositories will agree about what is "earlier" in the case of direct ancestry, if the object came through separate and not directly related branches, two different repos may obviously have gotten the two objects in different order.
However, the "earlier will override" is very much what you want from a security standpoint: remember that the git model is that you should primarily trust only your own repository.
So if you do a "git pull", the new incoming objects are by definition less trustworthy than the objects you already have, and as such it would be wrong to allow a new object to replace an old one.
So you have two cases of collision:
the inadvertent kind, where you somehow are very very unlucky, and two files end up having the same SHA1.
At that point, what happens is that when you commit that file (or do a "git-update-index" to move it into the index, but not committed yet), the SHA1 of the new contents will be computed, but since it matches an old object, a new object won't be created, and the commit-or-index ends up pointing to the old object.
You won't notice immediately (since the index will match the old object SHA1, and that means that something like "git diff" will use the checked-out copy), but if you ever do a tree-level diff (or you do a clone or pull, or force a checkout) you'll suddenly notice that that file has changed to something completely different than what you expected.
So you would generally notice this kind of collision fairly quickly.
In related news, the question is what to do about the inadvertent collision..
First off, let me remind people that the inadvertent kind of collision is really really really damn unlikely, so we'll quite likely never ever see it in the full history of the universe.
But if it happens, it's not the end of the world: what you'd most likely have to do is just change the file that collided slightly, and just force a new commit with the changed contents (add a comment saying "/* This line added to avoid collision */") and then teach git about the magic SHA1 that has been shown to be dangerous.
So over a couple of million years, maybe we'll have to add one or two "poisoned" SHA1 values to git. It's very unlikely to be a maintenance problem ;)
The attacker kind of collision because somebody broke (or brute-forced) SHA1.
This one is clearly a lot more likely than the inadvertent kind, but by definition it's always a "remote" repository. If the attacker had access to the local repository, he'd have much easier ways to screw you up.
So in this case, the collision is entirely a non-issue: you'll get a "bad" repository that is different from what the attacker intended, but since you'll never actually use his colliding object, it's literally no different from the attacker just not having found a collision at all, but just using the object you already had (ie it's 100% equivalent to the "trivial" collision of the identical file generating the same SHA1).

The question of using SHA-256 is regularly raised, but has not been acted upon for now (2012).
Note: starting in 2018 with Git 2.19, the code is being refactored to use SHA-256.

Note (Humor): you can force a commit to a particular SHA1 prefix with the project gitbrute from Brad Fitzpatrick (bradfitz).

gitbrute brute-forces a pair of author+committer timestamps such that the resulting git commit has your desired prefix.

Example: https://github.com/bradfitz/deadbeef
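
The idea can be sketched in a few lines of Python. This is not gitbrute's actual code: `commit_id` and `brute_force_prefix` are hypothetical helpers, the author identity and base timestamp are placeholders, and the tree id is Git's well-known empty tree, used only so the example is self-contained. A short three-character prefix keeps the search fast.

```python
import hashlib
import itertools

# Well-known id of Git's empty tree, used here only as a placeholder.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

def commit_id(tree: str, author_ts: int, committer_ts: int, message: str) -> str:
    """SHA-1 of a commit object built with a placeholder identity."""
    ident = "Example Author <author@example.com>"
    body = (
        f"tree {tree}\n"
        f"author {ident} {author_ts} +0000\n"
        f"committer {ident} {committer_ts} +0000\n"
        f"\n"
        f"{message}\n"
    ).encode()
    # Git hashes "<type> <size>\0" followed by the object body.
    header = b"commit " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def brute_force_prefix(prefix: str, base_ts: int, message: str, max_offset: int = 2000):
    """Try author/committer timestamp pairs until the id starts with prefix."""
    offsets = range(max_offset)
    for author_off, committer_off in itertools.product(offsets, repeat=2):
        author_ts = base_ts + author_off
        committer_ts = base_ts + committer_off
        oid = commit_id(EMPTY_TREE, author_ts, committer_ts, message)
        if oid.startswith(prefix):
            return author_ts, committer_ts, oid
    return None  # no match within the searched window

result = brute_force_prefix("bee", 1487808000, "Initial commit")
print(result)
```

Each extra hex digit in the prefix multiplies the expected number of attempts by 16, which is why this works for a vanity prefix but says nothing about full 160-bit collisions.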

Daniel Dinnyes points out in the comments to 7.1 Git Tools - Revision Selection, which includes:

A higher probability exists that every member of your programming team will be attacked and killed by wolves in unrelated incidents on the same night.

Even though shattered.io more recently (February 2017) demonstrated the possibility of forging a SHA1 collision (see much more in my separate answer, including Linus Torvalds' Google+ post), it:

a/ still requires over 9,223,372,036,854,775,808 (2^63) SHA1 computations. This took the equivalent processing power of 6,500 years of single-CPU computations and 110 years of single-GPU computations.
b/ would forge one file (with the same SHA1), but with the additional constraint that its content and size together would produce the identical SHA1 (a collision on the content alone is not enough): see "How is the git hash calculated?": a blob SHA1 is computed based on the content and size.
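
As a minimal illustration of point b/, the following Python sketch computes a blob id the way Git does, by hashing a "blob <size>" header, a NUL byte, and then the content. The function names `git_blob_id` and `plain_sha1` are just illustrative.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Blob id as Git computes it: SHA-1 over 'blob <size>\\0' + content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

def plain_sha1(content: bytes) -> str:
    """SHA-1 of the raw bytes, which is what sha1sum reports."""
    return hashlib.sha1(content).hexdigest()

data = b"hello\n"
print("sha1sum :", plain_sha1(data))
print("git blob:", git_blob_id(data))
```

Because the header is prepended before hashing, two files with the same plain SHA-1 do not automatically get the same blob id.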

See "Lifetimes of cryptographic hash functions" from Valerie Anita Aurora for more.
On that page, she notes:

Google spent 6500 CPU years and 110 GPU years to convince everyone we need to stop using SHA-1 for security critical applications.
Also because it was cool

See more in my separate answer below.

Answer 3

Answered by Mat

According to Pro Git:

If you do happen to commit an object that hashes to the same SHA-1 value as a previous object in your repository, Git will see the previous object already in your Git database and assume it was already written. If you try to check out that object again at some point, you'll always get the data of the first object.

So it wouldn't fail, but it wouldn't save your new object either.
I don't know how that would look on the command line, but that would certainly be confusing.
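
A toy model makes the "first object wins" behaviour concrete. The sketch below is not Git's object database: `ToyObjectStore` is a hypothetical class, and it deliberately keeps only 4 bits of the SHA-1 so that a collision shows up within a handful of writes.

```python
import hashlib

class ToyObjectStore:
    """Content-addressed store that never overwrites an existing object."""

    def __init__(self, bits: int = 4):
        self.bits = bits      # number of hash bits kept (1 to 8 in this toy)
        self.objects = {}     # object id -> stored content

    def object_id(self, content: bytes) -> int:
        digest = hashlib.sha1(content).digest()
        # Keep only the top `bits` bits of the first byte.
        return digest[0] >> (8 - self.bits)

    def write(self, content: bytes) -> int:
        oid = self.object_id(content)
        # If the id already exists, assume the object was already written.
        self.objects.setdefault(oid, content)
        return oid

    def read(self, oid: int) -> bytes:
        return self.objects[oid]

store = ToyObjectStore(bits=4)
first_seen = {}
collision = None
for i in range(17):  # 16 possible ids, so 17 writes must collide
    content = f"revision {i}\n".encode()
    oid = store.write(content)
    if oid in first_seen and collision is None:
        collision = (first_seen[oid], content, oid)
    first_seen.setdefault(oid, content)

old, new, oid = collision
print("wrote:", new, "but reading id", oid, "returns:", store.read(oid))
```

The write of the colliding content reports success, yet reading that id back returns the older content, which matches the silent data loss Pro Git describes.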

A bit further down, that same reference attempts to illustrate the likelihood of such a collision:

Here's an example to give you an idea of what it would take to get a SHA-1 collision. If all 6.5 billion humans on Earth were programming, and every second, each one was producing code that was the equivalent of the entire Linux kernel history (1 million Git objects) and pushing it into one enormous Git repository, it would take 5 years until that repository contained enough objects to have a 50% probability of a single SHA-1 object collision. A higher probability exists that every member of your programming team will be attacked and killed by wolves in unrelated incidents on the same night.
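
The figures in that quote can be sanity-checked with the usual birthday-bound approximation, p ≈ 1 - exp(-n²/(2N)), where N = 2^160 is the number of possible SHA-1 values and n the number of objects. The sketch below assumes uniformly random hashes and a 365.25-day year; the function names are illustrative.

```python
import math

HASH_BITS = 160
SPACE = 2 ** HASH_BITS  # number of possible SHA-1 values

def objects_for_probability(p: float) -> float:
    """Approximate number of random objects giving collision probability p."""
    return math.sqrt(2 * SPACE * math.log(1 / (1 - p)))

def collision_probability(n: float) -> float:
    """Birthday approximation of the collision probability among n objects."""
    return -math.expm1(-(n * n) / (2 * SPACE))

# Objects needed for a 50% chance of at least one collision.
n_half = objects_for_probability(0.5)

# Objects produced in the Pro Git scenario.
people = 6.5e9
objects_per_second = 1e6
seconds = 5 * 365.25 * 24 * 3600
n_scenario = people * objects_per_second * seconds

print(f"objects for 50%: {n_half:.3e}")
print(f"objects in the scenario: {n_scenario:.3e}")
print(f"scenario collision probability: {collision_probability(n_scenario):.2f}")
```

Both counts come out on the order of 10^24 objects, so the Pro Git scenario is in the right ballpark for the birthday bound, even if the exact probability depends on the approximation used.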

Answer 4

Answered by VonC

To add to my previous answer from 2012, there is now (Feb. 2017, five years later) an example of an actual SHA-1 collision with shattered.io, where you can craft two colliding PDF files: that is, obtain a SHA-1 digital signature on the first PDF file which can also be abused as a valid signature on the second PDF file.
See also "At death's door for years, widely used SHA1 function is now dead", and this illustration.

Update 26th of February: Linus confirmed the following points in a Google+ post:

(1) First off - the sky isn't falling. There's a big difference between using a cryptographic hash for things like security signing, and using one for generating a "content identifier" for a content-addressable system like git.
(2) Secondly, the nature of this particular SHA1 attack means that it's actually pretty easy to mitigate against, and there's already been two sets of patches posted for that mitigation.
(3) And finally, there's actually a reasonably straightforward transition to some other hash that won't break the world - or even old git repositories.

Regarding that transition, see Git 2.16 (Q1 2018), which adds a structure representing the hash algorithm. The implementation of that transition has started.

Starting with Git 2.19 (Q3 2018), Git has picked SHA-256 as NewHash and is in the process of integrating it into the code (meaning SHA1 is still the default as of Q2 2019 and Git 2.21, but SHA2 will be the successor).

Original answer (25th of February). But:

This allows forging a blob; however, the SHA-1 of the tree would still change, since the size of the forged blob might not be the same as the original one: see "How is the git hash calculated?"; a blob SHA1 is computed based on the content and size.
It does have some issues for git-svn though. Or rather with svn itself, as seen here.
As I mentioned in my original answer, the cost of such an attempt is still prohibitive for now (6,500 CPU years and 110 GPU years). See also Valerie Anita Aurora in "Lifetimes of cryptographic hash functions".
As commented before, this isn't about security or trust, but data integrity (de-duplication and error detection), and a problem there can easily be detected by git fsck, as mentioned by Linus Torvalds today. git fsck would warn about a commit message with opaque data hidden after a NUL (although NUL isn't always present in a fraudulent file).
Not everybody turns on transfer.fsckObjects, but GitHub does: any push would be aborted in the case of a malformed object or a broken link. Although... there is a reason this is not activated by default.
A PDF file can have arbitrary binary data that you can change to generate a colliding SHA-1, as opposed to forged source code.
The actual issue is creating two Git repositories with the same head commit hash and different contents. And even then, the attack remains convoluted.
Linus adds:
The whole point of an SCM is that it isn't about a one-time event, but about continuous history. That also fundamentally means that a successful attack needs to work over time, and not be detectable.
If you can fool a SCM one time, insert your code, and it gets detected next week, you didn't actually do anything useful. You only burned yourself.

Joey Hess tried those PDFs in a Git repo and found:

That includes two files with the same SHA and size, which do get different blobs thanks to the way git prepends the header to the content.

joey@darkstar:~/tmp/supercollider>sha1sum  bad.pdf good.pdf 
d00bbe65d80f6d53d5c15da7c6b4f0a655c5a86a  bad.pdf
d00bbe65d80f6d53d5c15da7c6b4f0a655c5a86a  good.pdf
joey@darkstar:~/tmp/supercollider>git ls-tree HEAD
100644 blob ca44e9913faf08d625346205e228e2265dd12b65    bad.pdf
100644 blob 5f90b67523865ad5b1391cb4a1c010d541c816c1    good.pdf

While appending identical data to these colliding files does generate other collisions, prepending data does not.

So the main vector of attack (forging a commit) would be:

Generate a regular commit object;
use the entire commit object + NUL as the chosen prefix, and
use the identical-prefix collision attack to generate the colliding good/bad objects.
... and this is useless because the good and bad commit objects still point to the same tree!

Plus, you can already detect cryptanalytic collision attacks against SHA-1 present in each file with cr-marcstevens/sha1collisiondetection.

Adding a similar check in Git itself would have some computation cost.

On changing the hash, Linus comments:

The size of the hash and the choice of the hash algorithm are independent issues.
What you'd probably do is switch to a 256-bit hash, use that internally and in the native git database, and then by default only show the hash as a 40-character hex string (kind of like how we already abbreviate things in many situations).
That way tools around git don't even see the change unless passed in some special "--full-hash" argument (or "--abbrev=64" or whatever - the default being that we abbreviate to 40).
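
As a rough sketch of that suggestion (and not a description of how Git's SHA-256 support actually displays ids), the Python below hashes an object with SHA-256 and shows only the first 40 hex characters by default. `object_id_sha256`, `display_id`, and `DEFAULT_ABBREV` are hypothetical names.

```python
import hashlib

DEFAULT_ABBREV = 40  # default display width suggested in the quote

def object_id_sha256(obj_type: str, content: bytes) -> str:
    """Full 64-character id: SHA-256 over '<type> <size>\\0' + content."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha256(header + content).hexdigest()

def display_id(full_id: str, abbrev: int = DEFAULT_ABBREV) -> str:
    """Abbreviate an id for display; pass abbrev=64 to see the full hash."""
    return full_id[:abbrev]

full = object_id_sha256("blob", b"hello\n")
print(display_id(full))             # 40 hex characters, like today's SHA-1 ids
print(display_id(full, abbrev=64))  # the full 256-bit id
```

The stored id stays 256 bits wide; only the presentation is shortened, so tools that expect 40 characters keep working.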

Still, a transition plan (from SHA1 to another hash function) would be complex, but it is being actively studied.
A convert-to-object_id campaign is in progress.

Update 20th of March: GitHub details a possible attack and its protection:

SHA-1 names can be assigned trust through various mechanisms. For instance, Git allows you to cryptographically sign a commit or tag. Doing so signs only the commit or tag object itself, which in turn points to other objects containing the actual file data by using their SHA-1 names. A collision in those objects could produce a signature which appears valid, but which points to different data than the signer intended. In such an attack the signer only sees one half of the collision, and the victim sees the other half.

Protection:

The recent attack uses special techniques to exploit weaknesses in the SHA-1 algorithm that find a collision in much less time. These techniques leave a pattern in the bytes which can be detected when computing the SHA-1 of either half of a colliding pair.
GitHub.com now performs this detection for each SHA-1 it computes, and aborts the operation if there is evidence that the object is half of a colliding pair. That prevents attackers from using GitHub to convince a project to accept the "innocent" half of their collision, as well as preventing them from hosting the malicious half.

See "sha1collisiondetection" by Marc Stevens

Again, with Git 2.16 (Q1 2018) adding a structure representing the hash algorithm, the implementation of a transition to a new hash has started.
As mentioned above, the new supported hash will be SHA-256.

Answer 5

Answered by Willem Hengeveld

I think cryptographers would celebrate.

Quote from Wikipedia article on SHA-1:

In February 2005, an attack by Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu was announced. The attacks can find collisions in the full version of SHA-1, requiring fewer than 2^69 operations. (A brute-force search would require 2^80 operations.)

Answer 6

Answered by Jeff Burdges

There are several different attack models for hashes like SHA-1, but the one usually discussed is collision search, including Marc Stevens' HashClash tool.

"As of 2012, the most efficient attack against SHA-1 is considered to be the one by Marc Stevens[34] with an estimated cost of $2.77M to break a single hash value by renting CPU power from cloud servers."

As folks pointed out, you could force a hash collision with git, but doing so won't overwrite the existing objects in another repository. I'd imagine even git push -f --no-thin won't overwrite the existing objects, but I'm not 100% sure.

That said, if you hack into a remote repository then you could make your false object the older one there, possibly embedding hacked code into an open source project on github or similar. If you were careful then maybe you could introduce a hacked version that new users downloaded.

I suspect, however, that many things the project's developers might do could either expose or accidentally destroy your multi-million dollar hack. In particular, that's a lot of money down the drain if some developer, whom you didn't hack, ever runs the aforementioned git push --no-thin after modifying the affected files, sometimes even without the --no-thin, depending.
