How does Git(Hub) handle possible collisions from short SHAs?
Original question: http://stackoverflow.com/questions/7128444/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Asked by Aseem Kishore
Both Git and GitHub display short versions of SHAs -- just the first 7 characters instead of all 40 -- and both Git and GitHub support taking these short SHAs as arguments.
E.g. git show 962a9e8
E.g. https://github.com/joyent/node/commit/962a9e8
Given that the possibility space is now orders of magnitude lower, "just" 268 million, how do Git and GitHub protect against collisions here? And how do they handle them?
Answered by emboss
These short forms are just there to simplify visual recognition and to make your life easier. Git doesn't really truncate anything; internally, everything is handled with the complete value. You can use a partial SHA-1 at your convenience, though:
Git is smart enough to figure out what commit you meant to type if you provide the first few characters, as long as your partial SHA-1 is at least four characters long and unambiguous — that is, only one object in the current repository begins with that partial SHA-1.
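The rule quoted above (at least four characters, and exactly one matching object) can be sketched in a few lines of Python. This is only an illustration of the rule, not Git's actual lookup code: the names `resolve_prefix`, `AmbiguousPrefix`, `KNOWN_ID`, `MADE_UP_ID` and `EXAMPLE_IDS` are hypothetical, and `MADE_UP_ID` is an invented object ID used purely to force an ambiguity.

```python
MIN_PREFIX_LEN = 4  # minimum length stated in the quoted passage


class AmbiguousPrefix(Exception):
    """Raised when more than one object ID starts with the given prefix."""


def resolve_prefix(prefix, object_ids):
    """Return the single full ID that starts with `prefix`, or raise an error."""
    prefix = prefix.lower()
    if len(prefix) < MIN_PREFIX_LEN:
        raise ValueError("prefix must be at least %d characters" % MIN_PREFIX_LEN)
    matches = [oid for oid in object_ids if oid.startswith(prefix)]
    if not matches:
        raise KeyError("no object starts with %r" % prefix)
    if len(matches) > 1:
        raise AmbiguousPrefix("short ID %r matches %d objects" % (prefix, len(matches)))
    return matches[0]


# One ID taken from this page, plus an invented one that shares its first four digits.
KNOWN_ID = "000182eacf99cde27d5916aa415921924b82972c"
MADE_UP_ID = "0001" + "a" * 36  # hypothetical, for illustration only
EXAMPLE_IDS = [KNOWN_ID, MADE_UP_ID]
```

With these example IDs, `resolve_prefix("00018", EXAMPLE_IDS)` returns the full ID, while `resolve_prefix("0001", EXAMPLE_IDS)` raises `AmbiguousPrefix`. The full 40-character value is always what gets returned, which mirrors the point that nothing is truncated internally.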
Answered by Keith Thompson
I have a repository that has a commit with an id of 000182eacf99cde27d5916aa415921924b82972c.
git show 00018
shows the revision, but
git show 0001
prints
error: short SHA1 0001 is ambiguous.
fatal: ambiguous argument '0001': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions
(If you're curious, it's a clone of the git repository for git itself; that commit is one that Linus Torvalds made in 2005.)
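The difference between `00018` (resolves) and `0001` (ambiguous) comes down to how many leading digits the commit shares with other objects in the repository. A rough sketch of finding the shortest abbreviation that is still unique is shown below. It is an assumption-laden illustration, not Git's algorithm: `shortest_unique_length` is a hypothetical helper, and `MADE_UP_ID` is an invented ID standing in for whatever other object shares the `0001` prefix in the real repository.

```python
def shortest_unique_length(target, object_ids, minimum=4):
    """Return the smallest prefix length (>= minimum) that matches only `target`."""
    others = [oid for oid in object_ids if oid != target]
    for length in range(minimum, len(target) + 1):
        prefix = target[:length]
        # The prefix is unique once no other ID starts with it.
        if not any(oid.startswith(prefix) for oid in others):
            return length
    return len(target)


KNOWN_ID = "000182eacf99cde27d5916aa415921924b82972c"  # the commit ID from this answer
MADE_UP_ID = "0001" + "a" * 36  # hypothetical ID sharing the first four digits

needed = shortest_unique_length(KNOWN_ID, [KNOWN_ID, MADE_UP_ID])
print(KNOWN_ID[:needed])
```

Under these assumptions the helper settles on five digits, which matches the behavior described in this answer: four digits are not enough, five are.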
Answered by VonC
Two notes here:
If you type y anywhere on the GitHub page displaying a commit, you will see the full 40-character SHA of said commit.
That illustrates emboss's point: GitHub doesn't truncate anything. And 7 hex digits (28 bits) hasn't been enough since 2010 anyway.
See commit dce9648 by Linus Torvalds himself (Oct 2010, git 1.7.4.4):
The default of 7 comes from fairly early in git development, when seven hex digits was a lot (it covers about 250+ million hash values). Back then I thought that 65k revisions was a lot (it was what we were about to hit in BK), and each revision tends to be about 5-10 new objects or so, so a million objects was a big number.
(BK = BitKeeper)
These days, the kernel isn't even the largest git project, and even the kernel has about 220k revisions (much bigger than the BK tree ever was) and we are approaching two million objects. At that point, seven hex digits is still unique for a lot of them, but when we're talking about just two orders of magnitude difference between number of objects and the hash size, there will be collisions in truncated hash values. It's no longer even close to unrealistic - it happens all the time.
We should both increase the default abbrev that was unrealistically small, and add a way for people to set their own default per-project in the git config file.
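The numbers in the quote can be checked with a back-of-the-envelope birthday estimate. The sketch below assumes object IDs behave like uniformly random values, which is a simplification, and the function names are made up for this illustration.

```python
def prefix_space(hex_digits):
    """Number of distinct values a hex prefix of this length can take."""
    return 16 ** hex_digits


def expected_colliding_pairs(num_objects, hex_digits):
    """Expected number of object pairs sharing the same prefix (birthday estimate)."""
    pairs = num_objects * (num_objects - 1) / 2
    return pairs / prefix_space(hex_digits)


def chance_prefix_is_ambiguous(num_objects, hex_digits):
    """Probability that one given object's prefix is shared by at least one other object."""
    space = prefix_space(hex_digits)
    return 1.0 - (1.0 - 1.0 / space) ** (num_objects - 1)


objects = 2_000_000  # "approaching two million objects" in the quote
print(prefix_space(7))                          # 268435456, i.e. the "268 million"
print(expected_colliding_pairs(objects, 7))     # several thousand colliding pairs
print(chance_prefix_is_ambiguous(objects, 7))   # a bit under 1% per object
```

Under this model both halves of the quote hold at once: any single 7-digit abbreviation is still very likely to be unique, yet across two million objects there are thousands of colliding pairs, so collisions really do "happen all the time". Raising the abbreviation to 12 hex digits drops the expected number of colliding pairs to well under one for the same object count.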