Original question: http://stackoverflow.com/questions/34764195/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How does Git create unique commit hashes, mainly the first few characters?
Asked by Ben
I find it hard to wrap my head around how Git creates fully unique hashes that aren't allowed to be the same even in the first 4 characters. I'm able to call commits in Git Bash using only the first four characters. Is it specifically decided in the algorithm that the first characters are "ultra"-unique and will not ever conflict with other similar hashes, or does the algorithm generate every part of the hash in the same way?
Answered by Chris Maes
Git uses the following information to generate the SHA-1 of a commit:
- The source tree of the commit (which unravels to all the subtrees and blobs)
- The SHA-1 of the parent commit
- The author info (with timestamp)
- The committer info (yes, author and committer are different; this also carries a timestamp)
- The commit message
(For the complete explanation, look here.)
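As a rough illustration of how those inputs become one hash, the sketch below assumes Git's standard object naming: the SHA-1 is taken over a header of the form "<type> <size>" followed by a NUL byte and then the object body. For a commit, the body is plain text holding the tree, parent, author, committer and message fields listed above. The function names (sha1, git_object_hex) are made up for this example and are not Git's internal API; the SHA-1 routine is a compact textbook implementation included only so the example is self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static uint32_t rotl32(uint32_t x, int n)
{
    return (x << n) | (x >> (32 - n));
}

/* Run the SHA-1 compression function over one 64-byte block. */
static void sha1_block(uint32_t h[5], const unsigned char *p)
{
    uint32_t w[80];
    for (int i = 0; i < 16; i++)
        w[i] = (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16 |
               (uint32_t)p[4 * i + 2] << 8 | (uint32_t)p[4 * i + 3];
    for (int i = 16; i < 80; i++)
        w[i] = rotl32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

    uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
    for (int i = 0; i < 80; i++) {
        uint32_t f, k;
        if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
        uint32_t t = rotl32(a, 5) + f + e + k + w[i];
        e = d; d = c; c = rotl32(b, 30); b = a; a = t;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* Plain SHA-1 over an in-memory buffer; writes 20 raw bytes to out. */
static void sha1(const unsigned char *data, size_t len, unsigned char out[20])
{
    uint32_t h[5] = { 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0 };
    unsigned char tail[128] = { 0 };
    size_t full = len / 64 * 64, rest = len - full;
    size_t tail_len = rest < 56 ? 64 : 128;   /* room for 0x80 and the length */
    uint64_t bits = (uint64_t)len * 8;

    for (size_t off = 0; off < full; off += 64)
        sha1_block(h, data + off);
    memcpy(tail, data + full, rest);
    tail[rest] = 0x80;
    for (int i = 0; i < 8; i++)               /* big-endian bit count */
        tail[tail_len - 1 - i] = (unsigned char)(bits >> (8 * i));
    for (size_t off = 0; off < tail_len; off += 64)
        sha1_block(h, tail + off);
    for (int i = 0; i < 5; i++)
        for (int j = 0; j < 4; j++)
            out[4 * i + j] = (unsigned char)(h[i] >> (24 - 8 * j));
}

/*
 * Hypothetical helper: hash "<type> <size>\0<body>" and write the
 * 40 hex digits plus a terminating NUL into hex.
 */
void git_object_hex(const char *type, const void *body, size_t len, char hex[41])
{
    unsigned char raw[20];
    size_t cap = strlen(type) + 32 + len;
    unsigned char *buf = malloc(cap);
    if (buf == NULL)
        abort();
    int hdr = snprintf((char *)buf, cap, "%s %zu", type, len);
    assert(hdr > 0 && (size_t)hdr < cap);
    memcpy(buf + hdr + 1, body, len);         /* keeps the NUL after the header */
    sha1(buf, (size_t)hdr + 1 + len, raw);
    free(buf);
    for (int i = 0; i < 20; i++)
        snprintf(hex + 2 * i, 3, "%02x", raw[i]);
}
```

Calling git_object_hex("commit", body, strlen(body), hex) with a body laid out as "tree ...", "parent ...", "author ...", "committer ...", a blank line and the message would yield the commit id. Every hex digit comes out of the same computation, so the first characters are no more "unique" than the rest, and changing any one input (even a timestamp) changes the whole hash.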
Git does NOT guarantee that the first 4 characters will be unique. Chapter 7 of the Pro Git book says:
Git can figure out a short, unique abbreviation for your SHA-1 values. If you pass --abbrev-commit to the git log command, the output will use shorter values but keep them unique; it defaults to using seven characters but makes them longer if necessary to keep the SHA-1 unambiguous:
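A minimal sketch of that abbreviation idea, assuming the object ids are available as an array of hex strings: start from a minimum length and lengthen it until the chosen id no longer shares its prefix with any other id. The function name shortest_unique_abbrev is hypothetical, and this is not how Git implements the lookup internally; it only illustrates the rule the quote describes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of leading characters two hex strings have in common. */
static size_t common_prefix_len(const char *a, const char *b)
{
    size_t i = 0;
    while (a[i] != '\0' && a[i] == b[i])
        i++;
    return i;
}

/*
 * Hypothetical helper (not a Git API): return the shortest abbreviation
 * length, at least min_len, at which ids[target] differs from every
 * other entry in ids[0..count-1]. The result is capped at the full
 * length of the id.
 */
size_t shortest_unique_abbrev(const char *const *ids, size_t count,
                              size_t target, size_t min_len)
{
    size_t need = min_len;
    assert(target < count);
    for (size_t i = 0; i < count; i++) {
        if (i == target)
            continue;
        /* One character past the shared prefix is enough to tell them apart. */
        size_t shared = common_prefix_len(ids[target], ids[i]);
        if (shared + 1 > need)
            need = shared + 1;
    }
    size_t full = strlen(ids[target]);
    return need < full ? need : full;
}
```

With min_len set to 7, an id that shares its first seven characters with another object would be reported as needing eight, which matches the "seven characters, longer if necessary" behavior in the quote.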
So Git just makes the abbreviation as long as necessary to remain unique. They even note that:
Generally, eight to ten characters are more than enough to be unique within a project.
As an example, the Linux kernel, which is a pretty large project with over 450k commits and 3.6 million objects, has no two objects whose SHA-1s overlap more than the first 11 characters.
So in fact they just depend on the great improbability of two objects having the exact same SHA (or the same first X characters of one).
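To put a rough number on that improbability, the sketch below estimates how many pairs of objects would be expected to share their first few hex digits, assuming hashes behave like uniformly random values (an idealization, not a property the source proves). The function name is made up for this example.

```c
#include <assert.h>

/*
 * Expected number of object pairs that share the same first `hex_digits`
 * hex characters, assuming uniformly random hashes. By the union bound
 * this value is also an upper bound on the probability that at least
 * one such pair exists.
 */
double expected_prefix_collisions(double objects, int hex_digits)
{
    double prefixes = 1.0;
    assert(hex_digits >= 0);
    for (int i = 0; i < hex_digits; i++)
        prefixes *= 16.0;               /* 16 possible values per hex digit */
    /* pairs of objects = n * (n - 1) / 2, each colliding with chance 1/prefixes */
    return objects * (objects - 1.0) / 2.0 / prefixes;
}
```

For the 3.6 million objects quoted above, this estimate gives tens of thousands of expected pairs sharing 7 digits and roughly 0.37 sharing 11, which is in line with the observation that a handful of kernel objects overlap in their first 11 characters and none overlap further.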
Answered by VonC
Apr. 2017: Beware that after the whole shattered.io episode (where Google achieved a SHA-1 collision), the 20-byte format won't be there forever.
A first step for that is to replace unsigned char sha1[20], which is hard-coded all over the Git codebase, with a generic object whose definition might change in the future (SHA-2?, BLAKE2, ...).
See commit e86ab2c (21 Feb 2017) by brian m. carlson (bk2204).
Convert the remaining uses of unsigned char [20] to struct object_id.
That is an example of an ongoing effort started with commit 5f7817c (13 Mar 2015) by brian m. carlson (bk2204), for v2.5.0-rc0, in cache.h:
/* The length in bytes and in hex digits of an object name (SHA-1 value). */
#define GIT_SHA1_RAWSZ 20
#define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ)

struct object_id {
	unsigned char hash[GIT_SHA1_RAWSZ];
};
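The benefit of wrapping the raw array in a struct can be sketched as follows. The definitions repeat the cache.h excerpt above; the two helpers (oid_equal, oid_hex) are illustrative names invented for this example, not Git's own functions. Because they take a struct object_id pointer and derive the size from the struct itself, switching to a longer hash later would mean changing the size macro rather than every caller that used to pass a bare unsigned char pointer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define GIT_SHA1_RAWSZ 20
#define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ)

struct object_id {
    unsigned char hash[GIT_SHA1_RAWSZ];
};

/* Illustrative helper (hypothetical name): are two object ids identical? */
int oid_equal(const struct object_id *a, const struct object_id *b)
{
    /* sizeof follows the struct definition, so no literal 20 appears here. */
    return memcmp(a->hash, b->hash, sizeof(a->hash)) == 0;
}

/* Illustrative helper (hypothetical name): render an id as lowercase hex. */
void oid_hex(const struct object_id *oid, char out[GIT_SHA1_HEXSZ + 1])
{
    for (int i = 0; i < GIT_SHA1_RAWSZ; i++)
        snprintf(out + 2 * i, 3, "%02x", oid->hash[i]);
}
```

A function taking const struct object_id * also lets the compiler reject a call that passes some unrelated byte buffer, which a plain unsigned char * parameter would silently accept.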
And don't forget that, even with SHA-1, the first 4 characters are no longer enough to guarantee uniqueness, as I explain in "How much of a git sha is generally considered necessary to uniquely identify a change in a given codebase?".
Update Dec. 2017 with Git 2.16 (Q1 2018): this effort to support an alternative SHA is underway: see "Why doesn't Git use more modern SHA?".
You will be able to use another hash: SHA1 is no longer the only one for Git.
Update 2018-2019: the choice has been made in Git 2.19+: SHA-256.
See "hash-function-transition".
This is not yet active (meaning Git 2.21 still uses SHA-1), but the code to support SHA-256 in the future is being written.
With Git 2.26 (Q1 2020), the work goes on, using "struct object_id" to replace uses of "char *sha1".
See commit 2fecc48, commit 6ac9760, commit b99b6bc, commit 63f4a7f, commit e31c710, commit 500e4f2, commit f66d4e0, commit a93c141, commit 3f83fd5, commit 0763671 (24 Feb 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit e8e7184, 05 Mar 2020)
packfile: drop nth_packed_object_sha1()
Signed-off-by: Jeff King
Once upon a time, nth_packed_object_sha1() was the primary way to get the oid of a packfile's index position. But these days we have the more type-safe nth_packed_object_id() wrapper, and all callers have been converted.
Let's drop the "sha1" version (turning the safer wrapper into a single function) so that nobody is tempted to introduce new callers.

