What is a Git commit ID?
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/29106996/
Asked by Ankur Loriya
How are the Git commit IDs generated to uniquely identify the commits?
Example: 521747298a3790fde1710f3aa2d03b55020575aa
How does it work? Are they only unique for each project? Or for the Git repositories globally?
Answered by Schwern
A Git commit ID is a SHA-1 hash of every important thing about the commit. I'm not going to list them all, but here are the important ones...
- The content, all of it, not just the diff.
- Commit date.
- Committer's name and email address.
- Log message.
- The ID of the previous commit(s).
Change any of that and the commit ID changes. And yes, the same commit with the same properties will have the same ID on a different machine. This serves three purposes. First, it means the system can tell if a commit has been tampered with. It's baked right into the architecture.
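To make this concrete, here is a minimal sketch that hashes hand-written commit text the way Git does in a SHA-1 repository: a "commit <size>" header, a null byte, then the text itself. The helper names commit_id and commit_text, the identity "A U Thor", and the timestamp are made up for illustration; the tree ID is Git's well-known empty tree. It assumes sha1sum is installed.

```shell
#!/bin/sh
# Hypothetical helper: read raw commit text on stdin and print the SHA-1 of
# "commit <byte count>\0" followed by that text.
commit_id() {
    tmp=$(mktemp)
    cat > "$tmp"
    size=$(wc -c < "$tmp" | tr -d ' ')
    { printf 'commit %s\0' "$size"; cat "$tmp"; } | sha1sum | cut -d' ' -f1
    rm -f "$tmp"
}

# Hypothetical helper: print the text of a root commit with the given log
# message. The identity and timestamp are invented for this example.
commit_text() {
    printf 'tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n'
    printf 'author A U Thor <author@example.com> 1426631449 -0700\n'
    printf 'committer A U Thor <author@example.com> 1426631449 -0700\n'
    printf '\n%s\n' "$1"
}

id_first=$(commit_text 'My commit message' | commit_id)
id_again=$(commit_text 'My commit message' | commit_id)
id_edited=$(commit_text 'My commit message!' | commit_id)

echo "original:       $id_first"
echo "same content:   $id_again"
echo "edited message: $id_edited"
```

The first two IDs should match because the input bytes are identical, while the one-character edit to the message should produce an unrelated ID.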
Second, one can rapidly compare commits just by looking at their IDs. This makes Git's network protocols very efficient. Want to compare two commits to see if they're the same? Don't have to send the whole diff, just send the IDs.
Third, and this is the genius, two commits with the same IDs have the same history. That's why the IDs of the previous commits are part of the hash. If the content of a commit is the same but the parents are different, the commit ID must be different. That means when comparing repositories (like in a push or pull) once Git finds a commit in common between the two repositories it can stop checking. This makes pushing and pulling extremely efficient. For example...
origin
A - B - C - D - E [master]

local
A - B [origin/master]
The network conversation for git fetch origin goes something like this...
local: Hey origin, what branches do you have?
origin: I have master at E.
local: I don't have E, I have your master at B.
origin: B you say? I have B and it's an ancestor of E. That checks out. Let me send you C, D and E.
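The last step of that exchange can be approximated locally. This is a rough sketch, not Git's actual wire protocol: it builds a throwaway repository with five empty commits A through E, tags each with its letter, and then asks the two questions origin answers above. The repository location, the identity, and the tag names are invented for the example.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Five empty commits with subjects A..E, each tagged with its letter.
for name in A B C D E; do
    git -c user.name='A U Thor' -c user.email='author@example.com' \
        commit -q --allow-empty -m "$name"
    git tag "$name"
done

# "I have B and it's an ancestor of E": exits 0 when B is reachable from E.
if git merge-base --is-ancestor B E; then
    b_is_ancestor=yes
else
    b_is_ancestor=no
fi

# "Let me send you C, D and E": commits reachable from E but not from B,
# oldest first, shown by subject line.
to_send=$(git log --reverse --format=%s B..E | paste -s -d ' ' -)

echo "B is an ancestor of E: $b_is_ancestor"
echo "commits to send: $to_send"
```

Because every commit ID covers its parents, knowing that both sides share B is enough; nothing at or before B needs to be compared.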
This is also why when you rewrite a commit with rebase, everything after it has to change. Here's an example.
A - B - C - D - E - F - G [master]
Let's say you rewrite D, just to change the log message a bit. Now D can no longer be D, it has to be copied to a new commit we'll call D1.
A - B - C - D - E - F - G [master]
          \
            D1
While D1 can have C as its parent (C is unaffected, commits do not know their children) it is disconnected from E, F and G. If we change E's parent to D1, E can't be E anymore. It has to be copied to a new commit E1.
A - B - C - D - E - F - G [master]
          \
            D1 - E1
And so on with F to F1 and G to G1.
A - B - C - D - E - F - G
          \
            D1 - E1 - F1 - G1 [master]
They all have the same code, just different parents (or in D1's case, a different commit message).
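The ripple effect can be reproduced without Git by chaining the hashes by hand. The sketch below is a simulation under stated assumptions: commit_id and make_commit are hypothetical helpers, every commit uses Git's empty tree with a made-up identity and timestamp, and the chain is shortened to C - D - E. Rewording D changes D's ID, and because E records its parent's ID, E gets a new ID too even though its own message and tree are untouched.

```shell
#!/bin/sh
# Hypothetical helper: SHA-1 of "commit <byte count>\0" plus the text on stdin.
commit_id() {
    tmp=$(mktemp)
    cat > "$tmp"
    size=$(wc -c < "$tmp" | tr -d ' ')
    { printf 'commit %s\0' "$size"; cat "$tmp"; } | sha1sum | cut -d' ' -f1
    rm -f "$tmp"
}

# Hypothetical helper: print the ID of a commit with parent $1 (empty for a
# root commit) and log message $2. Identity and timestamp are invented.
make_commit() {
    {
        printf 'tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n'
        if [ -n "$1" ]; then
            printf 'parent %s\n' "$1"
        fi
        printf 'author A U Thor <author@example.com> 1426631449 -0700\n'
        printf 'committer A U Thor <author@example.com> 1426631449 -0700\n'
        printf '\n%s\n' "$2"
    } | commit_id
}

# Original chain: C - D - E
c=$(make_commit '' 'C')
d=$(make_commit "$c" 'D')
e=$(make_commit "$d" 'E')

# Rewritten chain: C - D1 - E1, where only D's message was edited.
d1=$(make_commit "$c" 'D, reworded')
e1=$(make_commit "$d1" 'E')

echo "C:  $c"
echo "D:  $d   D1: $d1"
echo "E:  $e   E1: $e1"
```

C keeps its ID in both chains because commits record their parents, not their children.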
Answered by Justin Howard
You can see exactly what goes into making a commit id by running
git cat-file commit HEAD
It will give you something like
tree 07e239f2f3d8adc12566eaf66e0ad670f36202b5
parent 543a4849f7201da7bed297b279b7b1e9a086a255
author Justin Howard <[email protected]> 1426631449 -0700
committer Justin Howard <[email protected]> 1426631471 -0700

My commit message
It gives you:
- A checksum of the tree contents
- The parent commit id (if this is a merge, there will be more parents)
- The author of the commit with timestamp
- The committer of the commit with timestamp
- The commit message
Git takes all this and does a sha1 hash of it. You can reproduce the commit id by running
(printf "commit %s\0" $(git cat-file commit HEAD | wc -c); git cat-file commit HEAD) | sha1sum
This starts out by printing the string commit followed by a space, the byte count of the cat-file text blob, and a null byte. It then appends the cat-file blob itself. All of that then gets run through sha1sum.
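Here is a slightly more readable version of the same check, as a sketch. It creates a throwaway repository (the location, identity, and message are invented here), recomputes the ID of HEAD by hand, and compares it with what git rev-parse HEAD reports. It assumes the repository uses Git's default SHA-1 object format and that sha1sum is installed.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git -c user.name='A U Thor' -c user.email='author@example.com' \
    commit -q --allow-empty -m 'My commit message'

# Save the raw commit text so the byte count and the hash are taken over
# exactly the same bytes.
git cat-file commit HEAD > commit.txt
size=$(wc -c < commit.txt | tr -d ' ')

# Header "commit <size>\0", then the commit text, all through SHA-1.
recomputed=$( { printf 'commit %s\0' "$size"; cat commit.txt; } | sha1sum | cut -d' ' -f1 )
actual=$(git rev-parse HEAD)

echo "recomputed: $recomputed"
echo "rev-parse:  $actual"
```

If the two lines differ, the most likely cause is a repository that does not use the SHA-1 object format.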
As you can see, there is nothing that identifies the project or repository in this information. This doesn't cause problems because it is astronomically unlikely for two different commits to produce the same hash.