Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/10398744/

Time: 2020-09-19 06:51:31  Source: igfitidea

does git store diff information in commit objects?

Tags: git, diff

Asked by Alexander Bird

According to this:

It is important to note that this is very different from most SCM systems that you may be familiar with. Subversion, CVS, Perforce, Mercurial and the like all use Delta Storage systems - they store the differences between one commit and the next. Git does not do this - it stores a snapshot of what all the files in your project look like in this tree structure each time you commit. This is a very important concept to understand when using Git.

Yet when I run git show $SHA1ofCommitObject...

commit 4405aa474fff8247607d0bf599e054173da84113
Author: Joe Smoe <[email protected]>
Date:   Tue May 1 08:48:21 2012 -0500

    First commit

diff --git a/index.html b/index.html
new file mode 100644
index 0000000..de8b69b
--- /dev/null
+++ b/index.html
@@ -0,0 +1 @@
+<h1>Hello World!</h1>
diff --git a/interests/chess.html b/interests/chess.html
new file mode 100644
index 0000000..e5be7dd
--- /dev/null
+++ b/interests/chess.html
@@ -0,0 +1 @@
+Did you see on Slashdot that King's Gambit accepted is solved! <a href="http://game

... it outputs the diff of the commit against the previous commit. I know that git doesn't store diffs in blob objects, but does it store diffs in commit objects? Or is git show dynamically calculating the diff?

Answered by Carl

What the statement means is that most other version control systems need a point of reference in the past to be able to re-create the current commit.

For example, at some point in the past, a diff-based VCS (version control system) would have stored a full snapshot:

x = snapshot
+ = diff
History:
x-----+-----+-----+-----(+) Where we are now

So, in such a scenario, to re-create the state at (now), it would have to check out (x) and then apply the diff for each (+) until it gets to now. Note that it would be extremely inefficient to store deltas forever, so every so often, delta-based VCSes store a full snapshot. Here's how it's done for Subversion.
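
As a rough illustration of the replay described above, here is a minimal Python sketch of a delta-based store. The delta format (a dict of changed paths, with None meaning "deleted") and the names apply_delta and reconstruct are invented for this example; they are not the on-disk format or API of Subversion or any other real VCS.

```python
# Toy model of a delta-based store: one full snapshot followed by deltas.
# The delta format here is hypothetical and only serves the illustration.

def apply_delta(snapshot, delta):
    """Return a new snapshot with one delta applied."""
    result = dict(snapshot)
    for path, content in delta.items():
        if content is None:
            result.pop(path, None)   # path was deleted in this revision
        else:
            result[path] = content   # path was added or modified
    return result


def reconstruct(base, deltas):
    """Rebuild the latest state by replaying every delta on top of the base."""
    state = base
    for delta in deltas:
        state = apply_delta(state, delta)
    return state


# (x) is the full snapshot, each later entry is one (+) in the diagram.
base = {"index.html": "<h1>Hello</h1>\n"}
deltas = [
    {"index.html": "<h1>Hello World!</h1>\n"},
    {"about.html": "About me\n"},
    {"about.html": None},
]
latest = reconstruct(base, deltas)
print(latest)
```

The point of the sketch is that the latest state cannot be read directly: every delta since the last full snapshot has to be replayed, which is why such systems store a fresh snapshot from time to time.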

Now, git is different. Git stores references to complete blobs and this means that with git, only one commit is sufficient to recreate the codebase at that point in time. Git does not need to look up information from past revisions to create a snapshot.

So if that is the case, then where does the delta compression that git uses come in?

Well, it is nothing but a compression technique: there is no point in storing the same information twice if only a tiny amount has changed. So Git represents what has changed and stores a reference to the object it is based on, so that the commit it belongs to, which is in effect a tree of references, can still be re-created without looking at past commits. The thing is, though, that Git does not do this immediately after every commit, but rather during a garbage collection run. So, if Git has not yet run its garbage collection, you can see objects in your object store with very similar content.

However, when Git runs its garbage collection (or when you call git gc manually), the duplicates are cleaned up and a read-only pack file is created. You don't have to worry about running garbage collection manually - git contains heuristics which tell it when to do so.

Answered by Mark Longair

No, commit objects in git don't contain diffs - instead, each commit object contains a hash of the tree, which recursively and completely defines the content of the source tree at that commit. There's a nice explanation in the Git Community Book of what goes into blob objects, tree objects and commit objects.
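
To make the object model concrete, here is a small Python sketch that names objects the way Git does for loose objects: the SHA-1 of a "<type> <size>\0" header followed by the body. It is a simplified illustration, not Git's implementation: tree_id only handles a flat directory of files, and the helper names and the author string are made up for this example.

```python
import hashlib


def git_object_id(kind, body):
    """SHA-1 of b'<kind> <size>\\0' + body, the name Git gives a loose object."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content):
    """A blob is just the full file content; no diff is involved."""
    return git_object_id("blob", content)


def tree_id(entries):
    """entries: (mode, name, hex object id) tuples for one flat directory."""
    body = b""
    for mode, name, oid in sorted(entries, key=lambda e: e[1].encode()):
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
    return git_object_id("tree", body)


def commit_body(tree, message, author, parents=()):
    """A commit records a tree hash, parent hashes and metadata, nothing else."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {author}", "", message]
    return ("\n".join(lines) + "\n").encode()


# Hypothetical author line, used only for this illustration.
author = "A U Thor <author@example.com> 1335880101 -0500"
blob = blob_id(b"<h1>Hello World!</h1>\n")
tree = tree_id([("100644", "index.html", blob)])
body = commit_body(tree, "First commit", author)
print(body.decode())
print(git_object_id("commit", body))
```

The printed commit body contains a tree line, the author and committer lines, and the message. The file contents are reachable only through the tree hash, and no diff appears anywhere in it.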

All the diffs that are shown to you by git's tools are calculated on demand from the complete content of files.
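
The same idea can be sketched in Python with the standard difflib module: given two complete snapshots, a diff is computed only when someone asks for it. The snapshot representation (a dict from path to text) and the function name diff_snapshots are assumptions made for this illustration; Git's own diff machinery is considerably more involved.

```python
import difflib


def diff_snapshots(old, new):
    """Compute a unified diff between two full snapshots (path -> text)."""
    out = []
    for path in sorted(set(old) | set(new)):
        before = old.get(path, "").splitlines(keepends=True)
        after = new.get(path, "").splitlines(keepends=True)
        if before == after:
            continue  # unchanged file, nothing to show
        fromfile = f"a/{path}" if path in old else "/dev/null"
        tofile = f"b/{path}" if path in new else "/dev/null"
        out.extend(difflib.unified_diff(before, after, fromfile, tofile))
    return "".join(out)


# The first commit has no parent, so it is compared with an empty snapshot.
parent = {}
commit = {"index.html": "<h1>Hello World!</h1>\n"}
result = diff_snapshots(parent, commit)
print(result, end="")
```

Nothing resembling the diff is stored in either snapshot; it exists only as the return value of the comparison, which mirrors how git show derives its output from the commit's tree and its parent's tree.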
