What is git actually doing when it says it is "resolving deltas"?
Original question: http://stackoverflow.com/questions/4689844/
Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Asked by Nik Reiman
During the first clone of a repository, git first receives the objects (which is obvious enough), and then spends about the same amount of time "resolving deltas". What's actually happening during this phase of the clone?
Accepted answer by Amber
Git uses delta encoding to store some of the objects in packfiles. However, you don't want to have to play back every single change ever made to a given file in order to get the current version, so Git also stores occasional snapshots of the file contents. "Resolving deltas" is the step that deals with making sure all of that stays consistent.
Here's a chapter from the "Git Internals" section of the Pro Git book, which is available online, that talks about this.
Answer by araqnid
The stages of git clone are:
- Receive a "pack" file of all the objects in the repo database
- Create an index file for the received pack
- Check out the head revision (for a non-bare repo, obviously)
"Resolving deltas" is the message shown for the second stage, indexing the pack file ("git index-pack").
Pack files do not have the actual object IDs in them, only the object content. So to determine what the object IDs are, git has to decompress and SHA1 each object in the pack to produce the object ID, which is then written into the index file.
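To make the "decompress+SHA1" step concrete, here is a minimal Python sketch of how an object ID can be derived from object content. It assumes the standard Git object hashing scheme (SHA-1 over a `<type> <size>\0` header followed by the content); the function names `git_object_id` and `object_id_from_compressed` are made up for illustration and are not part of git.

```python
import hashlib
import zlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object ID for an object of the given type.

    The hash covers a header "<type> <size>" plus a NUL byte,
    followed by the raw (uncompressed) content.
    """
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def object_id_from_compressed(obj_type: str, compressed: bytes) -> str:
    """Decompress zlib-deflated content, then hash it (hypothetical helper)."""
    return git_object_id(obj_type, zlib.decompress(compressed))


if __name__ == "__main__":
    # Simulate one non-delta pack entry: only compressed content is stored,
    # so the ID has to be recomputed by the receiver.
    stored = zlib.compress(b"test content\n")
    print(object_id_from_compressed("blob", stored))
```

The result for a blob should match what `git hash-object` reports for the same bytes, which is why the index can only be written after every object has been inflated and hashed.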
An object in a pack file may be stored as a delta, i.e. a sequence of changes to make to some other object. In this case, git needs to retrieve the base object, apply the delta commands, and SHA1 the result. The base object itself might have to be derived by applying a sequence of delta commands. (Even though in the case of a clone the base object will have been encountered already, there is a limit to how many manufactured objects are cached in memory.)
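As a rough illustration of "apply the delta commands", the sketch below applies a delta to a base object in Python. It assumes the delta layout described in Git's pack format documentation (two variable-length sizes, then copy and insert instructions); `read_varint` and `apply_delta` are hypothetical names, and real git does this in C with more validation.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a little-endian base-128 size; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def apply_delta(base: bytes, delta: bytes) -> bytes:
    """Rebuild a target object from a base object and a delta."""
    base_size, pos = read_varint(delta, 0)
    result_size, pos = read_varint(delta, pos)
    if base_size != len(base):
        raise ValueError("delta does not match this base object")

    out = bytearray()
    while pos < len(delta):
        cmd = delta[pos]
        pos += 1
        if cmd & 0x80:
            # Copy instruction: the low 4 bits say which offset bytes follow,
            # the next 3 bits say which size bytes follow.
            offset = size = 0
            for i in range(4):
                if cmd & (1 << i):
                    offset |= delta[pos] << (8 * i)
                    pos += 1
            for i in range(3):
                if cmd & (0x10 << i):
                    size |= delta[pos] << (8 * i)
                    pos += 1
            if size == 0:
                size = 0x10000
            out += base[offset:offset + size]
        elif cmd:
            # Insert instruction: the next `cmd` bytes are literal data.
            out += delta[pos:pos + cmd]
            pos += cmd
        else:
            raise ValueError("reserved delta instruction 0x00")

    if len(out) != result_size:
        raise ValueError("reconstructed object has unexpected size")
    return bytes(out)


if __name__ == "__main__":
    base = b"hello world\n"
    # Hand-built delta: copy 6 bytes at offset 0, insert b"git ",
    # then copy 6 bytes at offset 6.
    delta = bytes([0x0C, 0x10, 0x90, 0x06, 0x04]) + b"git " + bytes([0x91, 0x06, 0x06])
    print(apply_delta(base, delta))
```

If the base is itself a delta, the same function would be applied repeatedly along the chain, which is where the in-memory cache of reconstructed objects mentioned above comes in.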
In summary, the "resolving deltas" stage involves decompressing and checksumming the entire repo database, which not surprisingly takes quite a long time. Presumably decompressing and calculating SHA1s actually takes more time than applying the delta commands.
In the case of a subsequent fetch, the received pack file may contain references (as delta object bases) to other objects that the receiving git is expected to already have. In this case, the receiving git actually rewrites the received pack file to include any such referenced objects, so that any stored pack file is self-sufficient. This might be where the message "resolving deltas" originated.
Answer by Johan
Amber seems to be describing the object model that Mercurial or similar uses. Git does not store the deltas between subsequent versions of an object, but rather full snapshots of the object, every time. It then compresses these snapshots using delta compression, trying to find good deltas to use, regardless of where in the history these exist.