What is the internal format of a git tree object?
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute the content to the original authors on Stack Overflow.
Original question: http://stackoverflow.com/questions/14790681/
Asked by Bystysz
What is the format of a git tree object's content?
The content of a blob object is blob [size of string] NUL [string], but what is it for a tree object?
Answered by lemiorhan
The format of a tree object:
tree [content size]\0[Entries having references to other trees and blobs]
The format of each entry having references to other trees and blobs:
[mode] [file/folder name]\0[SHA-1 of referencing blob or tree]
I wrote a script that decompresses tree objects. It outputs as follows:
tree 192\0
40000 octopus-admin\0 a84943494657751ce187be401d6bf59ef7a2583c
40000 octopus-deployment\0 14f589a30cf4bd0ce2d7103aa7186abe0167427f
40000 octopus-product\0 ec559319a263bc7b476e5f01dd2578f255d734fd
100644 pom.xml\0 97e5b6b292d248869780d7b0c65834bfb645e32a
40000 src\0 6e63db37acba41266493ba8fb68c76f83f1bc9dd
A mode whose first character is 1 indicates a reference to a blob (file). In the example above, pom.xml is a blob and the others are trees.
Note that I added newlines and spaces after each \0 for the sake of pretty printing. The real content contains no newlines. I also converted the 20 bytes (i.e., the SHA-1 of each referenced blob or tree) into a hex string so they are easier to read.
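As a rough illustration of the layout described in this answer, the following Python sketch splits a decompressed tree object into its header and entries and prints them in the same pretty-printed style. It is not the author's script: the function name pretty_tree and the hand-built two-entry sample (which reuses two hashes from the listing) are assumptions made for the example.

```python
def pretty_tree(raw: bytes) -> str:
    """Render a decompressed tree object with one entry per line."""
    # Everything before the first NUL is the "tree [content size]" header.
    header, _, body = raw.partition(b"\x00")
    lines = [header.decode("ascii") + "\\0"]
    pos = 0
    while pos < len(body):
        nul = body.index(b"\x00", pos)            # end of "[mode] [name]"
        mode_and_name = body[pos:nul].decode("utf-8", "replace")
        sha = body[nul + 1:nul + 21]              # 20 raw SHA-1 bytes
        lines.append(f"{mode_and_name}\\0 {sha.hex()}")
        pos = nul + 21
    return "\n".join(lines)


# Hypothetical sample assembled by hand, not read from a real repository.
body = (
    b"100644 pom.xml\x00"
    + bytes.fromhex("97e5b6b292d248869780d7b0c65834bfb645e32a")
    + b"40000 src\x00"
    + bytes.fromhex("6e63db37acba41266493ba8fb68c76f83f1bc9dd")
)
sample = b"tree %d\x00" % len(body) + body
print(pretty_tree(sample))
```

The parser walks the body by index rather than splitting on newlines, because the binary SHA-1 bytes may contain any value, including 0x0A and 0x00.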
Answered by antonio
I will elaborate a bit on @lemiorhan's answer by means of a test repo.
Create a test repo
Create a test project in an empty folder:
$ echo ciao > file1
$ mkdir folder1
$ echo hello > folder1/file2
$ echo hola > folder1/file3
That is:
$ find -type f
./file1
./folder1/file2
./folder1/file3
Create the local Git repo:
$ git init
$ git add .
$ git write-tree
0b6e66b04bc1448ca594f143a91ec458667f420e
The last command returns the hash of the top-level tree.
Read a tree content
To print the content of a tree in human-readable format use:
$ git ls-tree 0b6e66
100644 blob 887ae9333d92a1d72400c210546e28baa1050e44 file1
040000 tree ab39965d17996be2116fe508faaf9269e903c85b folder1
In this case 0b6e66 are the first six characters of the top tree's hash. You can do the same for folder1.
To get the same content but in raw format use:
$ git cat-file tree 0b6e66
100644 file1 ?z?3=???$ ??Tn(???D40000 folder1 ?9?]?k??o???i???[%
The content is similar to the one physically stored as a compressed file, but it lacks the initial string:
tree [content size]\0
To get the actual content, we need to uncompress the file storing the 0b6e66 tree object. Given the 2/38 path format (the first 2 hex characters of the hash name the directory, the remaining 38 name the file), the file we want is:
.git/objects/0b/6e66b04bc1448ca594f143a91ec458667f420e
This file is compressed with zlib, therefore we obtain its content with:
$ openssl zlib -d -in .git/objects/0b/6e66b04bc1448ca594f143a91ec458667f420e
tree 67 100644 file1 ?z?3=???$ ??Tn(???D40000 folder1 ?9?]?k??o???i???[%
We learn the tree content size is 67.
Note that, since the terminal is not made for printing binary data, it might swallow part of the string or behave oddly. In this case pipe the commands above into od -c, or use the manual solution in the next section.
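If your openssl build lacks the zlib command, the same decompression step can be sketched with Python's standard zlib module. The helper name read_loose_object is an assumption for this example, and the demo writes a throwaway object with a made-up hash into a temporary directory rather than touching a real repository. The helper expects the full 40-character hash, not an abbreviation.

```python
import os
import tempfile
import zlib


def read_loose_object(git_dir: str, sha: str) -> tuple:
    """Return (type, declared size, body) of a loose object."""
    # 2/38 layout: two hex characters for the directory, 38 for the file.
    path = os.path.join(git_dir, "objects", sha[:2], sha[2:])
    with open(path, "rb") as fh:
        raw = zlib.decompress(fh.read())
    header, _, body = raw.partition(b"\x00")
    kind, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError("declared size does not match body length")
    return kind, int(size), body


# Demo with a fake object; the hash below is a placeholder, not a real SHA-1.
with tempfile.TemporaryDirectory() as git_dir:
    fake_sha = "0b" + "0" * 38
    os.makedirs(os.path.join(git_dir, "objects", fake_sha[:2]))
    body = b"100644 file1\x00" + bytes(20)
    obj_path = os.path.join(git_dir, "objects", fake_sha[:2], fake_sha[2:])
    with open(obj_path, "wb") as fh:
        fh.write(zlib.compress(b"tree %d\x00" % len(body) + body))
    kind, size, payload = read_loose_object(git_dir, fake_sha)
    print(kind, size)
```

Against the test repo, a call such as read_loose_object(".git", "0b6e66b04bc1448ca594f143a91ec458667f420e") should report the same size of 67 shown above.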
Generate the tree object content manually
To understand the tree generation process we can generate it ourselves starting from its human-readable content, e.g. for the top tree:
$ git ls-tree 0b6e66
100644 blob 887ae9333d92a1d72400c210546e28baa1050e44 file1
040000 tree ab39965d17996be2116fe508faaf9269e903c85b folder1
Each object's ASCII SHA-1 hash is converted and stored in binary format. If all you need is a binary version of the ASCII hashes, you can get it with:
$ echo -e "$(echo ASCIIHASH | sed -e 's/../\\x&/g')"
So the blob 887ae9333d92a1d72400c210546e28baa1050e44 is converted to
$ echo -e "$(echo 887ae9333d92a1d72400c210546e28baa1050e44 | sed -e 's/../\\x&/g')"
?z?3=???$ ??Tn(???D
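If we want to create the whole tree object, here is an awk one-liner:
$ git ls-tree 0b6e66 | awk -b 'function bsha(asha)\
{n=patsplit(asha, x, /../); h=""; for(j=1; j<=n; j++) h=h sprintf("%c", strtonum("0x" x[j])); return(h)}\
{t=t sprintf("%d %s\0%s", $1, $4, bsha($3))} END {printf("tree %s\0%s", length(t), t)}'
tree 67 100644 file1 ?z?3=???$ ??Tn(???D40000 folder1 ?9?]?k??o???i???[%
The function bsha converts the ASCII SHA-1 hashes to binary. The tree content is first accumulated in the variable t, and then its length is calculated and printed in the END{...} section.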
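The same construction can be sketched in Python, which may be easier to follow than the awk one-liner. This is an illustrative equivalent rather than the answer's own code; the function name build_tree is an assumption. It takes the hex hashes shown by git ls-tree, converts them to 20 raw bytes, and prepends the header.

```python
def build_tree(ls_tree_output: str) -> bytes:
    """Assemble a raw (uncompressed) tree object from `git ls-tree` text."""
    body = b""
    for line in ls_tree_output.splitlines():
        # Columns: mode, type, hex SHA-1, name (the name may contain spaces).
        mode, _kind, sha, name = line.split(None, 3)
        # Git stores the mode without leading zeros ("040000" -> "40000").
        body += (mode.lstrip("0").encode() + b" " + name.encode()
                 + b"\x00" + bytes.fromhex(sha))
    return b"tree %d\x00" % len(body) + body


# Listing taken from the test repo in this answer.
listing = (
    "100644 blob 887ae9333d92a1d72400c210546e28baa1050e44\tfile1\n"
    "040000 tree ab39965d17996be2116fe508faaf9269e903c85b\tfolder1\n"
)
raw = build_tree(listing)
print(raw[:8], len(raw))
```

Like the awk version, this sketch keeps the entries in the order git ls-tree prints them and does not handle names that git quotes because of special characters.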
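As observed above, the console is not well suited to printing binary data, so we might want to replace the bytes with their \x## equivalents:
$ git ls-tree 0b6e66 | awk -b 'function bsha(asha)\
{n=patsplit(asha, x, /../); h=""; for(j=1; j<=n; j++) h=h sprintf("%s", "\\x" x[j]); return(h)}\
{t=t sprintf("%d %s\0%s", $1, $4, bsha($3))} END {printf("tree %s\0%s", length(t), t)}'
tree 187 100644 file1 \x88\x7a\xe9\x33\x3d\x92\xa1\xd7\x24\x00\xc2\x10\x54\x6e\x28\xba\xa1\x05\x0e\x4440000 folder1 \xab\x39\x96\x5d\x17\x99\x6b\xe2\x11\x6f\xe5\x08\xfa\xaf\x92\x69\xe9\x03\xc8\x5b%
Note that the reported size, 187, counts the escaped text; the raw tree content is still 67 bytes.
The output should be a good compromise for understanding the tree content structure. Compare the output above with the general tree content structure
tree [content size]\0[Object Entries]
where each Object Entry is like:
[mode] [Object name]\0[SHA-1 in binary format]
Modes are a subset of UNIX filesystem modes. See Tree Objects in the Git manual for more details.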
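To make the mode field more concrete, here is a small Python sketch that classifies the octal mode strings Git commonly writes into tree entries (40000, 100644, 100755, 120000, 160000). The function name describe_mode and the label wording are assumptions for illustration; the mode values themselves are the standard ones.

```python
def describe_mode(mode_text: str) -> str:
    """Classify the ASCII octal mode of a tree entry."""
    mode = int(mode_text, 8)
    file_type = mode & 0o170000      # same mask as the UNIX S_IFMT
    if file_type == 0o040000:
        return "tree"
    if file_type == 0o120000:
        return "symlink (stored as a blob)"
    if file_type == 0o160000:
        return "gitlink (submodule commit)"
    if file_type == 0o100000:
        # Regular file: Git only distinguishes executable from non-executable.
        return "executable blob" if mode & 0o111 else "blob"
    return "unknown"


for text in ("40000", "100644", "100755", "120000", "160000"):
    print(text, "->", describe_mode(text))
```

The file-type bits are checked through the mask rather than the first character because symlinks and gitlinks also start with 1.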
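We need to make sure that the results are consistent. To this end, we can compare the checksum of the awk-generated tree with the checksum of the Git-stored tree.
As for the latter:
$ openssl zlib -d -in .git/objects/0b/6e66b04bc1448ca594f143a91ec458667f420e | shasum
0b6e66b04bc1448ca594f143a91ec458667f420e *-
As for the home-made tree:
$ git ls-tree 0b6e66 | awk -b 'function bsha(asha)\
{n=patsplit(asha, x, /../); h=""; for(j=1; j<=n; j++) h=h sprintf("%c", strtonum("0x" x[j])); return(h)}\
{t=t sprintf("%d %s\0%s", $1, $4, bsha($3))} END {printf("tree %s\0%s", length(t), t)}' | shasum
0b6e66b04bc1448ca594f143a91ec458667f420e *-
The checksum is the same.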
Calculate the tree object checksum
The more or less official way to get it is:
$ git ls-tree 0b6e66 | git mktree
0b6e66b04bc1448ca594f143a91ec458667f420e
To calculate it manually, we need to pipe the content of the script-generated tree into the shasum command. Actually we have already done this above (to compare the generated and stored content). The result was:
0b6e66b04bc1448ca594f143a91ec458667f420e *-
and it is the same as with git mktree.
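The checksum rule used here (SHA-1 over the header plus the content) can also be sketched in Python with hashlib. The function name git_object_id is an assumption for this example. To avoid depending on the test repo, the demo hashes the empty tree and the empty blob, whose ids are well known.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 of '<kind> <size>' + NUL + body, as a hex string."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


print(git_object_id("tree", b""))   # id of the empty tree
print(git_object_id("blob", b""))   # id of the empty blob
```

Passing the 67-byte content of the top tree as body should reproduce the 0b6e66b04bc1448ca594f143a91ec458667f420e value reported by shasum and git mktree in this answer.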
Packed objects
You might find that, for your repo, you are unable to find the files .git/objects/XX/XXX... storing the Git objects. This happens because some or all "loose" objects have been packed into one or more .git/objects/pack/*.pack files.
To unpack the repo, first move the pack files away from their original position, then run git unpack-objects on them.
$ mkdir .git/pcache
$ mv .git/objects/pack/*.pack .git/pcache/
$ git unpack-objects < .git/pcache/*.pack
To repack when you are done with experiments:
$ git gc
Answered by Greg Bacon
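Expressed as a BNF-like pattern, a git tree contains data of the form
(?<tree>  tree (?&SP) (?&decimal) \0 (?&entry)+ )
(?<entry> (?&octal) (?&SP) (?&strnull) (?&sha1bytes) )
(?<strnull>   [^\0]+ \0)
(?<sha1bytes> (?s: .{20}))
(?<decimal>   [0-9]+)
(?<octal>     [0-7]+)
(?<SP>        \x20)
That is, a git tree begins with a header of
- the literal string tree
- SPACE (i.e., the byte 0x20)
- ASCII-encoded decimal length of the uncompressed contents
After a NUL (i.e., the byte 0x00) terminator, the tree contains one or more entries of the form
- ASCII-encoded octal mode
- SPACE
- name
- NUL
- SHA1 hash encoded as 20 unsigned bytes
Git then feeds the tree data to zlib's deflate for compact storage.
Remember that git blobs are anonymous. Git trees associate names with SHA1 hashes of other content that may be blobs, other trees, and so on.
To demonstrate, consider the tree associated with git's v2.7.2 tag, which you may want to browse on GitHub.
$ git rev-parse v2.7.2^{tree}
802b6758c0c27ae910f40e1b4862cb72a71eee9f
The code below requires the tree object to be in "loose" format. I do not know of a way to extract a single raw object from a packfile, so I first ran git unpack-objects on the pack files from my clone to a new repository. Be aware that this expanded a .git directory that began around 90 MB to some 1.8 GB.
UPDATE: Thanks to max630 for showing how to unpack a single object.
#! /usr/bin/env perl
use strict;
use warnings;
use subs qw/ git_tree_contents_pattern read_raw_tree_object /;
use Compress::Zlib;
my $treeobj = read_raw_tree_object;
my $git_tree_contents = git_tree_contents_pattern;
die "$0: invalid tree" unless $treeobj =~ /^$git_tree_contents\z/;
die "$0: unexpected header" unless $treeobj =~ s/^(tree [0-9]+)\0//;
print $1, "\n";
# e.g., 100644 SP .gitattributes \0 sha1-bytes
while ($treeobj) {
  # /s is important so . matches any byte!
  if ($treeobj =~ s/^([0-7]+) (.+?)\0(.{20})//s) {
    my($mode,$name,$bytes) = (oct($1),$2,$3);
    printf "%06o %s %s\t%s\n",
      $mode, ($mode == 040000 ? "tree" : "blob"),
      unpack("H*", $bytes), $name;
  }
  else {
    die "$0: unexpected tree entry";
  }
}
sub git_tree_contents_pattern {
  qr/
  (?(DEFINE)
    (?<tree>  tree (?&SP) (?&decimal) \0 (?&entry)+ )
    (?<entry> (?&octal) (?&SP) (?&strnull) (?&sha1bytes) )
    (?<strnull>   [^\0]+ \0)
    (?<sha1bytes> (?s: .{20}))
    (?<decimal>   [0-9]+)
    (?<octal>     [0-7]+)
    (?<SP>        \x20)
  )
  (?&tree)
  /x;
}
sub read_raw_tree_object {
  # $ git rev-parse v2.7.2^{tree}
  # 802b6758c0c27ae910f40e1b4862cb72a71eee9f
  #
  # NOTE: extracted using git unpack-objects
  my $tree = ".git/objects/80/2b6758c0c27ae910f40e1b4862cb72a71eee9f";
  open my $fh, "<", $tree or die "$0: open $tree: $!";
  binmode $fh or die "$0: binmode: $!";
  local $/;
  my $treeobj = uncompress <$fh>;
  die "$0: uncompress failed" unless defined $treeobj;
  $treeobj
}
Watch our poor man's git ls-tree in action. The output is identical except that it also prints the tree marker and length.
$ diff -u <(cd ~/src/git; git ls-tree 802b6758c0) <(../rawtree)
--- /dev/fd/63 2016-03-09 14:41:37.011791393 -0600
+++ /dev/fd/62 2016-03-09 14:41:37.011791393 -0600
@@ -1,3 +1,4 @@
+tree 15530
 100644 blob 5e98806c6cc246acef5f539ae191710a0c06ad3f .gitattributes
 100644 blob 1c2f8321386f89ef8c03d11159c97a0f194c4423 .gitignore
 100644 blob e5b4126bec557db55924b7b60ed70349626ea2c4 .mailmap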
Answered by Joe
As suggested, Pro Git explains the structure well. To show a tree pretty-printed, use:
git cat-file -p 4c975c5f5945564eae86d1e933192c4a9096bfe5
To show the same tree in its raw but uncompressed form, use:
git cat-file tree 4c975c5f5945564eae86d1e933192c4a9096bfe5
The structure is essentially the same, with hashes stored as binary and null-terminated filenames.
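Answered by Andrey
@lemiorhan's answer is correct but misses a small yet important detail. The tree format is:
[mode] [file/folder name]\0[SHA-1 of referencing blob or tree]
What is important is that [SHA-1 of referencing blob or tree] is in binary form, not in hex. This is a Python snippet to parse a tree object into entries:
import re
# body: the raw bytes that follow the "tree [content size]\0" header
# re.DOTALL lets . match every byte of the 20-byte SHA-1, including newlines
entries = [
    (mode.decode(), name.decode(), sha.hex())
    for mode, name, sha in
    re.findall(rb'(\d+) (.*?)\x00(.{20})', body, re.DOTALL)
]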