How does Git compute file hashes?

Question

Asked by netvope

The SHA1 hashes stored in the tree objects (as returned by git ls-tree) do not match the SHA1 hashes of the file content (as returned by sha1sum)

$ git cat-file blob 4716ca912495c805b94a88ef6dc3fb4aff46bf3c | sha1sum
de20247992af0f949ae8df4fa9a37e4a03d7063e  -

How does git compute file hashes? Does it compress the content before computing the hash?

Answer 1

Accepted answer by Leif Gruenwoldt

Git prefixes the object with "blob ", followed by the length in bytes (as a human-readable decimal integer), followed by a NUL character. In the example below the length is 14 because echo appends a newline to the 13-character string:

$ echo -e 'blob 14\0Hello, World!' | shasum
8ab686eafeb1f44702738c8b0f24f2567c36da6d

Source: http://alblue.bandlem.com/2011/08/git-tip-of-week-objects.html
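
The same calculation can be sketched in Python. This is only an illustration that assumes the standard hashlib module; the helper name git_blob_sha1 is hypothetical and not part of Git. It also shows why a plain SHA-1 of the content, as sha1sum computes it, gives a different value.

```python
import hashlib

def git_blob_sha1(content: bytes) -> str:
    """Hash content as described above: "blob <length>" + NUL + the bytes."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

content = b"Hello, World!\n"  # the 14 bytes that `echo 'Hello, World!'` emits
git_style = git_blob_sha1(content)
plain = hashlib.sha1(content).hexdigest()  # what sha1sum would report

print(git_style)  # 8ab686eafeb1f44702738c8b0f24f2567c36da6d
print(git_style == plain)  # False
```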

Answer 2

Answer by Lordbalmon

I am only expanding on the answer by @Leif Gruenwoldt and detailing what is in the reference provided by @Leif Gruenwoldt.

Do It Yourself..

Step 1. Create an empty text document (name does not matter) in your repository
Step 2. Stage and Commit the document
Step 3. Identify the hash of the blob by executing git ls-tree HEAD
Step 4. Find the blob's hash to be e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
Step 5. Snap out of your surprise and read below

How does Git compute its blob hashes?

    Blob Hash (SHA1) = SHA1("blob " + <size_of_file> + "\0" + <contents_of_file>)

The text "blob " (including the trailing space) is a constant prefix, and \0 is also constant: it is the NUL character. <size_of_file> (the length in bytes, written as a decimal number) and <contents_of_file> vary depending on the file.
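
For the empty file from steps 1 to 4, the size is 0 and there are no contents, so the hashed input is just the header. A minimal Python sketch, assuming only the standard hashlib module:

```python
import hashlib

# An empty file has size 0 and no contents, so only "blob 0" + NUL is hashed.
empty_blob_input = b"blob " + b"0" + b"\0" + b""
empty_blob_hash = hashlib.sha1(empty_blob_input).hexdigest()

print(empty_blob_hash)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```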

See: What is the file format of a git commit object?

And that's all, folks!

But wait! Did you notice that <filename> is not a parameter used for the hash computation? Two files have the same hash whenever their contents are the same, regardless of their names or of the date and time they were created. This is one of the reasons Git handles moves and renames better than other version control systems.

Do It Yourself (Ext)

Step 6. Create another empty file with a different filename in the same directory
Step 7. Compare the hashes of both your files.
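
Steps 6 and 7 can also be approximated without Git. The Python sketch below recomputes the blob hash by hand and assumes only the standard library; the function name and the two file names are hypothetical, picked just for this illustration.

```python
import hashlib
import os
import tempfile

def blob_hash_of_file(path: str) -> str:
    """Blob hash of a file's bytes; the path itself is never fed to the hash."""
    with open(path, "rb") as f:
        data = f.read()
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "first.txt")    # hypothetical file names
    second = os.path.join(tmp, "second.txt")
    for path in (first, second):
        open(path, "wb").close()              # create an empty file
    hash_first = blob_hash_of_file(first)
    hash_second = blob_hash_of_file(second)

print(hash_first == hash_second)  # True
```

Both names yield the same value because only the size and the contents enter the hash.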

Note:

The link does not mention how the tree object is hashed. I am not certain of the algorithm and parameters; however, from my observation it probably computes a hash based on all the blobs and trees (probably their hashes) it contains.
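
For reference, Git's object format does build a tree's hash from its entries: the header "tree <size>" plus NUL, followed by one record per entry made of the mode, a space, the name, a NUL, and the entry's 20-byte binary SHA-1. The Python sketch below is a rough illustration under that assumption, not Git's own code; git_tree_hash and its tuple layout are hypothetical names chosen here.

```python
import hashlib

def git_tree_hash(entries):
    """entries: iterable of (mode, name, hex_sha1), e.g. ("100644", "a.txt", "e69de2...")."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git orders entries by name, comparing directories (mode 40000)
        # as if their names ended in "/".
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, hex_sha1 in sorted(entries, key=sort_key):
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(hex_sha1)
    header = b"tree " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

print(git_tree_hash([]))  # the empty tree: 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

Because each record embeds the hash of a blob or subtree, changing any file's contents changes the hash of every tree above it.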

Answer 3

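Answer by Lucas Cimon

Based on Leif Gruenwoldt's answer, here is a shell function that can substitute for git hash-object:

git-hash-object () { # substitute when the `git` command is not available
    local type=blob
    [ "$1" = "-t" ] && shift && type=$1 && shift
    # depending on eol/autocrlf settings, you may want to substitute CRLFs by LFs
    # by using `perl -pe 's/\r$//g'` instead of `cat` in the next 2 commands
    local size=$(cat "$1" | wc -c | sed 's/ .*$//')
    ( echo -en "$type $size\0"; cat "$1" ) | sha1sum | sed 's/ .*$//'
}

Test:

$ echo 'Hello, World!' > test.txt
$ git hash-object test.txt
8ab686eafeb1f44702738c8b0f24f2567c36da6d
$ git-hash-object test.txt
8ab686eafeb1f44702738c8b0f24f2567c36da6d

Answer 4

Answer by Samuel Harmer

I needed this for some unit tests in Python 3, so I thought I'd leave it here.

import hashlib

def git_blob_hash(data):
    # Accept str for convenience; Git itself hashes raw bytes.
    if isinstance(data, str):
        data = data.encode()
    # Header is "blob <length in bytes>" followed by a NUL byte.
    data = b'blob ' + str(len(data)).encode() + b'\0' + data
    h = hashlib.sha1()
    h.update(data)
    return h.hexdigest()

I stick to \n line endings everywhere, but in some circumstances Git might also be changing your line endings before calculating this hash, so you may need a .replace('\r\n', '\n') in there too.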
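
The line-ending caveat can be sketched as follows. This is a minimal illustration that assumes Git would convert CRLF to LF for the file in question, which depends on your eol/autocrlf settings; the function name is hypothetical.

```python
import hashlib

def git_blob_hash_normalized(data: bytes) -> str:
    """Blob hash after converting CRLF to LF (an assumption about the repo's settings)."""
    data = data.replace(b"\r\n", b"\n")
    # The length in the header must be taken after normalization.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()

print(git_blob_hash_normalized(b"Hello, World!\r\n"))  # 8ab686eafeb1f44702738c8b0f24f2567c36da6d
```

Normalizing first matters because the byte count in the header changes along with the contents.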
