Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/7225313/

Time: 2020-09-10 11:42:54  Source: igfitidea

How does git compute file hashes?

Tags: git, hash, sha1, checksum, git-hash

Asked by netvope

The SHA1 hashes stored in the tree objects (as returned by git ls-tree) do not match the SHA1 hashes of the file content (as returned by sha1sum):

$ git cat-file blob 4716ca912495c805b94a88ef6dc3fb4aff46bf3c | sha1sum
de20247992af0f949ae8df4fa9a37e4a03d7063e  -

How does git compute file hashes? Does it compress the content before computing the hash?

Accepted answer by Leif Gruenwoldt

Git prefixes the object with "blob ", followed by the content length in bytes (as a human-readable decimal integer), followed by a NUL character, and hashes the result:

$ echo -e 'blob 14\0Hello, World!' | shasum
8ab686eafeb1f44702738c8b0f24f2567c36da6d

Source: http://alblue.bandlem.com/2011/08/git-tip-of-week-objects.html
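
To see why the two digests in the question differ, here is a minimal Python sketch that hashes the same bytes both ways. The function names are made up for illustration. Note that nothing is compressed before hashing: the digest covers the uncompressed header plus content, and Git only zlib-compresses the object when it writes it to disk.

```python
import hashlib

def plain_sha1(content: bytes) -> str:
    # SHA-1 of the raw bytes only, which is what sha1sum reports.
    return hashlib.sha1(content).hexdigest()

def git_blob_sha1(content: bytes) -> str:
    # SHA-1 of "blob <length>", a NUL byte, then the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

content = b"Hello, World!\n"  # 14 bytes, including the trailing newline
print(plain_sha1(content))
print(git_blob_sha1(content))  # 8ab686eafeb1f44702738c8b0f24f2567c36da6d
```

The length in the header counts the trailing newline that echo appends, which is why the shell example above uses 14 rather than 13.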

Answer by Lordbalmon

I am only expanding on the answer by @Leif Gruenwoldt and detailing what is in the reference provided by @Leif Gruenwoldt.

Do It Yourself..

  • Step 1. Create an empty text document (name does not matter) in your repository
  • Step 2. Stage and Commit the document
  • Step 3. Identify the hash of the blob by executing git ls-tree HEAD
  • Step 4. Find the blob's hash to be e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
  • Step 5. Snap out of your surprise and read below

How does Git compute its blob (file) hashes

    Blob Hash (SHA1) = SHA1("blob " + <size_of_file> + "\0" + <contents_of_file>)

The text "blob " (with a trailing space) is a constant prefix, and \0 is also constant: it is the NUL character. The <size_of_file> and <contents_of_file> vary depending on the file.

See: What is the file format of a git commit object?

And that's all, folks!

But wait! Did you notice that <filename> is not a parameter used in the hash computation? Two files have the same hash whenever their contents are the same, regardless of their names or the date and time they were created. This is one of the reasons Git handles moves and renames better than other version control systems.

Do It Yourself (Ext)

  • Step 6. Create another empty file with a different filename in the same directory
  • Step 7. Compare the hashes of both your files.
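
Steps 6 and 7 can also be tried without a repository. The sketch below is only an illustration: the helper name and the two file names are hypothetical. It hashes files straight from disk, reading in chunks so large files need not fit in memory, and applies the blob formula given earlier.

```python
import hashlib
import os
import tempfile

def git_blob_sha1_of_file(path: str, chunk_size: int = 65536) -> str:
    h = hashlib.sha1()
    # The header needs the size up front, so ask the filesystem for it.
    h.update(b"blob " + str(os.path.getsize(path)).encode("ascii") + b"\0")
    with open(path, "rb") as f:
        # Feed the content in chunks; the file name never enters the hash.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "first.txt")    # hypothetical file names
    second = os.path.join(tmp, "second.txt")
    for path in (first, second):
        open(path, "wb").close()              # create an empty file
    print(git_blob_sha1_of_file(first))
    print(git_blob_sha1_of_file(second))
```

Both lines print e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the same value that step 4 reports for an empty file.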

Note:

The link does not mention how the tree object is hashed. I am not certain of the algorithm and parameters; however, from my observation, it probably computes a hash based on all the blobs and trees (probably their hashes) it contains.
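
As far as I understand Git's object format, that guess is close: a tree is hashed like a blob, with the header "tree <size>" plus a NUL byte, and a body made of one entry per child, each consisting of the mode, the name, a NUL byte, and the 20 raw bytes of the child's hash. The Python sketch below rests on that assumption. The function name, the tuple layout, and the file names are invented for illustration, and the sorting rule is a simplification.

```python
import hashlib

def git_tree_sha1(entries):
    # entries: iterable of (mode, name, hex_sha) tuples,
    # e.g. ("100644", "a.txt", "<40 hex digits>").
    def sort_key(entry):
        mode, name, _ = entry
        # Assumption: directories (mode 40000) sort as if their name ended in "/".
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b""
    for mode, name, hex_sha in sorted(entries, key=sort_key):
        # One entry: "<mode> <name>", a NUL byte, then the raw 20-byte child hash.
        body += mode.encode("ascii") + b" " + name.encode() + b"\0" + bytes.fromhex(hex_sha)
    header = b"tree " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

EMPTY_BLOB = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
print(git_tree_sha1([]))  # the empty tree
print(git_tree_sha1([("100644", "a.txt", EMPTY_BLOB)]))
print(git_tree_sha1([("100644", "b.txt", EMPTY_BLOB)]))  # same blob, different name
```

The last two lines print different values even though both trees point at the same blob. The file name lives in the tree entry rather than in the blob, which fits the observation above that renaming a file leaves its blob hash unchanged.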

Answer by Lucas Cimon

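Based on Leif Gruenwoldt's answer, here is a shell function substitute for git hash-object:

git-hash-object () { # substitute when the `git` command is not available
    local type=blob
    [ "$1" = "-t" ] && shift && type=$1 && shift
    # depending on eol/autocrlf settings, you may want to substitute CRLFs by LFs
    # by using `perl -pe 's/\r$//g'` instead of `cat` in the next 2 commands
    local size=$(cat "$1" | wc -c | sed 's/ .*$//')
    ( echo -en "$type $size\0"; cat "$1" ) | sha1sum | sed 's/ .*$//'
}

Test:

$ echo 'Hello, World!' > test.txt
$ git hash-object test.txt
8ab686eafeb1f44702738c8b0f24f2567c36da6d
$ git-hash-object test.txt
8ab686eafeb1f44702738c8b0f24f2567c36da6d

Answer by Samuel Harmer

I needed this for some unit tests in Python 3 so thought I'd leave it here.

import hashlib

def git_blob_hash(data):
    # Accept str for convenience; Git hashes raw bytes
    if isinstance(data, str):
        data = data.encode()
    # Prepend the header: "blob <byte length>" followed by a NUL byte
    data = b'blob ' + str(len(data)).encode() + b'\0' + data
    h = hashlib.sha1()
    h.update(data)
    return h.hexdigest()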

I stick to \n line endings everywhere, but in some circumstances Git might also be changing your line endings before calculating this hash, so you may need a .replace('\r\n', '\n') in there too.
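
A minimal sketch of that adjustment, assuming the repository is configured to convert CRLF to LF on commit (whether it does depends on settings such as autocrlf and on attributes, so treat this as an illustration only; the function name is made up):

```python
import hashlib

def git_blob_sha1_normalized(content: bytes) -> str:
    # Assumption: Git stores this file with LF line endings.
    normalized = content.replace(b"\r\n", b"\n")
    # The length in the header must be taken after normalization.
    header = b"blob " + str(len(normalized)).encode("ascii") + b"\0"
    return hashlib.sha1(header + normalized).hexdigest()

print(git_blob_sha1_normalized(b"Hello, World!\r\n"))
# 8ab686eafeb1f44702738c8b0f24f2567c36da6d
```

The detail that is easy to miss is the size: it has to be computed from the normalized bytes, otherwise the header no longer matches what Git hashed.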