Original question: http://stackoverflow.com/questions/7225313/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How does git compute file hashes?
Asked by netvope
The SHA1 hashes stored in the tree objects (as returned by git ls-tree) do not match the SHA1 hashes of the file content (as returned by sha1sum):
$ git cat-file blob 4716ca912495c805b94a88ef6dc3fb4aff46bf3c | sha1sum
de20247992af0f949ae8df4fa9a37e4a03d7063e -
How does git compute file hashes? Does it compress the content before computing the hash?
Accepted answer by Leif Gruenwoldt
Git prefixes the object with "blob ", followed by the length of the content in bytes (as a human-readable decimal integer), followed by a NUL character, and hashes that header together with the content:
$ echo -e 'blob 14\0Hello, World!' | shasum
8ab686eafeb1f44702738c8b0f24f2567c36da6d
Source: http://alblue.bandlem.com/2011/08/git-tip-of-week-objects.html
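To relate this back to the question, here is a minimal Python sketch (the variable names are illustrative, not from the original answer) that computes both the plain SHA-1 of the content, as sha1sum would, and the header-prefixed hash described above. It assumes the hash is taken over the uncompressed header plus content, so compression plays no part in the result.

```python
import hashlib

content = b"Hello, World!\n"  # 14 bytes, including the trailing newline

# Plain SHA-1 of the content, which is what sha1sum reports.
raw_sha1 = hashlib.sha1(content).hexdigest()

# Git-style blob hash: "blob <decimal byte length>\0" followed by the content.
header = b"blob " + str(len(content)).encode("ascii") + b"\0"
blob_sha1 = hashlib.sha1(header + content).hexdigest()

print(raw_sha1)
print(blob_sha1)  # 8ab686eafeb1f44702738c8b0f24f2567c36da6d
```

The two printed values differ only because of the header, which is why sha1sum on the raw blob content never matches what git ls-tree shows.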
Answer by Lordbalmon
I am only expanding on the answer by @Leif Gruenwoldt and detailing what is in the reference provided by @Leif Gruenwoldt.
Do It Yourself
- Step 1. Create an empty text document (name does not matter) in your repository
- Step 2. Stage and commit the document
- Step 3. Identify the hash of the blob by executing git ls-tree HEAD
- Step 4. Find the blob's hash to be e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
- Step 5. Snap out of your surprise and read below
How does Git compute its blob hashes?
Blob hash (SHA1) = SHA1("blob " + <size_of_file> + "\0" + <contents_of_file>)
The text "blob " (with a trailing space) is a constant prefix, and \0 is also constant: it is the NUL character. The <size_of_file> and <contents_of_file> vary depending on the file.
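As a quick check of the formula against the empty file from the steps above, here is a small Python sketch. The function name blob_hash is hypothetical, chosen only for this illustration.

```python
import hashlib

def blob_hash(contents: bytes) -> str:
    """Hash "blob <size>\\0<contents>" with SHA-1, where size is the byte count."""
    size = str(len(contents)).encode("ascii")
    return hashlib.sha1(b"blob " + size + b"\0" + contents).hexdigest()

# An empty file has size 0 and no contents, so only the header is hashed.
empty_hash = blob_hash(b"")
print(empty_hash)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

For an empty file the input to SHA-1 is just the seven bytes "blob 0\0", which explains why every empty file in every repository gets the same hash.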
See: What is the file format of a git commit object?
And that's all, folks!
But wait! Did you notice that the <filename> is not a parameter used for the hash computation? Two files have the same hash if their contents are the same, regardless of their names or the date and time they were created. This is one of the reasons Git handles moves and renames better than other version control systems.
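The point about filenames can be illustrated with a short Python sketch. The helper blob_hash_of_file and the file names are made up for this example; it writes two differently named temporary files with identical contents and hashes each one.

```python
import hashlib
import tempfile
from pathlib import Path

def blob_hash_of_file(path: Path) -> str:
    """Compute the blob hash from the file's bytes only; the name is never used."""
    data = path.read_bytes()
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "first.txt"
    second = Path(tmp) / "second_name.txt"
    first.write_bytes(b"same contents\n")
    second.write_bytes(b"same contents\n")
    hash_first = blob_hash_of_file(first)
    hash_second = blob_hash_of_file(second)

print(hash_first == hash_second)  # True
```

Since only the bytes read from the file enter the hash, renaming or re-creating a file leaves its blob hash unchanged.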
Do It Yourself (Ext)
- Step 6. Create another empty file with a different filename in the same directory
- Step 7. Compare the hashes of both your files.
Note:
The link does not mention how the tree object is hashed. I am not certain of the algorithm and parameters; however, from my observation it probably computes a hash based on all the blobs and trees (probably their hashes) it contains.
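To make that observation more concrete, here is a hedged Python sketch of tree hashing. It assumes the commonly documented layout of a tree object: one entry per item in the form "<mode> <name>\0" followed by the 20 raw bytes of the item's hash, entries sorted by name, and the whole body prefixed with "tree <size>\0". The function names are hypothetical, and the sketch only handles plain files (mode 100644), not subdirectories, whose sort order has extra rules.

```python
import hashlib

def object_hash(obj_type: bytes, body: bytes) -> str:
    """Hash "<type> <size>\\0<body>" with SHA-1."""
    header = obj_type + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tree_hash(entries):
    """entries: list of (mode, name, hex_hash) tuples for plain files only (assumption)."""
    body = b""
    for mode, name, hex_hash in sorted(entries, key=lambda e: e[1].encode()):
        # Each entry stores the name and the binary (not hex) hash of the item.
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(hex_hash)
    return object_hash(b"tree", body)

empty_blob = object_hash(b"blob", b"")
empty_tree = tree_hash([])
tree_a = tree_hash([("100644", "a.txt", empty_blob)])
tree_b = tree_hash([("100644", "b.txt", empty_blob)])

print(empty_tree)        # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
print(tree_a != tree_b)  # True
```

Under these assumptions the filename does enter the tree's hash even though it never enters the blob's hash, so renaming a file changes the tree but not the blob.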
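Answer by Lucas Cimon
Based on Leif Gruenwoldt's answer, here is a shell function substitute for git hash-object:
git-hash-object () { # substitute when the `git` command is not available
    local type=blob
    [ "$1" = "-t" ] && shift && type=$1 && shift
    # depending on eol/autocrlf settings, you may want to substitute CRLFs by LFs
    # by using `perl -pe 's/\r$//g'` instead of `cat` in the next 2 commands
    local size=$(cat "$1" | wc -c | sed 's/ .*$//')
    ( echo -en "$type $size\0"; cat "$1" ) | sha1sum | sed 's/ .*$//'
}
Test:
$ echo 'Hello, World!' > test.txt
$ git hash-object test.txt
8ab686eafeb1f44702738c8b0f24f2567c36da6d
$ git-hash-object test.txt
8ab686eafeb1f44702738c8b0f24f2567c36da6d
Answer by Samuel Harmer
I needed this for some unit tests in Python 3 so thought I'd leave it here.
import hashlib

def git_blob_hash(data):
    # Accept text as well as bytes; text is encoded as UTF-8.
    if isinstance(data, str):
        data = data.encode()
    # Prepend the header "blob <length>\0" to the content.
    data = b'blob ' + str(len(data)).encode() + b'\0' + data
    h = hashlib.sha1()
    h.update(data)
    return h.hexdigest()
I stick to \n line endings everywhere, but in some circumstances Git might also change your line endings before calculating this hash, so you may need a .replace('\r\n', '\n') in there too.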
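The line-ending caveat can be sketched as follows. This is an illustration only: the function name git_blob_hash_lf is hypothetical, and whether Git actually converts CRLF to LF for a given file depends on your eol/autocrlf settings, so the normalization here is an assumption about one possible configuration.

```python
import hashlib

def git_blob_hash_lf(data: bytes) -> str:
    """Blob hash after converting CRLF line endings to LF (assumed normalization)."""
    normalized = data.replace(b"\r\n", b"\n")
    header = b"blob " + str(len(normalized)).encode("ascii") + b"\0"
    return hashlib.sha1(header + normalized).hexdigest()

crlf_hash = git_blob_hash_lf(b"Hello, World!\r\n")
lf_hash = git_blob_hash_lf(b"Hello, World!\n")

print(crlf_hash == lf_hash)  # True
print(lf_hash)               # 8ab686eafeb1f44702738c8b0f24f2567c36da6d
```

Note that the length in the header is computed after normalization, since the size must describe exactly the bytes that get hashed.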