Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/1778862/

Date: 2020-09-19 03:57:27  Source: igfitidea

How does git detect that a file has been modified?

git

Asked by hdorio

How does git detect a file modification so fast?

Does it hash every file in the repo and compare SHA1s? This would take a lot of time, wouldn't it?

Or does it compare atime, ctime or mtime?

Answered by Tobu

Git tries hard to convince itself from the lstat() values alone that the worktree matches the index, because falling back on file contents is very expensive.
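
To make the idea concrete, here is a minimal Python sketch of a stat-first check with a content fallback. It is not Git's code: the CachedEntry type, the choice of fields (size, mtime, inode) and the helper names are assumptions made for illustration, and the real index stores more than this.

```python
import hashlib
import os
import tempfile
from typing import NamedTuple


class CachedEntry(NamedTuple):
    """Hypothetical stand-in for one index entry; not Git's real on-disk layout."""
    size: int
    mtime_ns: int
    ino: int
    sha1: str


def content_sha1(path: str) -> str:
    """Expensive step: read the whole file and hash it."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def snapshot(path: str) -> CachedEntry:
    """Record stat fields plus a content hash (assumes a regular file)."""
    st = os.lstat(path)
    return CachedEntry(st.st_size, st.st_mtime_ns, st.st_ino, content_sha1(path))


def is_modified(path: str, entry: CachedEntry) -> bool:
    """Cheap lstat() comparison first; read contents only if stat data differs."""
    st = os.lstat(path)
    if (st.st_size, st.st_mtime_ns, st.st_ino) == entry[:3]:
        return False  # stat data matches the cache, so the file is not read
    return content_sha1(path) != entry.sha1


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "a.txt")
        with open(path, "w") as f:
            f.write("one\n")
        entry = snapshot(path)
        untouched = is_modified(path, entry)
        # Like `touch`: move mtime forward by 10 s without changing the contents.
        os.utime(path, ns=(entry.mtime_ns, entry.mtime_ns + 10_000_000_000))
        touched = is_modified(path, entry)
        with open(path, "w") as f:
            f.write("one, two\n")
        edited = is_modified(path, entry)
        return untouched, touched, edited
```

Calling demo() reports the untouched file and the merely touched file as unmodified, and the edited file as modified. Only the last two cases had to read the file contents.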

Documentation/technical/racy-git.txt describes which stat fields are used, and how some race conditions due to low mtime granularity are avoided. This article has some more detail.
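
As I read racy-git.txt, the core rule is that an entry whose cached mtime is the same as or later than the index file's own timestamp cannot be trusted on stat data alone, because the file could have been rewritten within the same timestamp tick. The sketch below illustrates only that rule; the function names are hypothetical and timestamps are plain integers.

```python
def racily_clean(entry_mtime: int, index_mtime: int) -> bool:
    """True when the entry was modified at or after the time the index was written."""
    return entry_mtime >= index_mtime


def needs_content_compare(stat_matches: bool, entry_mtime: int, index_mtime: int) -> bool:
    """Decide whether the cheap stat check is enough (illustrative rule only)."""
    if not stat_matches:
        return True  # stat data already differs, so look at the contents
    # Stat data matches, but that proves nothing if the entry is racily clean.
    return racily_clean(entry_mtime, index_mtime)


# Index written at t=100 (whole seconds, i.e. low mtime granularity).
index_written_at = 100
older_file = needs_content_compare(True, 90, index_written_at)         # safe to trust stat
same_second_file = needs_content_compare(True, 100, index_written_at)  # must compare contents
```

Here older_file is False and same_second_file is True: a file last modified in the same second the index was written gets a content comparison even though its stat data matches.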

stat values aren't tamper-proof; see futimens(3). Git may be fooled into missing a change to a file, but that does not compromise the integrity of content-hashing.
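
A small Python illustration of that point, using os.utime as a stand-in for futimens(3). It is a simplification: it compares only size and mtime, which is an assumption of this sketch and not the full set of fields Git checks, and on POSIX systems the kernel still updates ctime, which this call cannot set.

```python
import hashlib
import os
import tempfile


def sha1_of(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "f.txt")
        with open(path, "wb") as f:
            f.write(b"aaaa\n")
        before = os.stat(path)
        hash_before = sha1_of(path)

        # Rewrite with different contents of the same length...
        with open(path, "wb") as f:
            f.write(b"bbbb\n")
        # ...then put the old timestamps back.
        os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))

        after = os.stat(path)
        same_stat = (before.st_size, before.st_mtime_ns) == (after.st_size, after.st_mtime_ns)
        different_content = hash_before != sha1_of(path)
        return same_stat, different_content
```

demo() returns a pair saying whether size and mtime still match and whether the content hash changed. A checker that looked only at those two stat fields would miss the edit, while the hash still tells the two versions apart.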

Answered by Randal Schwartz

There's an initial mtime check for reports like "git status", but when the final commit is computed, mtimes don't matter... it's the SHA1 that matters.
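
For reference, the SHA1 in question is computed over the file contents prefixed with a short header, not over any timestamps. A minimal sketch for SHA-1 repositories (the function name is mine; `git hash-object` is the real command that produces the same value):

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Hash file contents the way Git names a blob: 'blob <size>\\0' + contents."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


# The empty blob has a well-known id.
empty_id = git_blob_sha1(b"")  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because mtime is not part of the input, touching a file without changing its bytes yields the same blob id.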

Answered by jkp

Well, I would hazard a guess that it's using a combination of stat() calls to work out what looks like it might have changed, then in turn actually trying to ascertain, using its diff'ing engine, that this is the case.

You can see the code for the diff engine here to get some idea. I traced through the codebase to be sure that the status command does indeed call down into this code (it looks like a lot of stuff does!). Actually, all this makes a lot of sense when you know that Git performs pretty badly on Windows, where it uses an emulation layer to perform these POSIX-type calls: it's an order of magnitude slower to do a git status on that platform.

Anyway, short of reading all the code from top to bottom (which I may do later if I have time!), that's as far as I can take you for now... maybe someone who has worked with the codebase can be more definitive.

Note: another possible speedup comes from judicious use of inline functions where it clearly makes sense; you can see this in the headers.

[edit: see here for an explanation of stat()]

Answered by Max A.

Depending on the platform, you should be able to find out which syscalls Git uses to figure out its status. Try strace git status on Linux, truss git status on SunOS, or the seemingly DTrace-based tool that Apple ships with its Developer Tools on Mac OS X.
