Ways to improve git status performance
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors on Stack Overflow (not to this page).
Original: http://stackoverflow.com/questions/4994772/
Asked by Senthil A Kumar
I have a 10 GB repo on a Linux machine, stored on NFS. The first git status takes 36 minutes and subsequent git status runs take 8 minutes. Git seems to depend on the OS for caching files. Only the first git commands, like commit and status, that involve packing/repacking the whole repo take a very long time for a huge repo. I am not sure whether you have used git status on such a large repo, but has anyone come across this issue?
I have tried git gc, git clean, and git repack, but the time taken is still almost the same.
Would submodules, or some other approach such as breaking the repo into smaller ones, help? If so, which is the best way to split a large repo? Is there any other way to reduce the time git commands take on a large repo?
Accepted answer by Josh Lee
To be more precise, git depends on the efficiency of the lstat(2) system call, so tweaking your client's "attribute cache timeout" might do the trick.
The manual for git-update-index (essentially a manual mode for git-status) describes what you can do to alleviate this: use the --assume-unchanged flag to suppress its normal behavior, and manually update the paths that you have changed. You might even program your editor to unset this flag every time you save a file.
The alternative, as you suggest, is to reduce the size of your checkout (the size of the packfiles doesn't really come into play here). The options are a sparse checkout, submodules, or Google's repo tool.
(There's a mailing list thread about using Git with NFS, but it doesn't answer many questions.)
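As a rough illustration of the --assume-unchanged workflow described above, here is a minimal sketch. It runs in a throwaway repository created with mktemp, and the file name big.txt is hypothetical; in a real checkout you would apply the two update-index calls to your own paths.

```shell
set -eu
# Throwaway repository so the sketch is safe to run anywhere (assumption).
demo=$(mktemp -d)
cd "$demo"
git init -q
echo "v1" > big.txt          # hypothetical tracked file
git add big.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "init"

# Ask git to stop checking this path in the working tree.
git update-index --assume-unchanged big.txt
echo "v2, edited" > big.txt
hidden=$(git status --porcelain)     # the edit is not reported while the flag is set

# Clear the flag (for example from an editor save hook) so the change shows up again.
git update-index --no-assume-unchanged big.txt
visible=$(git status --porcelain)
echo "with flag: '$hidden'"
echo "without flag: '$visible'"
```

The trade-off is that git no longer notices edits to flagged paths on its own, so forgetting to clear the flag can hide real changes.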
Answer by user1077329
I'm also seeing this problem on a large project shared over NFS.
It took me some time to discover the flag -uno, which can be given to both git commit and git status.
This flag disables the search for untracked files, which significantly reduces the number of NFS operations. To discover untracked files, git has to look in every subdirectory, so a tree with many subdirectories is costly. Telling git not to look for untracked files eliminates all of those NFS operations.
Combine this with the core.preloadindex setting and you can get reasonable performance even on NFS.
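A small sketch of the two settings, run in a throwaway repository (the mktemp directory and the file name are assumptions for illustration). It also uses status.showUntrackedFiles, the config equivalent of -uno, which the answer does not mention; newer Git versions may already enable core.preloadindex by default.

```shell
set -eu
demo=$(mktemp -d)            # throwaway repository (assumption)
cd "$demo"
git init -q
touch untracked.txt          # hypothetical untracked file

default_out=$(git status --porcelain)      # scans for untracked files
fast_out=$(git status --porcelain -uno)    # skips the untracked scan
echo "default: '$default_out'"
echo "with -uno: '$fast_out'"

# Make both choices persistent for this repository.
git config status.showUntrackedFiles no
git config core.preloadindex true
```

With showUntrackedFiles set to no, new files are only noticed once you git add them explicitly.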
Answer by Jabari
Try git gc. Also, git clean may help.
UPDATE - Not sure where the down vote came from, but the git manual specifically states:
Runs a number of housekeeping tasks within the current repository, such as compressing file revisions (to reduce disk space and increase performance) and removing unreachable objects which may have been created from prior invocations of git add.
Users are encouraged to run this task on a regular basis within each repository to maintain good disk space utilization and good operating performance.
I always notice a difference after running git gc when git status is slow!
UPDATE II - Not sure how I missed this, but the OP already tried git gc and git clean. I swear that wasn't originally there, but I don't see any changes in the edits. Sorry for that!
Answer by beno
If your git repo makes heavy use of submodules, you can greatly speed up git status by editing the config file in the .git directory and setting ignore = dirty on any particularly large/heavy submodules. For example:
[submodule "mysubmodule"]
url = ssh://mysubmoduleURL
ignore = dirty
You'll lose the convenience of a reminder that there are unstaged changes in any of the submodules that you may have forgotten about, but you'll still retain the main convenience of knowing when the submodules are out of sync with the main repo. Plus, you can still change your working directory to the submodule itself and use git status within it as usual to see more information. See this question for more details about what "dirty" means.
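If you would rather not edit .git/config by hand, the same key can probably be set from the command line. This is a sketch in a throwaway repository; "mysubmodule" is just the placeholder name from the example, and no real submodule is created here.

```shell
set -eu
demo=$(mktemp -d)            # throwaway repository (assumption)
cd "$demo"
git init -q

# Writes an "ignore = dirty" entry under [submodule "mysubmodule"] in .git/config.
git config submodule.mysubmodule.ignore dirty

# Read the value back to confirm it was stored.
git config --get submodule.mysubmodule.ignore
```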
Answer by VonC
The performance of git status should improve with Git 2.13 (Q2 2017).
See commit 950a234 (14 Apr 2017) by Jeff Hostetler (jeffhostetler).
(Merged by Junio C Hamano -- gitster -- in commit 8b6bba6, 24 Apr 2017)
> string-list: use ALLOC_GROW macro when reallocing string_list
Use ALLOC_GROW() macro when reallocing a string_list array rather than simply increasing it by 32.
This is a performance optimization. During status on a very large repo and there are many changes, a significant percentage of the total run time is spent reallocing the wt_status.changes array.
This change decreases the time in wt_status_collect_changes_worktree() from 125 seconds to 45 seconds on my very large repository.
Plus, Git 2.17 (Q2 2018) will introduce a new trace, for measuring where the time is spent in the index-heavy operations.
See commit ca54d9b (27 Jan 2018) by Nguyễn Thái Ngọc Duy (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit 090dbea, 15 Feb 2018)
trace: measure where the time is spent in the index-heavy operations
All the known heavy code blocks are measured (except object database access). This should help identify if an optimization is effective or not.
An unoptimized git-status would give something like below:
0.001791141 s: read cache ...
0.004011363 s: preload index
0.000516161 s: refresh index
0.003139257 s: git command: ... 'status' '--porcelain=2'
0.006788129 s: diff-files
0.002090267 s: diff-index
0.001885735 s: initialize name hash
0.032013138 s: read directory
0.051781209 s: git command: './git' 'status'
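To collect a trace like the one above on your own machine, a minimal sketch is to set the GIT_TRACE_PERFORMANCE environment variable. Here it points at a temporary file (an assumption for illustration; setting it to 1 sends the trace to stderr instead), and the repository is a throwaway one, so the timings are tiny and the exact lines depend on your Git version.

```shell
set -eu
demo=$(mktemp -d)            # throwaway repository (assumption)
cd "$demo"
git init -q

# An absolute path makes git append the performance trace to that file.
trace_file=$(mktemp)
GIT_TRACE_PERFORMANCE="$trace_file" git status >/dev/null

# Show where the time went.
cat "$trace_file"
```

Comparing this output before and after a config change is a reasonable way to check whether the change actually helped.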
The same Git 2.17 (Q2 2018) improves git status with:
commit f39a757, commit 3ca1897, commit fd9b544, commit d7d1b49 (09 Jan 2018) by Jeff Hostetler (jeffhostetler).
(Merged by Junio C Hamano -- gitster -- in commit 4094e47, 08 Mar 2018)
"git status" can spend a lot of cycles to compute the relation between the current branch and its upstream, which can now be disabled with the "--no-ahead-behind" option.
commit ebbed3b (25 Feb 2018) by Derrick Stolee (derrickstolee).
revision.c: reduce object database queries
In mark_parents_uninteresting(), we check for the existence of an object file to see if we should treat a commit as parsed. The result is to set the "parsed" bit on the commit.
Modify the condition to only check has_object_file() if the result would change the parsed bit.
When a local branch is different from its upstream ref, "git status" will compute ahead/behind counts. This uses paint_down_to_common() and hits mark_parents_uninteresting().
On a copy of the Linux repo with a local instance of "master" behind the remote branch "origin/master" by ~60,000 commits, we find the performance of "git status" went from 1.42 seconds to 1.32 seconds, for a relative difference of -7.0%.
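If the ahead/behind computation is what makes status slow for you, a sketch like the following skips it, assuming Git 2.17 or later. The repository is a throwaway one with no upstream, so it only shows the invocation, not the speed-up; status.aheadBehind is the matching config key.

```shell
set -eu
demo=$(mktemp -d)            # throwaway repository (assumption)
cd "$demo"
git init -q

# One-off: skip the ahead/behind counts for this invocation.
git status --no-ahead-behind >/dev/null

# Persistent: make --no-ahead-behind the default for this repository.
git config status.aheadBehind false
git config --get status.aheadBehind
```

You still learn that the branch and its upstream differ, just not by how many commits.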
Git 2.24 (Q4 2019) proposes another setting to improve git status performance:
See commit aaf633c, commit c6cc4c5, commit ad0fb65, commit 31b1de6, commit b068d9a, commit 7211b9e (13 Aug 2019) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit f4f8dfe, 09 Sep 2019)
repo-settings: create feature.manyFiles setting
The feature.manyFiles setting is suitable for repos with many files in the working directory. By setting index.version=4 and core.untrackedCache=true, commands such as 'git status' should improve.
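A minimal sketch of turning this on, assuming Git 2.24 or later and using a throwaway repository. The two explicit keys are the ones named in the commit message; setting them yourself should be roughly what the feature switch does, and lets you enable only one of them.

```shell
set -eu
demo=$(mktemp -d)            # throwaway repository (assumption)
cd "$demo"
git init -q

# Option 1: the umbrella setting.
git config feature.manyFiles true

# Option 2: the individual settings it stands for.
git config index.version 4
git config core.untrackedCache true

# Later index writes and status runs pick up the new settings.
git status >/dev/null
git config --get index.version
```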
But:
With Git 2.24 (Q4 2019), the codepath that reads the index.version configuration was broken with a recent update, which has been corrected.
See commit c11e996 (23 Oct 2019) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 4d6fb2b, 24 Oct 2019)
repo-settings: read an int for index.version
Signed-off-by: Derrick Stolee
Several config options were combined into a repo_settings struct in ds/feature-macros, including a move of the "index.version" config setting in 7211b9e ("repo-settings: consolidate some config settings", 2019-08-13, Git v2.24.0-rc1 -- merge listed in batch #0).
Unfortunately, that file looked like a lot of boilerplate and what is clearly a factor of copy-paste overload, the config setting is parsed with repo_config_get_bool() instead of repo_config_get_int(). This means that a setting "index.version=4" would not register correctly and would revert to the default version of 3.
I caught this while incorporating v2.24.0-rc0 into the VFS for Git codebase, where we really care that the index is in version 4.
This was not caught by the codebase because the version checks placed in t1600-index.sh did not test the "basic" scenario enough. Here, we modify the test to include these normal settings to not be overridden by feature.manyFiles or GIT_INDEX_VERSION.
While the "default" version is 3, this is demoted to version 2 in do_write_index() when not necessary.
Answer by klimat
Answer by citysurrounded
In our codebase, which has somewhere in the range of 20 to 30 submodules, git status --ignore-submodules sped things up for me drastically. Do note that this will not report on the status of submodules.
Answer by dCSeven
Something that hasn't been mentioned yet is to activate the filesystem cache on Windows machines (Linux filesystems are completely different and git was optimized for them, so this probably only helps on Windows).
git config core.fscache true
As a last resort, if git is still slow, you can turn off the modification-time check that git uses to find out which files have changed.
git config core.ignoreStat true
BUT: changed files then have to be added by the developer with git add. Git won't find the changes itself.
Answer by nh2
Leftover index.lock files
git status can be pathologically slow when you have leftover index.lock files.
This happens especially when you have git submodules, because then you often don't notice such leftover files.
Summary: Run find .git/ -name index.lock, and delete the leftover files after checking that they are indeed not used by any currently running program.
Details
I found that my shell's git status was extremely slow in my repo, with git 2.19 on Ubuntu 16.04.
Dug in and found that /usr/bin/time git status in my assets git submodule took 1.7 seconds.
Found with strace that git read all my big files in there with mmap. It doesn't usually do that; usually stat is enough.
I googled the problem and found the Use of index and Racy Git problem.
Tried git update-index somefile (in my case gitignore in the submodule checkout) as shown here, but it failed with
fatal: Unable to create '/home/niklas/src/myproject/.git/modules/assets/index.lock': File exists.
Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
This is a classic error. Usually you notice it on any git operation, but for submodules that you don't often commit to, you may not notice it for months, because it only appears when adding something to the index; the warning is not raised by the read-only git status.
After removing the index.lock file, git status became fast immediately, the mmaps disappeared, and it's now over 1000x faster.
So if your git status is unnaturally slow, check find .git/ -name index.lock and delete the leftovers.
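The check-then-delete step can be sketched roughly as follows. This runs in a throwaway repository with a simulated stale lock (the modules/assets path only mirrors the example above), and the pgrep guard is an assumption about how you might check for running git processes; if pgrep is missing, the sketch skips the check, so verify by hand in that case.

```shell
set -eu
demo=$(mktemp -d)            # throwaway repository (assumption)
cd "$demo"
git init -q

# Simulate a lock left behind by a crashed git process in a submodule's git dir.
mkdir -p .git/modules/assets
touch .git/modules/assets/index.lock

# 1. List candidate lock files.
locks=$(find .git/ -name index.lock)
echo "found: $locks"

# 2. Remove them only if no git process appears to be running.
if command -v pgrep >/dev/null 2>&1 && pgrep -x git >/dev/null 2>&1; then
    echo "a git process is running; leaving lock files alone" >&2
else
    find .git/ -name index.lock -exec rm -f {} +
fi
```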
Answer by MS_
It is a pretty old question, but I am surprised that no one has commented on binary files, given the repository size.
You mentioned that your git repo is ~10 GB. Apart from the NFS issue and other git issues (resolvable by git gc and the git configuration changes outlined in other answers), git commands (git status, git diff, git add) might be slow because of a large number of binary files in the repository. Git is not good at handling binary files. You can remove unnecessary binary files using the following command (the example is for NetCDF files; back up the git repository first):
git filter-branch --force --index-filter \
'git rm --cached --ignore-unmatch *.nc' \
--prune-empty --tag-name-filter cat -- --all
Do not forget to add '*.nc' to your .gitignore file to stop git from recommitting the files.
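For the ignore step, a small sketch in a throwaway repository; data.nc is a hypothetical file name used only to show that the pattern matches.

```shell
set -eu
demo=$(mktemp -d)            # throwaway repository (assumption)
cd "$demo"
git init -q

# Ignore all NetCDF files from now on.
echo '*.nc' >> .gitignore
touch data.nc                # hypothetical binary file

# check-ignore prints the matching rule if the path is ignored.
git check-ignore -v data.nc
status_out=$(git status --porcelain)   # data.nc no longer shows up as untracked
echo "$status_out"
```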