Ways to improve git status performance

This page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original question on Stack Overflow: http://stackoverflow.com/questions/4994772/

Time: 2020-09-10 10:05:59. Source: igfitidea.

Tags: performance, git, nfs

Asked by Senthil A Kumar

I have a 10 GB repo on a Linux machine, stored on NFS. The first git status takes 36 minutes, and each subsequent git status takes 8 minutes. It seems Git depends on the OS for caching files. Only the first git commands, such as commit and status, which involve packing/repacking the whole repo, take a very long time on a huge repo. I am not sure whether you have used git status on such a large repo, but has anyone come across this issue?

I have tried git gc, git clean, and git repack, but the time taken is still almost the same.

Will submodules, or other approaches such as breaking the repo into smaller ones, help? If so, which is the best way to split a larger repo? Is there any other way to reduce the time git commands take on a large repo?

Accepted answer by Josh Lee

To be more precise, git depends on the efficiency of the lstat(2) system call, so tweaking your client's “attribute cache timeout” might do the trick.

The manual for git-update-index (essentially a manual mode for git-status) describes what you can do to alleviate this: use the --assume-unchanged flag to suppress its normal behavior, and manually update the paths that you have changed. You might even program your editor to unset this flag every time you save a file.
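As a rough illustration of the --assume-unchanged idea, here is a minimal sketch in a throwaway repository. The file name big.txt and the demo identity are placeholders invented for this example; only the git update-index flags come from the answer.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the sketch never touches a real checkout.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "v1" > big.txt
git add big.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

# Ask git to stop checking big.txt for modifications.
git update-index --assume-unchanged big.txt

# The edit is now invisible to git status.
echo "version two" > big.txt
hidden_status=$(git status --porcelain)

# Clear the flag (what an editor hook could do on save); the change is reported again.
git update-index --no-assume-unchanged big.txt
visible_status=$(git status --porcelain)

printf 'with flag:    [%s]\nwithout flag: [%s]\n' "$hidden_status" "$visible_status"
```

The trade-off is that git trusts you completely while the flag is set, so forgetting to clear it means changes silently stay out of commits.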

The alternative, as you suggest, is to reduce the size of your checkout (the size of the packfiles doesn't really come into play here). The options are a sparse checkout, submodules, or Google's repo tool.
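To make the sparse checkout option a bit more concrete, the sketch below uses the long-standing core.sparseCheckout mechanism in a temporary repository. The directory names a and b are made up for the demo, and newer git versions also offer a dedicated git sparse-checkout command that is not shown here.

```shell
#!/bin/sh
set -eu

# Temporary repository with two top-level directories (placeholder names).
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir a b
echo "keep" > a/keep.txt
echo "skip" > b/skip.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

# Enable sparse checkout and keep only directory a/ in the working tree.
git config core.sparseCheckout true
mkdir -p .git/info
echo "a/" > .git/info/sparse-checkout

# Re-read HEAD into the index and update the working tree to match the pattern.
git read-tree -mu HEAD

ls
```

With fewer files in the working tree, git status has fewer paths to lstat, which is the cost the answer points at.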

(There's a mailing list thread about using Git with NFS, but it doesn't answer many questions.)

Answer by user1077329

I'm also seeing this problem on a large project shared over NFS.

It took me some time to discover the flag -uno that can be given to both git commit and git status.

What this flag does is to disable looking for untracked files. This reduces the number of nfs operations significantly. The reason is that in order for git to discover untracked files it has to look in all subdirectories so if you have many subdirectories this will hurt you. By disabling git from looking for untracked files you eliminate all these NFS operations.

Combine this with the core.preloadindex flag and you can get reasonable performance even on NFS.
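Here is a small sketch of what -uno changes, run in a scratch repository. The file and directory names are invented for the demo; status.showUntrackedFiles is the config equivalent of the flag, shown as one possible way to make it persistent.

```shell
#!/bin/sh
set -eu

# Scratch repository with one tracked file and one untracked file deep in a subtree.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "tracked" > tracked.txt
git add tracked.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"
mkdir -p deep/nested/dir
echo "scratch" > deep/nested/dir/untracked.txt

# "normal" is the default mode: git walks the directories to find untracked files.
with_untracked=$(git status --porcelain --untracked-files=normal)

# -uno (short for --untracked-files=no) skips that directory walk.
without_untracked=$(git status --porcelain -uno)

# Make both settings persistent for this repository only.
git config status.showUntrackedFiles no
git config core.preloadIndex true
configured=$(git status --porcelain)

printf 'normal: [%s]\nuno: [%s]\nconfigured: [%s]\n' \
  "$with_untracked" "$without_untracked" "$configured"
```

Keep in mind that with untracked files hidden, a brand-new file will not show up in git status until you git add it explicitly.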

Answer by Jabari

Try git gc. Also, git clean may help.

UPDATE - Not sure where the down vote came from, but the git manual specifically states:

Runs a number of housekeeping tasks within the current repository, such as compressing file revisions (to reduce disk space and increase performance) and removing unreachable objects which may have been created from prior invocations of git add.

Users are encouraged to run this task on a regular basis within each repository to maintain good disk space utilization and good operating performance.

I always notice a difference after running git gc when git status is slow!

UPDATE II - Not sure how I missed this, but the OP already tried git gc and git clean. I swear that wasn't originally there, but I don't see any changes in the edits. Sorry for that!

Answer by beno

If your git repo makes heavy use of submodules, you can greatly speed up git status by editing the config file in the .git directory and setting ignore = dirty on any particularly large/heavy submodules. For example:

[submodule "mysubmodule"]
    url = ssh://mysubmoduleURL
    ignore = dirty

You'll lose the convenience of a reminder that there are unstaged changes in any of the submodules that you may have forgotten about, but you'll still retain the main convenience of knowing when the submodules are out of sync with the main repo. Plus, you can still change your working directory to the submodule itself and use git status within it as per usual to see more information. See this question for more details about what "dirty" means.
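If you would rather not hand-edit .git/config, the same key can probably be set with git config. The sketch below does this in a scratch repository; "mysubmodule" is just the placeholder name from the example above, and no real submodule is created.

```shell
#!/bin/sh
set -eu

# Scratch repository; no real submodule is needed to write the setting.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Writes "ignore = dirty" under [submodule "mysubmodule"] in .git/config.
git config submodule.mysubmodule.ignore dirty
ignore_mode=$(git config --get submodule.mysubmodule.ignore)
echo "submodule.mysubmodule.ignore = $ignore_mode"

# One-off alternative that leaves the config untouched.
git status --porcelain --ignore-submodules=dirty
```

The command-line form is handy for checking whether the setting helps before making it permanent.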

Answer by VonC

The performance of git status should improve with Git 2.13 (Q2 2017).

See commit 950a234 (14 Apr 2017) by Jeff Hostetler (jeffhostetler).
(Merged by Junio C Hamano -- gitster -- in commit 8b6bba6, 24 Apr 2017)

> string-list: use ALLOC_GROW macro when reallocing string_list

Use ALLOC_GROW() macro when reallocing a string_list array rather than simply increasing it by 32.
This is a performance optimization.

During status on a very large repo and there are many changes, a significant percentage of the total run time is spent reallocing the wt_status.changes array.

This change decreases the time in wt_status_collect_changes_worktree() from 125 seconds to 45 seconds on my very large repository.

Plus, Git 2.17 (Q2 2018) will introduce a new trace, for measuring where the time is spent in the index-heavy operations.

See commit ca54d9b (27 Jan 2018) by Nguyễn Thái Ngọc Duy (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit 090dbea, 15 Feb 2018)

trace: measure where the time is spent in the index-heavy operations

All the known heavy code blocks are measured (except object database access). This should help identify if an optimization is effective or not.
An unoptimized git-status would give something like below:

0.001791141 s: read cache ...
0.004011363 s: preload index
0.000516161 s: refresh index
0.003139257 s: git command: ... 'status' '--porcelain=2'
0.006788129 s: diff-files
0.002090267 s: diff-index
0.001885735 s: initialize name hash
0.032013138 s: read directory
0.051781209 s: git command: './git' 'status'
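A trace like the one above can be captured by setting the GIT_TRACE_PERFORMANCE environment variable. The following is a minimal sketch in a scratch repository; the log file location is an arbitrary choice for this demo, and the exact lines you get will depend on your git version.

```shell
#!/bin/sh
set -eu

# Scratch repository with a single staged file.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "hello" > file.txt
git add file.txt

# An absolute path makes git append the performance trace to that file;
# setting the variable to 1 would send it to stderr instead.
trace_log=$(mktemp)
GIT_TRACE_PERFORMANCE="$trace_log" git status > /dev/null

cat "$trace_log"
```

Comparing two such logs, before and after a config change, is one way to see which phase of git status actually got faster.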

The same Git 2.17 (Q2 2018) improves git status with:

revision.c: reduce object database queries

In mark_parents_uninteresting(), we check for the existence of an object file to see if we should treat a commit as parsed. The result is to set the "parsed" bit on the commit.

Modify the condition to only check has_object_file() if the result would change the parsed bit.

When a local branch is different from its upstream ref, "git status" will compute ahead/behind counts.
This uses paint_down_to_common() and hits mark_parents_uninteresting().

On a copy of the Linux repo with a local instance of "master" behind the remote branch "origin/master" by ~60,000 commits, we find the performance of "git status" went from 1.42 seconds to 1.32 seconds, for a relative difference of -7.0%.

Git 2.24 (Q4 2019) proposes another setting to improve git status performance:

See commit aaf633c, commit c6cc4c5, commit ad0fb65, commit 31b1de6, commit b068d9a, commit 7211b9e (13 Aug 2019) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit f4f8dfe, 09 Sep 2019)

repo-settings: create feature.manyFiles setting

The feature.manyFiles setting is suitable for repos with many files in the working directory.
By setting index.version=4 and core.untrackedCache=true, commands such as 'git status' should improve.
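As a sketch of how these settings might be applied, the commands below set them in a scratch repository. The two explicit keys are the ones named in the commit message; whether they take effect depends on your git version (feature.manyFiles needs Git 2.24 or later).

```shell
#!/bin/sh
set -eu

# Scratch repository; in practice you would run the git config lines in your own repo.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Umbrella setting introduced with Git 2.24.
git config feature.manyFiles true

# The individual settings it stands for, per the commit message, set explicitly.
git config index.version 4
git config core.untrackedCache true

many_files=$(git config --get --bool feature.manyFiles)
index_version=$(git config --get index.version)
untracked_cache=$(git config --get --bool core.untrackedCache)
printf 'feature.manyFiles=%s index.version=%s core.untrackedCache=%s\n' \
  "$many_files" "$index_version" "$untracked_cache"
```

Setting the individual keys explicitly is redundant on a recent git, but it can be useful on versions that predate feature.manyFiles.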

But:

With Git 2.24 (Q4 2019), the codepath that reads the index.version configuration was broken with a recent update, which has been corrected.

See commit c11e996 (23 Oct 2019) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 4d6fb2b, 24 Oct 2019)

repo-settings: read an int for index.version

Signed-off-by: Derrick Stolee

Several config options were combined into a repo_settings struct in ds/feature-macros, including a move of the "index.version" config setting in 7211b9e ("repo-settings: consolidate some config settings", 2019-08-13, Git v2.24.0-rc1 -- merge listed in batch #0).

Unfortunately, that file looked like a lot of boilerplate and what is clearly a factor of copy-paste overload, the config setting is parsed with repo_config_get_bool() instead of repo_config_get_int(). This means that a setting "index.version=4" would not register correctly and would revert to the default version of 3.

I caught this while incorporating v2.24.0-rc0 into the VFS for Git codebase, where we really care that the index is in version 4.

This was not caught by the codebase because the version checks placed in t1600-index.sh did not test the "basic" scenario enough. Here, we modify the test to include these normal settings to not be overridden by features.manyFiles or GIT_INDEX_VERSION.
While the "default" version is 3, this is demoted to version 2 in do_write_index() when not necessary.

Answer by klimat

git config --global core.preloadIndex true

Did the job for me. Check the official documentation here.

Answer by citysurrounded

In our codebase where we have somewhere in the range of 20 - 30 submodules,
git status --ignore-submodules
sped things up for me drastically. Do note that this will not report on the status of submodules.

Answer by dCSeven

Something that hasn't been mentioned yet is activating the filesystem cache on Windows machines (Linux filesystems are completely different and git was optimized for them, so this probably only helps on Windows).

git config core.fscache true



As a last resort, if git is still slow, you can turn off the modification time check that git uses to find out which files have changed.

git config core.ignoreStat true

BUT: Changed files then have to be added by the developer with git add. Git doesn't find the changes itself.

source

Answer by nh2

Leftover index.lock files

git status can be pathologically slow when you have leftover index.lock files.

This happens especially when you have git submodules, because then you often don't notice such leftover files.

Summary: Run find .git/ -name index.lock, and delete the leftover files after checking that they are indeed not used by any currently running program.
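The check-then-delete step could look roughly like this. The sketch simulates a stale lock in a scratch repository; the "assets" submodule path is a placeholder, and in a real repo you would confirm that no git process is running before removing anything.

```shell
#!/bin/sh
set -eu

# Scratch repository with a simulated leftover lock in a submodule's git dir.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p .git/modules/assets
: > .git/modules/assets/index.lock

# Step 1: list every index.lock below .git/ (covers submodules too).
stale_locks=$(find .git/ -name index.lock)
echo "found: $stale_locks"

# Step 2: only after confirming that no git process is using them, remove them.
find .git/ -name index.lock -exec rm -f {} +
remaining=$(find .git/ -name index.lock)
echo "remaining: [$remaining]"
```

Listing first and deleting second keeps the destructive step deliberate, since a lock held by a live git process must not be removed.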



Details

I found that my shell git status was extremely slow in my repo, with git 2.19 on Ubuntu 16.04.

Dug in and found that /usr/bin/time git status in my assets git submodule took 1.7 seconds.

Found with strace that git read all my big files in there with mmap. It doesn't usually do that; usually stat is enough.

I googled the problem and found the Use of index and Racy Git problem.

Tried git update-index somefile (in my case gitignore in the submodule checkout) shown here, but it failed with

fatal: Unable to create '/home/niklas/src/myproject/.git/modules/assets/index.lock': File exists.

Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.

This is a classical error. Usually you notice it at any git operation, but for submodules that you don't often commit to, you may not notice it for months, because it only appears when adding something to the index; the warning is not raised on read-only git status.

After removing the index.lock file, git status became fast immediately, the mmaps disappeared, and it's now over 1000x faster.

So if your git status is unnaturally slow, check find .git/ -name index.lock and delete the leftovers.

Answer by MS_

It is a pretty old question. Still, I am surprised that no one commented on binary files, given the repository size.

You mentioned that your git repo is ~10GB. It seems that apart from the NFS issue and other git issues (resolvable by git gc and git configuration changes as outlined in other answers), git commands (git status, git diff, git add) might be slow because of a large number of binary files in the repository. git is not good at handling binary files. You can remove unnecessary binary files using the following command (the example is for NetCDF files; back up the git repository first):

# Rewrites history: drops every *.nc file from all commits on all refs.
git filter-branch --force --index-filter \
'git rm --cached --ignore-unmatch *.nc' \
--prune-empty --tag-name-filter cat -- --all

Do not forget to add '*.nc' to the .gitignore file to stop git from committing those files again.