Do I ever need to run git gc on a bare repo?

Note: this page is a translated mirror of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/3532740/

Date: 2020-09-19 04:30:46

Tags: git, git-gc

Asked by Mark Rushakoff

man git-gc doesn't have an obvious answer in it, and I haven't had any luck with Google either (although I might have just been using the wrong search terms).

I understand that you should occasionally run git gc on a local repository to prune dangling objects and compress history, among other things -- but is a shared bare repository susceptible to these same issues?

If it matters, our workflow is multiple developers pulling from and pushing to a bare repository on a shared network drive. The "central" repository was created with git init --bare --shared.

Accepted answer by Mark Rushakoff

As Jefromi commented on Dan's answer, git gc should be called automatically during "normal" use of a bare repository.

I just ran git gc --aggressive on two bare, shared repositories that have been actively used; one with about 38 commits in the past 3-4 weeks, and the other with about 488 commits over roughly 3 months. Nobody has manually run git gc on either repository.

Smaller repository

$ git count-objects
333 objects, 595 kilobytes

$ git count-objects -v
count: 333
size: 595
in-pack: 0
packs: 0
size-pack: 0
prune-packable: 0
garbage: 0

$ git gc --aggressive
Counting objects: 325, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (323/323), done.
Writing objects: 100% (325/325), done.
Total 325 (delta 209), reused 0 (delta 0)
Removing duplicate objects: 100% (256/256), done.

$ git count-objects -v
count: 8
size: 6
in-pack: 325
packs: 1
size-pack: 324
prune-packable: 0
garbage: 0

$ git count-objects
8 objects, 6 kilobytes

Larger repository

$ git count-objects
4315 objects, 11483 kilobytes

$ git count-objects -v
count: 4315
size: 11483
in-pack: 9778
packs: 20
size-pack: 15726
prune-packable: 1395
garbage: 0

$ git gc --aggressive
Counting objects: 8548, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (8468/8468), done.
Writing objects: 100% (8548/8548), done.
Total 8548 (delta 7007), reused 0 (delta 0)
Removing duplicate objects: 100% (256/256), done.

$ git count-objects -v
count: 0
size: 0
in-pack: 8548
packs: 1
size-pack: 8937
prune-packable: 0
garbage: 0

$ git count-objects
0 objects, 0 kilobytes

I wish I had thought of it before I gc'ed these two repositories, but I should have run git gc without the --aggressive option first to see the difference. Luckily I have a medium-sized active repository left to test (164 commits over nearly 2 months).

$ git count-objects -v
count: 1279
size: 1574
in-pack: 2078
packs: 6
size-pack: 2080
prune-packable: 607
garbage: 0

$ git gc
Counting objects: 1772, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (1073/1073), done.
Writing objects: 100% (1772/1772), done.
Total 1772 (delta 1210), reused 1050 (delta 669)
Removing duplicate objects: 100% (256/256), done.

$ git count-objects -v
count: 0
size: 0
in-pack: 1772
packs: 1
size-pack: 1092
prune-packable: 0
garbage: 0

$ git gc --aggressive
Counting objects: 1772, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (1742/1742), done.
Writing objects: 100% (1772/1772), done.
Total 1772 (delta 1249), reused 0 (delta 0)

$ git count-objects -v
count: 0
size: 0
in-pack: 1772
packs: 1
size-pack: 1058
prune-packable: 0
garbage: 0

Running git gc clearly made a large dent in count-objects, even though we regularly push to and fetch from this repository. But upon reading the manpage for git config, I noticed that the default loose object limit is 6700, which we apparently had not yet reached.

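To make the threshold comparison concrete, here is a minimal sketch that reads the loose-object count and compares it with gc.auto. It uses a throwaway bare repository created with mktemp as a stand-in for the real shared repository (an assumption for illustration), and the variable names are made up. Git's own check is an estimate rather than an exact count, so treat the verdict as an approximation.

```shell
#!/bin/sh
# Sketch only: compare the loose-object count with the gc.auto threshold.
# A throwaway bare repository stands in for the real shared repository.
repo=$(mktemp -d)
git init --quiet --bare "$repo"

# "count:" in `git count-objects -v` is the number of loose objects.
loose=$(git -C "$repo" count-objects -v | awk '/^count:/ {print $2}')

# gc.auto is usually unset; fall back to the default of 6700 mentioned above.
threshold=$(git -C "$repo" config --get gc.auto || echo 6700)

if [ "$threshold" -eq 0 ]; then
    verdict="automatic gc is disabled (gc.auto = 0)"
elif [ "$loose" -gt "$threshold" ]; then
    verdict="above the threshold; gc --auto would likely repack"
else
    verdict="below the threshold; gc --auto will probably do nothing yet"
fi
echo "loose objects: $loose, gc.auto: $threshold ($verdict)"
```

Pointing repo at the real bare repository instead of the temporary one gives the numbers that matter in practice.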

So it appears that the conclusion is no, you don't need to run git gc manually on a bare repo;* but with the default setting for gc.auto, it might be a long time before garbage collection occurs automatically.

*Generally, you shouldn't need to run git gc. But sometimes you might be strapped for space, and then you should run git gc manually or set gc.auto to a lower value. My case for the question was simple curiosity, though.

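As a rough illustration of lowering gc.auto, the sketch below sets it in a throwaway bare repository and then asks git to re-check its thresholds. The value 1000 is an arbitrary example, not a recommendation, and the temporary repository is only a stand-in for the shared one.

```shell
#!/bin/sh
# Sketch only: lower gc.auto for one repository and trigger a threshold check.
repo=$(mktemp -d)
git init --quiet --bare "$repo"

# 1000 is an arbitrary illustrative value; the default is 6700.
git -C "$repo" config gc.auto 1000

# Read the value back to confirm it was stored in the repository's config.
configured=$(git -C "$repo" config --get gc.auto)
echo "gc.auto is now $configured"

# gc --auto only does work if a threshold is exceeded; otherwise it returns quickly.
git -C "$repo" gc --auto --quiet
```

Because the setting lives in the bare repository's own config file, it applies to every developer who pushes to it.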

Answer by Dan Moulding

From the git-gc man page:

Users are encouraged to run this task on a regular basis within each repository to maintain good disk space utilization and good operating performance.

Emphasis mine. Bare repositories are repositories too!

Further explanation: one of the housekeeping tasks that git-gc performs is packing and repacking of loose objects. Even if you never have any dangling objects in your bare repository, you will -- over time -- accumulate lots of loose objects. These loose objects should periodically get packed, for efficiency. Similarly, if a large number of packs accumulate, they should periodically get repacked into larger (fewer) packs.

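The accumulation of loose objects can be reproduced on a small scale. The sketch below is an illustration under assumptions: the temporary directories, the branch name main, and the Example identity are all made up. It pushes a few commits into a throwaway bare repository and compares git count-objects before and after git gc. With default settings, small pushes are stored as loose objects, so the first count should be non-zero.

```shell
#!/bin/sh
# Sketch only: show loose objects piling up in a bare repo, then being packed.
set -e
work=$(mktemp -d)
bare=$(mktemp -d)
git init --quiet --bare "$bare"
git init --quiet "$work"

# Create a few small commits; the identity is a placeholder for the demo.
for i in 1 2 3; do
    echo "change $i" > "$work/file.txt"
    git -C "$work" add file.txt
    git -C "$work" -c user.name=Example -c user.email=example@example.com \
        commit --quiet -m "commit $i"
done
git -C "$work" push --quiet "$bare" HEAD:refs/heads/main

# Helper: extract one field from `git count-objects -v` for the bare repo.
field() { git -C "$bare" count-objects -v | awk -v k="$1:" '$1 == k {print $2}'; }

loose_before=$(field count)
git -C "$bare" gc --quiet
loose_after=$(field count)
packs_after=$(field packs)

echo "loose before gc: $loose_before"
echo "loose after gc:  $loose_after (packs: $packs_after)"
```

The same before/after comparison is what the count-objects listings in the accepted answer show for real repositories.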

Answer by VonC

The issue with git gc --auto is that it can be blocking.

But with the new (Git 2.0 Q2 2014) setting gc.autodetach, you now can do it without any interruption:

See commit 4c4ac4d and commit 9f673f9 (Nguyễn Thái Ngọc Duy, aka pclouds):

gc --auto takes time and can block the user temporarily (but not any less annoyingly).
Make it run in background on systems that support it.
The only thing lost with running in background is printouts. But gc output is not really interesting.
You can keep it in foreground by changing gc.autodetach.

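For illustration, keeping automatic gc in the foreground comes down to one config change. This sketch applies it to a throwaway bare repository (a stand-in for the real one) and reads the value back. Config keys are case-insensitive, so gc.autoDetach and gc.autodetach name the same setting.

```shell
#!/bin/sh
# Sketch only: disable background (detached) gc --auto for one repository.
repo=$(mktemp -d)
git init --quiet --bare "$repo"

# false keeps gc --auto in the foreground; the default detaches where supported.
git -C "$repo" config gc.autoDetach false

# --bool normalizes the stored value to "true" or "false".
detach=$(git -C "$repo" config --get --bool gc.autoDetach)
echo "gc.autoDetach = $detach"
```

Whether foreground gc is preferable depends on the setup; on a shared server the background default is often less disruptive.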



Note: only Git 2.7 (Q4 2015) makes sure not to lose the error message.
See commit 329e6e8 (19 Sep 2015) by Nguyễn Thái Ngọc Duy (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit 076c827, 15 Oct 2015)

gc: save log from daemonized gc --auto and print it next time

While commit 9f673f9 (gc: config option for running --auto in background - 2014-02-08) helps reduce some complaints about 'gc --auto' hogging the terminal, it creates another set of problems.

The latest in this set is, as the result of daemonizing, stderr is closed and all warnings are lost. This warning at the end of cmd_gc() is particularly important because it tells the user how to avoid "gc --auto" running repeatedly.
Because stderr is closed, the user does not know, naturally they complain about 'gc --auto' wasting CPU.

Daemonized gc now saves stderr to $GIT_DIR/gc.log.
Following gc --auto will not run and gc.log printed out until the user removes gc.log.

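A quick way to see whether a background gc left a message behind is to look for gc.log in the repository's git directory. The sketch below does that for a throwaway bare repository (a stand-in for the real one; the variable names are made up). It relies on git rev-parse --absolute-git-dir, which needs a reasonably recent Git (2.13 or later, as far as I know).

```shell
#!/bin/sh
# Sketch only: check whether a daemonized gc --auto left a gc.log behind.
repo=$(mktemp -d)
git init --quiet --bare "$repo"

# Resolve $GIT_DIR as an absolute path (for a bare repo this is the repo itself).
git_dir=$(git -C "$repo" rev-parse --absolute-git-dir)
log="$git_dir/gc.log"

if [ -s "$log" ]; then
    status="has-log"
    echo "gc.log found; read it, fix the cause, then remove the file:"
    cat "$log"
else
    status="clean"
    echo "no gc.log in $git_dir"
fi
```

On a freshly created repository this reports that no log exists; on a long-lived shared repository it is worth checking after users report gc warnings.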

Answer by svick

Some operations run git gc --auto automatically, so there should never be a need to run git gc; git should take care of this by itself.

Contrary to what bwawok said, there actually is (or might be) a difference between your local repo and that bare one: which operations you perform on it. For example, dangling objects can be created by rebasing, but you may never rebase the bare repo, so maybe you don't ever need to remove them (because there are never any). And thus you may not need to use git gc that often. But then again, like I said, git should take care of this automatically.

Answer by bwawok

I'm not 100% sure about the logic of gc, but to reason this out:

git gc removes extra history junk, compresses extra history, etc. It does nothing with your local copies of files.

The only difference between a bare and normal repo is if you have local copies of files.

So, I think it stands to reason that YES, you should run git gc on a bare repo.

I have never personally run it, but my repo is pretty small and is still fast.