How often should you use git-gc?
Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, link to the original, and attribute it to the original authors on StackOverflow (not to this page's maintainer).
Original: http://stackoverflow.com/questions/55729/
Asked by Readonly
How often should you use git-gc?
The manual page simply says:
Users are encouraged to run this task on a regular basis within each repository to maintain good disk space utilization and good operating performance.
Are there some commands to get some object counts to find out whether it's time to gc?
Accepted answer by Adam Davis
It depends mostly on how much the repository is used. With one user checking in once a day and a branch/merge/etc operation once a week you probably don't need to run it more than once a year.
With several dozen developers working on several dozen projects each checking in 2-3 times a day, you might want to run it nightly.
It won't hurt to run it more frequently than needed, though.
What I'd do is run it now, then a week from now take a measurement of disk utilization, run it again, and measure disk utilization again. If it drops 5% in size, then run it once a week. If it drops more, then run it more frequently. If it drops less, then run it less frequently.
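The measure, gc, measure again routine above could be sketched roughly like this. The function names are made up for illustration, sizes are assumed to be in kilobytes (for example from du -sk .git), and the percentage is rounded down to a whole number, so "exactly 5%" is only approximate.

```shell
# Prints the percentage by which the repository shrank (integer, rounded down).
# $1 = size before gc, $2 = size after gc, both in kilobytes.
gc_drop_percent() {
    if [ "$1" -le 0 ]; then
        echo 0
    else
        echo $(( ($1 - $2) * 100 / $1 ))
    fi
}

# Maps the drop to the rule of thumb: about 5% means weekly is about right.
gc_schedule_hint() {
    drop=$(gc_drop_percent "$1" "$2")
    if [ "$drop" -gt 5 ]; then
        echo "more often than weekly"
    elif [ "$drop" -eq 5 ]; then
        echo "weekly"
    else
        echo "less often than weekly"
    fi
}

# Possible usage inside a repository, a week after the previous gc:
#   before=$(du -sk .git | cut -f1)
#   git gc
#   after=$(du -sk .git | cut -f1)
#   gc_schedule_hint "$before" "$after"
```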
Answer by Aristotle Pagaltzis
Note that the downside of garbage-collecting your repository is that, well, the garbage gets collected. As we all know as computer users, files we consider garbage right now might turn out to be very valuable three days in the future. The fact that git keeps most of its debris around has saved my bacon several times – by browsing all the dangling commits, I have recovered much work that I had accidentally canned.
So don't be too much of a neat freak in your private clones. There's little need for it.
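For anyone wondering how to browse those dangling commits: a small sketch, assuming you pipe the output of git fsck into it. The helper name dangling_commits is made up here; it only filters the "dangling commit <hash>" lines that git fsck prints.

```shell
# Reads `git fsck` output on stdin and prints the hashes of dangling commits.
dangling_commits() {
    awk '$1 == "dangling" && $2 == "commit" { print $3 }'
}

# Possible usage inside a repository (before any gc has pruned the objects):
#   git fsck --no-reflogs | dangling_commits
# --no-reflogs also reports commits that are only reachable from a reflog.
# Each hash can then be inspected with: git show <hash>
```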
OTOH, the value of data recoverability is questionable for repos used mainly as remotes, e.g. the place all the devs push to and/or pull from. There, it might be sensible to kick off a GC run and a repacking frequently.
Answer by mrowe
Recent versions of git run gc automatically when required, so you shouldn't have to do anything. See the Options section of man git-gc(1): "Some git commands run git gc --auto after performing operations that could create many loose objects."
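How eager git gc --auto is depends on the gc.auto setting, the approximate number of loose objects it tolerates before repacking. A sketch for looking up the value in effect, assuming the documented default of 6700 when nothing is configured (the function name is invented for this example):

```shell
# Prints the loose-object threshold that `git gc --auto` works with.
# Falls back to 6700, the documented default, when gc.auto is not set.
gc_auto_threshold() {
    value=$(git config --get gc.auto 2>/dev/null) || value=
    echo "${value:-6700}"
}

# Possible usage:
#   gc_auto_threshold        # prints the effective threshold
#   git gc --auto            # does nothing unless housekeeping is needed
```

According to the git-gc documentation, setting gc.auto to 0 turns the automatic run off.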
Answer by cregox
If you're using Git-Gui, it tells you when you should worry:
This repository currently has approximately 1500 loose objects.
The following command gives a similar number:
$ git count-objects
Except that, judging from its source, git-gui does the math by itself, actually counting something in the .git/objects folder, and probably comes up with an approximation (I don't know tcl well enough to read it properly!).
In any case, it seems to give the warning based on an arbitrary number around 300 loose objects.
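To script a similar check, here is a rough sketch that reads the count: line from git count-objects -v. The helper names are invented, and the threshold is whatever you pass in; the 300 in the usage comment is just the number mentioned in this answer, not something taken from git-gui's source.

```shell
# Reads `git count-objects -v` output on stdin and prints the loose-object count.
loose_object_count() {
    awk -F': ' '$1 == "count" { print $2 }'
}

# Succeeds (exit status 0) when the count in $1 exceeds the threshold in $2.
too_many_loose() {
    [ "$1" -gt "$2" ]
}

# Possible usage inside a repository:
#   n=$(git count-objects -v | loose_object_count)
#   too_many_loose "$n" 300 && echo "consider running git gc"
```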
Answer by VonC
You can do it without any interruption, with the new (Git 2.0, Q2 2014) setting gc.autodetach.
See commit 4c4ac4d and commit 9f673f9 (Nguyễn Thái Ngọc Duy, aka pclouds):
gc --auto takes time and can block the user temporarily (but not any less annoyingly). Make it run in background on systems that support it.
The only thing lost with running in background is printouts. But gc output is not really interesting.
You can keep it in foreground by changing gc.autodetach.
Since that 2.0 release there was a bug, though: git 2.7 (Q4 2015) will make sure not to lose the error message.
See commit 329e6e8 (19 Sep 2015) by Nguyễn Thái Ngọc Duy (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit 076c827, 15 Oct 2015)
gc: save log from daemonized gc --auto and print it next time
While commit 9f673f9 (gc: config option for running --auto in background - 2014-02-08) helps reduce some complaints about 'gc --auto' hogging the terminal, it creates another set of problems.
The latest in this set is, as the result of daemonizing, stderr is closed and all warnings are lost. This warning at the end of cmd_gc() is particularly important because it tells the user how to avoid "gc --auto" running repeatedly. Because stderr is closed, the user does not know, naturally they complain about 'gc --auto' wasting CPU.
Daemonized gc now saves stderr to $GIT_DIR/gc.log. Following gc --auto will not run and gc.log printed out until the user removes gc.log.
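If a background gc --auto has left a gc.log behind, you would typically read it and then delete it so the next automatic run is not blocked. A minimal sketch, assuming you pass in the git directory yourself; the function name review_gc_log is made up for this example.

```shell
# Prints the saved gc.log (if any) for the given git directory and removes it,
# so that a later `git gc --auto` is allowed to run again.
# $1 is the git directory, e.g. the output of: git rev-parse --git-dir
review_gc_log() {
    log="$1/gc.log"
    if [ -f "$log" ]; then
        cat "$log"
        rm -f "$log"
    else
        echo "no gc.log in $1"
    fi
}

# Possible usage inside a repository:
#   review_gc_log "$(git rev-parse --git-dir)"
```

It is probably worth acting on whatever the log says before removing it, otherwise the same warning is likely to come back.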
Answer by Pat Notz
Drop it in a cron job that runs every night (afternoon?) when you're sleeping.
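One way such a nightly job might look, as a sketch only: the function names, the script path and the repository root in the crontab comment are all placeholders, and the search only finds non-bare repositories (directories that contain a .git directory).

```shell
# Prints every directory under $1 that contains a .git directory.
list_repos() {
    find "$1" -type d -name .git -prune | while IFS= read -r gitdir; do
        dirname "$gitdir"
    done
}

# Runs `git gc --quiet` in each repository found under $1.
gc_all() {
    list_repos "$1" | while IFS= read -r repo; do
        git -C "$repo" gc --quiet
    done
}

# Hypothetical crontab entry, running a script that calls gc_all at 03:00:
#   0 3 * * * /path/to/nightly-gc.sh /srv/repos
```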
Answer by Rory
I use git gc after I do a big checkout and have a lot of new objects. It can save space. E.g. if you check out a big SVN project using git-svn and then run git gc, you typically save a lot of space.
Answer by Teoman shipahi
This quote is taken from Version Control with Git:
Git runs garbage collection automatically:
- If there are too many loose objects in the repository
- When a push to a remote repository happens
- After some commands that might introduce many loose objects
- When some commands such as git reflog expire explicitly request it
And finally, garbage collection occurs when you explicitly request it using the git gc command. But when should that be? There's no solid answer to this question, but there is some good advice and best practice.
You should consider running git gc manually in a few situations:
- If you have just completed a git filter-branch. Recall that filter-branch rewrites many commits, introduces new ones, and leaves the old ones on a ref that should be removed when you are satisfied with the results. All those dead objects (that are no longer referenced since you just removed the one ref pointing to them) should be removed via garbage collection.
- After some commands that might introduce many loose objects. This might be a large rebase effort, for example.
And on the flip side, when should you be wary of garbage collection?
- If there are orphaned refs that you might want to recover
- In the context of git rerere and you do not need to save the resolutions forever
- In the context of only tags and branches being sufficient to cause Git to retain a commit permanently
- In the context of FETCH_HEAD retrievals (URL-direct retrievals via git fetch) because they are immediately subject to garbage collection
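For the filter-branch case mentioned in the quote, the cleanup usually comes down to three steps: delete the backup refs under refs/original/, expire the reflogs, then gc with immediate pruning. The sketch below follows the commonly cited recipe; the function name is made up, and it is destructive, so only run something like it once you are sure you no longer need the old commits.

```shell
# Removes the backup refs that git filter-branch leaves under refs/original/,
# expires all reflog entries, and prunes the now-unreachable objects.
# Destructive: the old commits cannot be recovered afterwards.
cleanup_after_filter_branch() {
    git for-each-ref --format='%(refname)' refs/original/ |
    while IFS= read -r ref; do
        git update-ref -d "$ref"
    done
    git reflog expire --expire=now --all
    git gc --prune=now
}

# Possible usage inside the rewritten repository:
#   cleanup_after_filter_branch
```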
Answer by ghiboz
I use it after a big commit, above all when I remove many files from the repository. Afterwards, commits are faster.
Answer by Immi
You don't have to use git gc very often, because git gc (garbage collection) is run automatically on several frequently used commands:
git pull
git merge
git rebase
git commit
Source: git gc best practices and FAQS