Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow.
Original URL: http://stackoverflow.com/questions/3095737/
Is there a way to limit the amount of memory that "git gc" uses?
Asked by sam2themax
I'm hosting a git repo on a shared host. My repo necessarily has a couple of very large files in it, and every time I try to run "git gc" on the repo now, my process gets killed by the shared hosting provider for using too much memory. Is there a way to limit the amount of memory that git gc can consume? My hope would be that it can trade memory usage for speed and just take a little longer to do its work.
Accepted answer by CB Bailey
Yes, have a look at the help page for git config and look at the pack.* options, specifically pack.depth, pack.window, pack.windowMemory and pack.deltaCacheSize.
It's not a totally exact limit, as git needs to map each object into memory, so one very large object can cause a lot of memory usage regardless of the window and delta cache settings.
You may have better luck packing locally and transferring the pack files to the remote side "manually", adding .keep files so that the remote git doesn't ever try to completely repack everything.
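A minimal local sketch of the .keep mechanism (the throwaway repository and the commit here are illustrative; the actual copy-to-remote step is left out): once a pack has a matching .keep file, a full repack leaves it untouched.

```shell
set -e
# Create a throwaway repository with something to pack.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git repack -a -d -q                  # pack all loose objects
pack=$(ls .git/objects/pack/*.pack)
keep="${pack%.pack}.keep"
touch "$keep"                        # mark the pack as "keep"
git repack -a -d -q                  # a full repack skips kept packs
ls "$pack" "$keep"                   # both files are still there
```

On a real shared host you would build the pack in a local clone, copy the .pack/.idx pair into objects/pack/ on the server, and create the .keep file there.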
Answered by hopeithelps
I used instructions from this link. Same idea as Charles Bailey suggested.
A copy of the commands is here:
git config --global pack.windowMemory "100m"
git config --global pack.packSizeLimit "100m"
git config --global pack.threads "1"
This worked for me on HostGator with a shared hosting account.
Answered by Tobu
Git repack's memory use is: (pack.deltaCacheSize + pack.windowMemory) × pack.threads. Respective defaults are 256 MiB, unlimited, and nproc.
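As a back-of-the-envelope check of that formula (the values below are illustrative assumptions, not recommendations):

```shell
# Peak delta-search memory ≈ (pack.deltaCacheSize + pack.windowMemory) × pack.threads
# Illustrative values, all in MiB:
delta_cache=256      # pack.deltaCacheSize default
window_memory=128    # suppose pack.windowMemory has been capped at 128m
threads=4            # pack.threads defaults to the number of cores
peak=$(( (delta_cache + window_memory) * threads ))
echo "${peak} MiB"   # 1536 MiB
```

On a host that kills processes above, say, 512 MiB, this makes clear why lowering threads or capping the window matters.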
The delta cache isn't useful: most of the time is spent computing deltas on a sliding window, the majority of which are discarded; caching the survivors so they can be reused once (when writing) won't improve the runtime. That cache also isn't shared between threads.
By default the window memory is limited through pack.window (gc.aggressiveWindow). Limiting packing that way is a bad idea, because the working set size and efficiency will vary widely. It's best to raise both to much higher values and rely on pack.windowMemory to limit the window size.
Finally, threading has the disadvantage of splitting the working set. Lowering pack.threads and increasing pack.windowMemory so that the total stays the same should improve the run time.
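That advice might translate into settings like these; the values are assumptions for a memory-constrained host, not recommendations (the throwaway repository exists only so the commands run somewhere):

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
# One thread keeps the delta window's working set undivided; give the
# whole memory budget to a single large window instead of splitting it.
git config pack.threads 1
git config pack.windowMemory "256m"
git config pack.deltaCacheSize "64m"   # per the answer, a big cache buys little
git config pack.threads                # prints: 1
```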
repack has other useful tunables (pack.depth, pack.compression, the bitmap options), but they don't affect memory use.
Answered by Chris Johnsen
You could turn off the delta attribute to disable delta compression for just the blobs of those pathnames:
In foo/.git/info/attributes (or foo.git/info/attributes if it is a bare repository) (see the delta entry in gitattributes and see gitignore for the pattern syntax):
/large_file_dir/* -delta
*.psd -delta
/data/*.iso -delta
/some/big/file -delta
another/file/that/is/large -delta
This will not affect clones of the repository. To affect other repositories (i.e. clones), put the attributes in a .gitattributes file instead of (or in addition to) the info/attributes file.
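One way to sanity-check such patterns is git check-attr; a minimal sketch against a throwaway repository (the paths and patterns below are made up for illustration):

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
# Same pattern syntax as the answer, in the repo-local attributes file:
printf '*.psd -delta\n/data/*.iso -delta\n' > .git/info/attributes
git check-attr delta -- image.psd data/big.iso notes.txt
# image.psd: delta: unset
# data/big.iso: delta: unset
# notes.txt: delta: unspecified
```

Paths reported as "unset" will be stored without delta compression.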
Answered by VonC
Git 2.18 (Q2 2018) will improve the gc memory consumption.
Before 2.18, "git pack-objects" needs to allocate tons of "struct object_entry" while doing its work: shrinking its size helps the performance quite a bit.
This influences git gc.
See commit f6a5576, commit 3b13a5f, commit 0aca34e, commit ac77d0c, commit 27a7d06, commit 660b373, commit 0cb3c14, commit 898eba5, commit 43fa44f, commit 06af3bb, commit b5c0cbd, commit 0c6804a, commit fd9b1ba, commit 8d6ccce, commit 4c2db93 (14 Apr 2018) by Nguyễn Thái Ngọc Duy (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit ad635e8, 23 May 2018)
pack-objects: reorder members to shrink struct object_entry

Previous patches leave lots of holes and padding in this struct.
This patch reorders the members and shrinks the struct down to 80 bytes (from 136 bytes on 64-bit systems, before any field shrinking is done) with 16 bits to spare (and a couple more in in_pack_header_size when we really run out of bits).

This is the last in a series of memory reduction patches (see "pack-objects: a bit of document about struct object_entry" for the first one).

Overall they've reduced repack memory size on linux-2.6.git from 3.747G to 3.424G, or by around 320M, a decrease of 8.5%. The runtime of repack has stayed the same throughout this series.
Ævar's testing on a big monorepo he has access to (bigger than linux-2.6.git) has shown a 7.9% reduction, so the overall expected improvement should be somewhere around 8%.
With Git 2.20 (Q4 2018), it will be easier to check that an object that exists in one fork is not made into a delta against another object that does not appear in the same forked repository.
See commit fe0ac2f, commit 108f530, commit f64ba53 (16 Aug 2018) by Christian Couder (chriscool).
Helped-by: Jeff King (peff), and Duy Nguyen (pclouds).
See commit 9eb0986, commit 16d75fa, commit 28b8a73, commit c8d521f (16 Aug 2018) by Jeff King (peff).
Helped-by: Jeff King (peff), and Duy Nguyen (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit f3504ea, 17 Sep 2018)
pack-objects: move 'layer' into 'struct packing_data'

This reduces the size of 'struct object_entry' from 88 bytes to 80 and therefore makes packing objects more efficient.

For example on a Linux repo with 12M objects, git pack-objects --all needs extra 96MB memory even if the layer feature is not used.
Note that Git 2.21 (Feb. 2019) fixes a small bug: "git pack-objects" incorrectly used an uninitialized mutex, which has been corrected.
See commit edb673c, commit 459307b (25 Jan 2019) by Patrick Hogg (``).
Helped-by: Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit d243a32, 05 Feb 2019)
pack-objects: move read mutex to packing_data struct

ac77d0c ("pack-objects: shrink size field in struct object_entry", 2018-04-14) added an extra usage of read_lock/read_unlock in the newly introduced oe_get_size_slow for thread safety in parallel calls to try_delta().
Unfortunately oe_get_size_slow is also used in serial code, some of which is called before the first invocation of ll_find_deltas.
As such the read mutex is not guaranteed to be initialized.

Resolve this by moving the read mutex to packing_data and initializing it in prepare_packing_data which is initialized in cmd_pack_objects.
Git 2.21 (Feb. 2019) also finds another way to shrink the size of the pack: "git pack-objects" learns another algorithm to compute the set of objects to send, trading the resulting packfile off to save traversal cost, to favor small pushes.
pack-objects: create pack.useSparse setting

The '--sparse' flag in 'git pack-objects' changes the algorithm used to enumerate objects to one that is faster for individual users pushing new objects that change only a small cone of the working directory.
The sparse algorithm is not recommended for a server, which likely sends new objects that appear across the entire working directory.

Create a 'pack.useSparse' setting that enables this new algorithm.
This allows 'git push' to use this algorithm without passing a '--sparse' flag all the way through four levels of run_command() calls.

If the '--no-sparse' flag is set, then this config setting is overridden.
The config pack documentation now includes:
pack.useSparse:

When true, Git will default to using the '--sparse' option in 'git pack-objects' when the '--revs' option is present.
This algorithm only walks trees that appear in paths that introduce new objects.

This can have significant performance benefits when computing a pack to send a small change.
However, it is possible that extra objects are added to the pack-file if the included commits contain certain types of direct renames.
See "git push is very slow for a huge repo" for a concrete illustration.
Note: as commented in Git 2.24, a setting like pack.useSparse is still experimental.
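In practice, opting in might look like this (a sketch; as noted, the setting was experimental in these Git versions, and the throwaway repository exists only so the commands run somewhere):

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config pack.useSparse true        # per-repository opt-in to the sparse walk
git config feature.experimental true  # the Git 2.24+ umbrella switch
git config pack.useSparse             # prints: true
```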
See commit aaf633c, commit c6cc4c5, commit ad0fb65, commit 31b1de6, commit b068d9a, commit 7211b9e (13 Aug 2019) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit f4f8dfe, 09 Sep 2019)
repo-settings: create feature.experimental setting

The 'feature.experimental' setting includes config options that are not committed to become defaults, but could use additional testing.

Update the following config settings to take new defaults, and to use the repo_settings struct if not already using them:

- 'pack.useSparse=true'
- 'fetch.negotiationAlgorithm=skipping'
With Git 2.26 (Q1 2020), the way "git pack-objects" reuses objects stored in an existing pack to generate its result has been improved.
See commit d2ea031, commit 92fb0db, commit bb514de, commit ff48302, commit e704fc7, commit 2f4af77, commit 8ebf529, commit 59b2829, commit 40d18ff, commit 14fbd26 (18 Dec 2019), and commit 56d9cbe, commit bab28d9 (13 Sep 2019) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit a14aebe, 14 Feb 2020)
pack-objects: improve partial packfile reuse

Helped-by: Jonathan Tan
Signed-off-by: Jeff King
Signed-off-by: Christian Couder

The old code to reuse deltas from an existing packfile just tried to dump a whole segment of the pack verbatim. That's faster than the traditional way of actually adding objects to the packing list, but it didn't kick in very often. This new code is really going for a middle ground: do some per-object work, but way less than we'd traditionally do.

The general strategy of the new code is to make a bitmap of objects from the packfile we'll include, and then iterate over it, writing out each object exactly as it is in our on-disk pack, but not adding it to our packlist (which costs memory, and increases the search space for deltas).

One complication is that if we're omitting some objects, we can't set a delta against a base that we're not sending. So we have to check each object in try_partial_reuse() to make sure we have its delta.

About performance, in the worst case we might have interleaved objects that we are sending or not sending, and we'd have as many chunks as objects. But in practice we send big chunks.
For instance, packing torvalds/linux on GitHub servers now reused 6.5M objects, but only needed ~50k chunks.