Is there a way to limit the amount of memory that "git gc" uses?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/3095737/

Tags: git, memory, dreamhost, git-gc

Asked by sam2themax

I'm hosting a git repo on a shared host. My repo necessarily has a couple of very large files in it, and every time I try to run "git gc" on the repo now, my process gets killed by the shared hosting provider for using too much memory. Is there a way to limit the amount of memory that git gc can consume? My hope would be that it can trade memory usage for speed and just take a little longer to do its work.

Accepted answer by CB Bailey

Yes, have a look at the help page for git config and look at the pack.* options, specifically pack.depth, pack.window, pack.windowMemory and pack.deltaCacheSize.

It's not a totally exact size as git needs to map each object into memory so one very large object can cause a lot of memory usage regardless of the window and delta cache settings.
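These options can be set per repository; a minimal sketch (the values here are illustrative, not recommendations -- tune them against your host's memory limit):

```shell
set -e
tmp=$(mktemp -d) && git init -q "$tmp/repo" && cd "$tmp/repo"
git config pack.window 10           # objects considered when searching for deltas
git config pack.depth 10            # maximum delta-chain length
git config pack.windowMemory 64m    # memory cap for the delta search window
git config pack.deltaCacheSize 32m  # cache of computed deltas
git config --get pack.windowMemory  # prints: 64m
```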

You may have better luck packing locally and transferring pack files to the remote side "manually", adding .keep files so that the remote git doesn't ever try to completely repack everything.
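A sketch of that "pack locally, transfer manually" approach, using a local bare repository to stand in for the shared host (all paths here are hypothetical; on a real host you would scp or rsync the pack files instead of cp):

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/work" && cd "$tmp/work"
echo data > big.bin && git add big.bin
git -c user.email=you@example.com -c user.name=you commit -qm init

# Repack aggressively on the machine that has plenty of memory:
git repack -a -d -f

# "Transfer" the pack and its index to the remote's objects/pack/:
git init -q --bare "$tmp/remote.git"
cp .git/objects/pack/pack-*.pack .git/objects/pack/pack-*.idx \
   "$tmp/remote.git/objects/pack/"

# A .keep file next to each pack stops the remote's git gc from repacking it:
for p in "$tmp/remote.git"/objects/pack/pack-*.pack; do
  touch "${p%.pack}.keep"
done
```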

Answered by hopeithelps

I used the instructions from this link. Same idea as Charles Bailey suggested.

A copy of the commands is here:

git config --global pack.windowMemory "100m"
git config --global pack.packSizeLimit "100m"
git config --global pack.threads "1"

This worked for me on HostGator with a shared hosting account.

Answered by Tobu

Git repack's memory use is: (pack.deltaCacheSize + pack.windowMemory) × pack.threads. Respective defaults are 256MiB, unlimited, nproc.

The delta cache isn't useful: most of the time is spent computing deltas on a sliding window, the majority of which are discarded; caching the survivors so they can be reused once (when writing) won't improve the runtime. That cache also isn't shared between threads.

By default the window memory is limited through pack.window (gc.aggressiveWindow). Limiting packing that way is a bad idea, because the working set size and efficiency will vary widely. It's best to raise both to much higher values and rely on pack.windowMemory to limit the window size.

Finally, threading has the disadvantage of splitting the working set. Lowering pack.threads and increasing pack.windowMemory so that the total stays the same should improve the run time.
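Putting that formula to work, a hedged sketch that caps the delta phase at roughly 256 MiB by trading threads for per-thread window memory (the numbers are illustrative; measure on your own repository):

```shell
set -e
# (pack.deltaCacheSize + pack.windowMemory) x pack.threads
#   = (32m + 96m) x 2 = 256 MiB ceiling for the delta phase.
tmp=$(mktemp -d) && git init -q "$tmp/repo" && cd "$tmp/repo"
git config pack.threads 2          # fewer threads, less working-set duplication
git config pack.windowMemory 96m   # per-thread sliding-window cap
git config pack.deltaCacheSize 32m # per-thread delta cache
git config --get pack.threads      # prints: 2
```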

repack has other useful tunables (pack.depth, pack.compression, the bitmap options), but they don't affect memory use.

Answered by Chris Johnsen

You could turn off the delta attribute to disable delta compression for just the blobs of those pathnames:

In foo/.git/info/attributes (or foo.git/info/attributes if it is a bare repository) (see the delta entry in gitattributes and see gitignore for the pattern syntax):

/large_file_dir/* -delta
*.psd -delta
/data/*.iso -delta
/some/big/file -delta
another/file/that/is/large -delta

This will not affect clones of the repository. To affect other repositories (i.e. clones), put the attributes in a .gitattributes file instead of (or in addition to) the info/attributes file.
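One way to confirm an attribute is in effect is git check-attr; a sketch using a hypothetical *.psd pattern (check-attr reports "unset" for paths matched by "-delta"):

```shell
set -e
tmp=$(mktemp -d) && git init -q "$tmp/repo" && cd "$tmp/repo"
printf '*.psd -delta\n' > .git/info/attributes
# Prints: image.psd: delta: unset
git check-attr delta -- image.psd
```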

Answered by VonC

Git 2.18 (Q2 2018) will improve the gc memory consumption.
Before 2.18, "git pack-objects" needs to allocate tons of "struct object_entry" while doing its work: shrinking its size helps the performance quite a bit.
This influences git gc.

See commit f6a5576, commit 3b13a5f, commit 0aca34e, commit ac77d0c, commit 27a7d06, commit 660b373, commit 0cb3c14, commit 898eba5, commit 43fa44f, commit 06af3bb, commit b5c0cbd, commit 0c6804a, commit fd9b1ba, commit 8d6ccce, commit 4c2db93 (14 Apr 2018) by Nguyễn Thái Ngọc Duy (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit ad635e8, 23 May 2018)

pack-objects: reorder members to shrink struct object_entry

Previous patches leave lots of holes and padding in this struct.
This patch reorders the members and shrinks the struct down to 80 bytes (from 136 bytes on 64-bit systems, before any field shrinking is done) with 16 bits to spare (and a couple more in in_pack_header_size when we really run out of bits).

This is the last in a series of memory reduction patches (see "pack-objects: a bit of document about struct object_entry" for the first one).

Overall they've reduced repack memory size on linux-2.6.git from 3.747G to 3.424G, or by around 320M, a decrease of 8.5%.
The runtime of repack has stayed the same throughout this series.
Ævar's testing on a big monorepo he has access to (bigger than linux-2.6.git) has shown a 7.9% reduction, so the overall expected improvement should be somewhere around 8%.


With Git 2.20 (Q4 2018), it will be easier to check that an object that exists in one fork is not made into a delta against another object that does not appear in the same forked repository.

See commit fe0ac2f, commit 108f530, commit f64ba53 (16 Aug 2018) by Christian Couder (chriscool).
Helped-by: Jeff King (peff), and Duy Nguyen (pclouds).
See commit 9eb0986, commit 16d75fa, commit 28b8a73, commit c8d521f (16 Aug 2018) by Jeff King (peff).
Helped-by: Jeff King (peff), and Duy Nguyen (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit f3504ea, 17 Sep 2018)

pack-objects: move 'layer' into 'struct packing_data'

This reduces the size of 'struct object_entry' from 88 bytes to 80 and therefore makes packing objects more efficient.

For example on a Linux repo with 12M objects, git pack-objects --all needs extra 96MB memory even if the layer feature is not used.


Note that Git 2.21 (Feb. 2019) fixes a small bug: "git pack-objects" incorrectly used an uninitialized mutex, which has been corrected.

See commit edb673c, commit 459307b (25 Jan 2019) by Patrick Hogg (``).
Helped-by: Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit d243a32, 05 Feb 2019)

pack-objects: move read mutex to packing_data struct

ac77d0c ("pack-objects: shrink size field in struct object_entry", 2018-04-14) added an extra usage of read_lock/read_unlock in the newly introduced oe_get_size_slow for thread safety in parallel calls to try_delta().
Unfortunately oe_get_size_slow is also used in serial code, some of which is called before the first invocation of ll_find_deltas.
As such the read mutex is not guaranteed to be initialized.

Resolve this by moving the read mutex to packing_data and initializing it in prepare_packing_data, which is initialized in cmd_pack_objects.


Git 2.21 (Feb. 2019) also found another way to shrink the size of the pack: "git pack-objects" learned another algorithm to compute the set of objects to send, one that trades the resulting packfile off against traversal cost, to favor small pushes.

pack-objects: create pack.useSparse setting

The '--sparse' flag in 'git pack-objects' changes the algorithm used to enumerate objects to one that is faster for individual users pushing new objects that change only a small cone of the working directory.
The sparse algorithm is not recommended for a server, which likely sends new objects that appear across the entire working directory.

Create a 'pack.useSparse' setting that enables this new algorithm.
This allows 'git push' to use this algorithm without passing a '--sparse' flag all the way through four levels of run_command() calls.

If the '--no-sparse' flag is set, then this config setting is overridden.

The config pack documentation now includes:

pack.useSparse:

When true, Git will default to using the '--sparse' option in 'git pack-objects' when the '--revs' option is present.
This algorithm only walks trees that appear in paths that introduce new objects.

This can have significant performance benefits when computing a pack to send a small change.

However, it is possible that extra objects are added to the pack-file if the included commits contain certain types of direct renames.

See "git pushis very slow for a huge repo" for a concrete illustration.
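A sketch of opting in (pack.useSparse was later turned on by default; the pack base name below is arbitrary, and --sparse requires --revs):

```shell
set -e
tmp=$(mktemp -d) && git init -q "$tmp/repo" && cd "$tmp/repo"
echo hello > f && git add f
git -c user.email=you@example.com -c user.name=you commit -qm init

# Enable the sparse walk for --revs invocations in this repository:
git config pack.useSparse true

# Or force it for a single pack-objects run:
git rev-parse HEAD | git pack-objects --revs --sparse mypack >/dev/null
ls mypack-*.pack
```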


Note: as commented in Git 2.24, a setting like pack.useSparse is still experimental.

See commit aaf633c, commit c6cc4c5, commit ad0fb65, commit 31b1de6, commit b068d9a, commit 7211b9e (13 Aug 2019) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit f4f8dfe, 09 Sep 2019)

repo-settings: create feature.experimental setting

The 'feature.experimental' setting includes config options that are not committed to become defaults, but could use additional testing.

Update the following config settings to take new defaults, and to use the repo_settings struct if not already using them:

  • 'pack.useSparse=true'
  • 'fetch.negotiationAlgorithm=skipping'


With Git 2.26 (Q1 2020), the way "git pack-objects" reuses objects stored in an existing pack to generate its result has been improved.

See commit d2ea031, commit 92fb0db, commit bb514de, commit ff48302, commit e704fc7, commit 2f4af77, commit 8ebf529, commit 59b2829, commit 40d18ff, commit 14fbd26 (18 Dec 2019), and commit 56d9cbe, commit bab28d9 (13 Sep 2019) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit a14aebe, 14 Feb 2020)

pack-objects: improve partial packfile reuse

Helped-by: Jonathan Tan
Signed-off-by: Jeff King
Signed-off-by: Christian Couder

The old code to reuse deltas from an existing packfile just tried to dump a whole segment of the pack verbatim. That's faster than the traditional way of actually adding objects to the packing list, but it didn't kick in very often. This new code is really going for a middle ground: do some per-object work, but way less than we'd traditionally do.

The general strategy of the new code is to make a bitmap of objects from the packfile we'll include, and then iterate over it, writing out each object exactly as it is in our on-disk pack, but not adding it to our packlist (which costs memory, and increases the search space for deltas).

One complication is that if we're omitting some objects, we can't set a delta against a base that we're not sending. So we have to check each object in try_partial_reuse() to make sure we have its delta.

About performance, in the worst case we might have interleaved objects that we are sending or not sending, and we'd have as many chunks as objects. But in practice we send big chunks.

For instance, packing torvalds/linux on GitHub servers now reused 6.5M objects, but only needed ~50k chunks.
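This reuse path operates on packs that carry a reachability bitmap. A sketch of producing one, with a local bare clone standing in for a server-side repository (paths are hypothetical; option names per git-repack):

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/work" && cd "$tmp/work"
echo data > f && git add f
git -c user.email=you@example.com -c user.name=you commit -qm init

# Servers keep bare repositories; bitmaps require a fully-packed repo:
git clone -q --bare "$tmp/work" "$tmp/srv.git"
git -C "$tmp/srv.git" repack -a -d --write-bitmap-index
ls "$tmp/srv.git"/objects/pack/*.bitmap
```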
