Fixing up a git repo that is slowed because of big binary files

Note: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/12483910/

git

Asked by anr78

We have a git repo containing both source code and binaries. The bare repo has now reached ~9GB, and cloning it takes ages. Most of the time is spent in "remote: Compressing objects". After a commit with a new version of one of the bigger binaries, a fetch takes a long time, also spent compressing objects on the server.

After reading "git pull without remotely compressing objects", I suspect delta compression of binary files is what hurts us as well, but I'm not 100% sure how to go about fixing this.

What are the exact steps to fix the bare repo on the server? My guess:

  • Add entries like '*.zip -delta' for all the extensions I want into .git/info/attributes
  • Run 'git repack', but with what options? Would -adF repack everything, and leave me with a repo where no delta compression has ever been done on the specified file types?
  • Run 'git prune'. I thought this was done automatically, but running it when I played around with a bare clone of said repo decreased the size by ~2GB
  • Clone the repo, add and commit a .gitattributes with the same entries as I added in .git/info/attributes on the bare repo (a rough command sketch follows this list)
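
In command form, this is roughly what I have in mind (untested, and '*.zip' just stands in for the full list of extensions I would actually add):

    # On the server, inside the bare repo (the attributes file is info/attributes there)
    echo '*.zip -delta' >> info/attributes
    git repack -a -d -F    # -F: don't reuse existing packed data, recompress everything
    git prune              # drop loose objects that are no longer reachable

    # On a fresh clone, commit the same rules so they travel with the repo
    echo '*.zip -delta' >> .gitattributes
    git add .gitattributes
    git commit -m "Disable delta compression for zip files"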

Am I on to something?

Update:

Some interesting test results on this. Today I started a bare clone of the problematic repo. Our not-so-powerful server with 4 GB of RAM ran out of memory and started swapping. After 3 hours I gave up...

Then I instead cloned a bare repo from my up-to-date working copy. Cloning that one between workstations took ~5 minutes. I then pushed it up to the server as a new repo. Cloning that repo took only 7 minutes.
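
For reference, re-creating and publishing a freshly packed bare repo looks roughly like this (paths and hostnames are made up; the target on the server must already exist as an empty bare repo):

    # Make a freshly packed bare repo from the up-to-date working copy
    git clone --bare /path/to/working-copy fresh.git

    # Publish it on the server as the new repository
    cd fresh.git
    git push --mirror git@server:/srv/git/project-new.git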

If I interpret this correctly, a better-packed repo performs much better, even without disabling delta compression for binary files. I guess this means the steps above are indeed what I want to do in the short term, but in addition I need to find out how to limit the amount of memory git is allowed to use for packing/compression on the server, so I can avoid the swapping.
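
The knobs I am planning to look at for that are along these lines (set in the bare repo on the server; the values are illustrative, not tuned):

    git config pack.windowMemory "100m"      # cap per-thread memory for the delta search window
    git config pack.threads "1"              # fewer pack threads -> lower peak memory use
    git config pack.deltaCacheSize "128m"    # cap the cache of computed deltas
    git config core.packedGitLimit "128m"    # cap how much pack data is mapped into memory at once
    git config core.packedGitWindowSize "32m"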

In case it matters: The server runs git 1.7.0.4 and the workstations run 1.7.9.5.

Update 2:

I did the following steps on my test repo, and think I will chance doing them on the server (after a backup); a quick way to check the result is sketched after the list:

  • Limit memory usage when packing objects

    git config pack.windowMemory 100m
    git config pack.packSizeLimit 200m

  • Disable delta compression for some extensions

    echo '*.tar.gz -delta' >> info/attributes
    echo '*.tar.bz2 -delta' >> info/attributes
    echo '*.bin -delta' >> info/attributes
    echo '*.png -delta' >> info/attributes

  • Repack repository and collect garbage

    git repack -a -d -F --window-memory 100m --max-pack-size 200m
    git gc
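
To see whether this actually helps, a simple before/after check inside the bare repo could be something like:

    git count-objects -v    # 'size-pack' is the total size of the pack files in KiB
    du -sh objects/pack/    # the packs on disk (objects/pack/ in a bare repo)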

Update 3:

Some unexpected side effects after this operation: "Issues after trying to repack a git repo for improved performance"

Answered by Amir Rubin

While your question asks how to make your current repo more efficient, I don't think that's feasible.

Follow the advice of the crowd:

  1. Move your big binaries out of your repo
  2. Move your dev environment to a virtual machine image: https://www.virtualbox.org/
  3. Use this Python script to clean your repo of those large binary blobs (I used it on my repo and it worked great; a plain-git alternative is sketched below): https://gist.github.com/1433794
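
If you would rather stay with stock git instead of the gist, a rough equivalent looks like the sketch below. It rewrites history, so everyone with a clone has to re-clone afterwards, and 'big-binaries/' is a hypothetical path; substitute wherever your large blobs actually live:

    # Remove the directory from every commit on every branch and tag
    git filter-branch --index-filter \
        'git rm -r --cached --ignore-unmatch big-binaries/' \
        --prune-empty --tag-name-filter cat -- --all

    # Drop the backup refs and old reflog entries, then repack to reclaim the space
    rm -rf .git/refs/original/
    git reflog expire --expire=now --all
    git gc --prune=now --aggressive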

Answered by xception

You should use a different mechanism for storing the big binaries. If they are generated from something, you could simply not store them at all and keep only the code that generates them; otherwise I suggest moving all of them to a single directory and managing it with rsync or svn, depending on your needs.
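
For the rsync route, a minimal sketch (directory and host are hypothetical):

    # Sync the binaries to a shared location that is kept outside the git repo
    rsync -avz --delete ./binaries/ user@server:/srv/project/binaries/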
