Fixing up a git repo that is slowed because of big binary files

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/12483910/
Asked by anr78
We have a git repo containing both source code and binaries. The bare repo has now reached ~9GB, and cloning it takes ages. Most of the time is spent in "remote: Compressing objects". After a commit with a new version of one of the bigger binaries, a fetch takes a long time, also spent compressing objects on the server.
After reading "git pull without remotely compressing objects", I suspect delta compression of binary files is what hurts us as well, but I'm not 100% sure how to go about fixing this.
What are the exact steps to fix the bare repo on the server? My guess:
- Add entries like '*.zip -delta' for all the extensions I want into .git/info/attributes (a rough sketch of these steps follows this list)
- Run 'git repack', but with what options? Would -adF repack everything, and leave me with a repo where no delta compression has ever been done on the specified file types?
- Run 'git prune'. I thought this was done automatically, but running it when I played around with a bare clone of said repo decreased the size by ~2GB
- Clone the repo, add and commit a .gitattributes with the same entries as I added in .git/info/attributes on the bare repo
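A minimal sketch of those first steps as commands run inside the bare repo on the server; the path and the extension are only placeholders, not a confirmed recipe (note that in a bare repo the attributes file lives at info/attributes rather than .git/info/attributes):

cd /srv/git/project.git                 # hypothetical location of the bare repo
echo '*.zip -delta' >> info/attributes  # never attempt delta compression for these blobs
git repack -a -d -F                     # repack everything, recomputing instead of reusing existing objects
git prune                               # drop objects left unreachable after the repack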
Am I on to something?
Update:
Some interesting test results on this. Today I started a bare clone of the problematic repo. Our not-so-powerful server with 4GB of RAM ran out of memory and started swapping. After 3 hours I gave up...
Then I instead cloned a bare repo from my up-to-date working copy. Cloning that one between workstations took ~5 minutes. I then pushed it up to the server as a new repo. Cloning that repo took only 7 minutes.
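The clone-and-push route described above could look roughly like this; the paths and the server URL are made up:

git clone --bare ~/work/project project.git              # fresh bare clone from the up-to-date working copy
cd project.git
git push --mirror ssh://git@server/srv/git/project.git   # publish it to the server as a new repo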
If I interpret this correctly, a better packed repo performs much better, even without disabling the delta-compression for binary files. I guess this means the steps above are indeed what I want to do in the short term, but in addition I need to find out how to limit the amount of memory git is allowed to use for packing/compression on the server so I can avoid the swapping.
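A few pack-related settings that can cap how much memory git uses while packing on the server; the values are guesses and would need tuning for the machine (pack.windowMemory is the knob used in the update below):

git config pack.windowMemory 100m   # per-thread memory window for delta search
git config pack.threads 1           # fewer pack threads means less total memory
git config pack.deltaCacheSize 32m  # cap the cache of computed deltas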
In case it matters: The server runs git 1.7.0.4 and the workstations run 1.7.9.5.
Update 2:
I did the following steps on my test repo, and think I will chance doing them on the server (after a backup):
Limit memory usage when packing objects:
git config pack.windowMemory 100m
git config pack.packSizeLimit 200m

Disable delta compression for some extensions:
echo '*.tar.gz -delta' >> info/attributes
echo '*.tar.bz2 -delta' >> info/attributes
echo '*.bin -delta' >> info/attributes
echo '*.png -delta' >> info/attributes

Repack repository and collect garbage:
git repack -a -d -F --window-memory 100m --max-pack-size 200m
git gc
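To check whether the repack actually helped, the packed size can be compared before and after; this check is not part of the original post:

git count-objects -v   # size-pack reports the packed size in KiB
du -sh objects/pack    # total size of the pack files in the bare repo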
Update 3:
Some unexpected side effects after this operation: Issues after trying to repack a git repo for improved performance
Answered by Amir Rubin
While your question asks how to make your current repo more efficient, I don't think that's feasible.
Follow the advice of the crowd:
- Move your big binaries out of your repo
- Move your dev environment to a virtual machine image: https://www.virtualbox.org/
- Use this Python script to clean your repo of those large binary blobs (I used it on my repo and it worked great): https://gist.github.com/1433794 (a plain git filter-branch sketch of the same idea follows this list)
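The same cleanup can be done with stock git via filter-branch. A minimal sketch, assuming the offending files match '*.zip' and '*.bin' (adjust the patterns to your repo); this rewrites history, so every clone must be re-cloned afterwards:

git filter-branch --index-filter \
  'git rm -r --cached --ignore-unmatch "*.zip" "*.bin"' \
  --prune-empty --tag-name-filter cat -- --all   # strip the blobs from every commit
rm -rf .git/refs/original                        # drop filter-branch's backup refs
git reflog expire --expire=now --all
git gc --prune=now                               # actually reclaim the space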
Answered by xception
You should use a different mechanism for storing the big binaries. If they are generated from something, you could simply not store them and only keep the code that generates them; otherwise I suggest moving all of them to a single directory and managing that with rsync or svn, depending on your needs.
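If the rsync route is chosen, a minimal sketch (the directory and host are placeholders) for mirroring a local binaries directory to the server:

rsync -avz --delete ./binaries/ user@buildserver:/srv/project-binaries/   # mirror, deleting files removed locally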