Fixing up a git repo that is slowed because of big binary files

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/12483910/
Asked by anr78
We have a git repo containing both source code and binaries. The bare repo has now reached ~9GB, and cloning it takes ages. Most of the time is spent in "remote: Compressing objects". After a commit with a new version of one of the bigger binaries, a fetch takes a long time, also spent compressing objects on the server.
After reading "git pull without remotely compressing objects", I suspect delta compression of binary files is what hurts us as well, but I'm not 100% sure how to go about fixing this.
What are the exact steps to fix the bare repo on the server? My guess:
- Add entries like '*.zip -delta' for all the extensions I want into .git/info/attributes (a rough sketch of these steps follows this list)
- Run 'git repack', but with what options? Would -adF repack everything, and leave me with a repo where no delta compression has ever been done on the specified file types?
- Run 'git prune'. I thought this was done automatically, but running it when I played around with a bare clone of said repo decreased the size by ~2GB
- Clone the repo, add and commit a .gitattributes with the same entries as I added in .git/info/attributes on the bare repo
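A minimal sketch of those first steps as commands run inside the bare repo on the server; the path and the extension are only placeholders, not a confirmed recipe (note that in a bare repo the attributes file lives at info/attributes rather than .git/info/attributes):

cd /srv/git/project.git                 # hypothetical location of the bare repo
echo '*.zip -delta' >> info/attributes  # never attempt delta compression for these blobs
git repack -a -d -F                     # repack everything, recomputing instead of reusing existing objects
git prune                               # drop objects left unreachable after the repack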
Am I on to something?
Update:
Some interesting test results on this. Today I started a bare clone of the problematic repo. Our not-so-powerful server with 4GB of RAM ran out of memory and started swapping. After 3 hours I gave up...
Then I instead cloned a bare repo from my up-to-date working copy. Cloning that one between workstations took ~5 minutes. I then pushed it up to the server as a new repo. Cloning that repo took only 7 minutes.
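The clone-and-push route described above could look roughly like this; the paths and the server URL are made up:

git clone --bare ~/work/project project.git              # fresh bare clone from the up-to-date working copy
cd project.git
git push --mirror ssh://git@server/srv/git/project.git   # publish it to the server as a new repo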
If I interpret this correctly, a better packed repo performs much better, even without disabling the delta-compression for binary files. I guess this means the steps above are indeed what I want to do in the short term, but in addition I need to find out how to limit the amount of memory git is allowed to use for packing/compression on the server so I can avoid the swapping.
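A few pack-related settings that can cap how much memory git uses while packing on the server; the values are guesses and would need tuning for the machine (pack.windowMemory is the knob used in the update below):

git config pack.windowMemory 100m   # per-thread memory window for delta search
git config pack.threads 1           # fewer pack threads means less total memory
git config pack.deltaCacheSize 32m  # cap the cache of computed deltas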
In case it matters: The server runs git 1.7.0.4 and the workstations run 1.7.9.5.
Update 2:
I did the following steps on my test repo, and think I will chance doing them on the server (after a backup):
Limit memory usage when packing objects:
git config pack.windowMemory 100m
git config pack.packSizeLimit 200m

Disable delta compression for some extensions:
echo '*.tar.gz -delta' >> info/attributes
echo '*.tar.bz2 -delta' >> info/attributes
echo '*.bin -delta' >> info/attributes
echo '*.png -delta' >> info/attributes

Repack repository and collect garbage:
git repack -a -d -F --window-memory 100m --max-pack-size 200m
git gc
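To check whether the repack actually helped, the packed size can be compared before and after; this check is not part of the original post:

git count-objects -v   # size-pack reports the packed size in KiB
du -sh objects/pack    # total size of the pack files in the bare repo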
Update 3:
Some unexpected side effects after this operation: Issues after trying to repack a git repo for improved performance
Answered by Amir Rubin
While your question asks how to make your current repo more efficient, I don't think that's feasible.
Follow the advice of the crowd:
- Move your big binaries out of your repo
- Move your dev environment to a virtual machine image: https://www.virtualbox.org/
- Use this Python script to clean your repo of those large binary blobs (I used it on my repo and it worked great): https://gist.github.com/1433794 (a plain git filter-branch sketch of the same idea follows this list)
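The same cleanup can be done with stock git via filter-branch. A minimal sketch, assuming the offending files match '*.zip' and '*.bin' (adjust the patterns to your repo); this rewrites history, so every clone must be re-cloned afterwards:

git filter-branch --index-filter \
  'git rm -r --cached --ignore-unmatch "*.zip" "*.bin"' \
  --prune-empty --tag-name-filter cat -- --all   # strip the blobs from every commit
rm -rf .git/refs/original                        # drop filter-branch's backup refs
git reflog expire --expire=now --all
git gc --prune=now                               # actually reclaim the space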
Answered by xception
You should use a different mechanism for storing the big binaries. If they are generated from something, you could simply not store them and only keep the code that generates them; otherwise I suggest moving all of them to a single directory and managing that with rsync or svn, depending on your needs.
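If the rsync route is chosen, a minimal sketch (the directory and host are placeholders) for mirroring a local binaries directory to the server:

rsync -avz --delete ./binaries/ user@buildserver:/srv/project-binaries/   # mirror, deleting files removed locally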