使用 Git 管理大型二进制文件
声明:本页面是 Stack Overflow 热门问题的中英对照翻译,遵循 CC BY-SA 4.0 协议。如果您需要使用它,必须同样遵循 CC BY-SA 许可:注明原文地址和作者信息,并将其归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/540535/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me):
StackOverFlow
Managing large binary files with Git
提问 by pi.
I am looking for opinions of how to handle large binary files on which my source code (web application) is dependent. We are currently discussing several alternatives:
我正在寻找有关如何处理我的源代码(Web 应用程序)所依赖的大型二进制文件的意见。我们目前正在讨论几种替代方案:
- Copy the binary files by hand.
- Pro: Not sure.
- Contra: I am strongly against this, as it increases the likelihood of errors when setting up a new site/migrating the old one. Builds up another hurdle to take.
- Manage them all with Git.
- Pro: Removes the possibility to 'forget' to copy an important file
- Contra: Bloats the repository and decreases flexibility to manage the code-base and checkouts, clones, etc. will take quite a while.
- Separate repositories.
- Pro: Checking out/cloning the source code is fast as ever, and the images are properly archived in their own repository.
- Contra: Removes the simpleness of having the one and only Git repository on the project. It surely introduces some other things I haven't thought about.
- 手动复制二进制文件。
- 优点:不确定。
- 缺点:我强烈反对这一点,因为它会增加设置新站点/迁移旧站点时出错的可能性,还会增加一道额外的门槛。
- 使用 Git 管理它们。
- 优点:消除了“忘记”复制重要文件的可能性。
- 缺点:使存储库膨胀,降低管理代码库的灵活性,而且检出、克隆等操作将需要相当长的时间。
- 单独的存储库。
- 优点:检出/克隆源代码和以往一样快,并且图像在自己的存储库中得到妥善归档。
- 缺点:失去了项目只有一个 Git 存储库的简单性,而且肯定还会带来一些我没有考虑到的其他问题。
What are your experiences/thoughts regarding this?
您对此有何经验/想法?
Also: Does anybody have experience with multiple Git repositories and managing them in one project?
另外:有没有人有使用多个 Git 存储库并在一个项目中管理它们的经验?
The files are images for a program which generates PDFs with those files in it. The files will not change very often (as in years), but they are very relevant to a program. The program will not work without the files.
这些文件是某个程序用来生成 PDF 的图像,会嵌入生成的 PDF 中。这些文件不会经常更改(可能几年才变一次),但与程序密切相关。没有这些文件,程序将无法运行。
采纳答案 by Pat Notz
If the program won't work without the files it seems like splitting them into a separate repo is a bad idea. We have large test suites that we break into a separate repo but those are truly "auxiliary" files.
如果程序在没有文件的情况下无法运行,那么将它们拆分为单独的存储库似乎是一个坏主意。我们有大量的测试套件,我们将它们分解成一个单独的存储库,但这些都是真正的“辅助”文件。
However, you may be able to manage the files in a separate repo and then use `git-submodule` to pull them into your project in a sane way. So, you'd still have the full history of all your source but, as I understand it, you'd only have the one relevant revision of your images submodule. The `git-submodule` facility should help you keep the correct version of the code in line with the correct version of the images.
但是,您可以在单独的存储库中管理这些文件,然后使用 `git-submodule` 以合理的方式将它们引入您的项目。因此,您仍然拥有所有源代码的完整历史记录,但据我所知,您只会有图像子模块的一个相关修订版。`git-submodule` 机制应该能帮助您让代码版本与图像版本保持对应。
Here's a good introduction to submodules from the Git Book.
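As a sketch of how the submodule route could be wired up (repository names and paths below are invented for illustration; the `protocol.file.allow` override is only needed because this demo uses local file-path repositories on newer Git versions):

```shell
# Keep images in their own repo and mount them as a submodule (illustrative).
set -e
work=$(mktemp -d); cd "$work"

# Stand-in for the separate binaries repository
git init -q -b master images && cd images
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m 'images: initial'
cd ..

# The main project pins the images repo at one revision
git init -q -b master project && cd project
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m 'project: initial'
git -c protocol.file.allow=always submodule --quiet add ../images assets
git -c user.email=demo@example.com -c user.name=demo \
    commit -qm 'add images as submodule'

# Fresh checkouts would then use: git clone --recurse-submodules <url>
git submodule status assets    # shows the pinned revision
```

Collaborators who forget `--recurse-submodules` can run `git submodule update --init` after cloning.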
回答 by rafak
I discovered git-annex recently, which I find awesome. It was designed for managing large files efficiently. I use it for my photo/music (etc.) collections. The development of git-annex is very active. The content of the files can be removed from the Git repository; only the tree hierarchy is tracked by Git (through symlinks). However, to get the content of the file, a second step is necessary after pulling/pushing, e.g.:
我最近发现了git-annex,我觉得它很棒。它旨在有效地管理大文件。我将它用于我的照片/音乐(等)收藏。git-annex 的开发非常活跃。文件的内容可以从 Git 存储库中删除,Git 仅跟踪树层次结构(通过符号链接)。但是,要获取文件的内容,需要在拉/推后进行第二步,例如:
$ git annex add mybigfile
$ git commit -m'add mybigfile'
$ git push myremote
$ git annex copy --to myremote mybigfile ## This command copies the actual content to myremote
$ git annex drop mybigfile ## Remove content from local repo
...
$ git annex get mybigfile ## Retrieve the content
## or to specify the remote from which to get:
$ git annex copy --from myremote mybigfile
There are many commands available, and there is great documentation on the website. A package is available on Debian.
回答 by VonC
Another solution, since April 2015, is Git Large File Storage (LFS) (by GitHub).
另一种解决方案是 2015 年 4 月推出的 Git Large File Storage (LFS)(由 GitHub 提供)。
It uses git-lfs (see git-lfs.github.com) and can be tested with a server supporting it: lfs-test-server. You can store only the metadata in the git repo, and the large files elsewhere.
它使用 git-lfs(请参阅 git-lfs.github.com),并可配合支持它的服务器进行测试:lfs-test-server。您可以只将元数据存储在 git 存储库中,而将大文件存储在其他地方。
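As an illustrative sketch (the `*.psd` pattern is only an example): after a one-time `git lfs install` and a per-repository `git lfs track "*.psd"`, git-lfs records the tracked pattern in `.gitattributes`, which is committed with the code while the large file contents live in the LFS store:

```
# .gitattributes entry written by `git lfs track "*.psd"`
*.psd filter=lfs diff=lfs merge=lfs -text
```

From then on, a normal `git add` of a matching file stores only a small pointer in the repository.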
回答 by sehe
Have a look at git bup, which is a Git extension to smartly store large binaries in a Git repository.
看看git bup,它是一个 Git 扩展,可以巧妙地将大型二进制文件存储在 Git 存储库中。
You'd want to have it as a submodule, but you won't have to worry about the repository getting hard to handle. One of their sample use cases is storing VM images in Git.
您希望将其作为子模块使用,但您不必担心存储库变得难以处理。他们的示例用例之一是在 Git 中存储 VM 映像。
I haven't actually seen better compression rates, but my repositories don't have really large binaries in them.
我实际上没有看到更好的压缩率,但是我的存储库中没有真正大的二进制文件。
Your mileage may vary.
具体效果可能因人而异。
回答 by Carl
You can also use git-fat. I like that it only depends on stock Python and `rsync`. It also supports the usual Git workflow, with the following self-explanatory commands:
您也可以使用 git-fat。我喜欢它只依赖原生 Python 和 `rsync`。它还支持常规的 Git 工作流程,并提供以下不言自明的命令:
git fat init
git fat push
git fat pull
In addition, you need to check in a .gitfat file into your repository and modify your .gitattributes to specify the file extensions you want `git fat` to manage.
此外,您需要将 .gitfat 文件签入存储库,并修改 .gitattributes 以指定希望 `git fat` 管理的文件扩展名。
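As a hedged example of that setup (the host, path, and file patterns are placeholders; check the git-fat README for the exact syntax of the version you install), `.gitfat` points at the rsync location and `.gitattributes` selects the patterns the fat filter manages:

```
# .gitfat -- where the real file contents are rsync'd to/from
[rsync]
remote = storage.example.com:/srv/git-fat-store

# .gitattributes -- patterns handled by the fat filter
*.png filter=fat -crlf
*.iso filter=fat -crlf
```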
You add a binary using the normal `git add`, which in turn invokes `git fat` based on your gitattributes rules.
您使用普通的 `git add` 添加二进制文件,它会根据您的 gitattributes 规则调用 `git fat`。
Finally, it has the advantage that the location where your binaries are actually stored can be shared across repositories and users and supports anything `rsync` does.
最后,它的优点是实际存储二进制文件的位置可以在存储库和用户之间共享,并且支持 `rsync` 能做的任何事情。
UPDATE: Do not use git-fat if you're using a Git-SVN bridge. It will end up removing the binary files from your Subversion repository. However, if you're using a pure Git repository, it works beautifully.
更新:如果您使用的是 Git-SVN 桥接器,请不要使用 git-fat。它将最终从您的 Subversion 存储库中删除二进制文件。但是,如果您使用的是纯 Git 存储库,则它运行良好。
回答 by Daniel Fanjul
I would use submodules (as Pat Notz suggests) or two distinct repositories. If you modify your binary files too often, then I would try to minimize the impact of the huge repository by cleaning its history:
我会使用子模块(如 Pat Notz 所建议)或两个独立的存储库。如果您经常修改二进制文件,那么我会通过清理历史记录来尽量减小巨大存储库的影响:
I had a very similar problem several months ago: ~21 GB of MP3 files, unclassified (bad names, bad id3's, don't know if I like that MP3 file or not...), and replicated on three computers.
几个月前我遇到了一个非常相似的问题:大约 21 GB 的 MP3 文件,未经整理(文件名混乱、id3 标签错误、不知道自己是否喜欢某个 MP3 文件……),并且在三台计算机上各有副本。
I used an external hard disk drive with the main Git repository, and I cloned it into each computer. Then, I started to classify them in the habitual way (pushing, pulling, merging... deleting and renaming many times).
我在主 Git 存储库中使用了外部硬盘驱动器,并将其克隆到每台计算机中。然后,我开始按照习惯的方式对它们进行分类(推、拉、合并……多次删除和重命名)。
At the end, I had only ~6 GB of MP3 files and ~83 GB in the .git directory. I used `git-write-tree` and `git-commit-tree` to create a new commit, without commit ancestors, and started a new branch pointing to that commit. The "git log" for that branch only showed one commit.
最后,我只剩下 ~6 GB 的 MP3 文件,而 .git 目录占了 ~83 GB。我使用 `git-write-tree` 和 `git-commit-tree` 创建了一个没有祖先的新提交,并新建了一个指向该提交的分支。该分支的 "git log" 只显示一次提交。
Then, I deleted the old branch, kept only the new branch, deleted the ref-logs, and ran "git prune": after that, my .git folder weighed only ~6 GB...
然后,我删除了旧分支,只保留新分支,删除了引用日志,然后运行 "git prune":之后,我的 .git 文件夹只占 ~6 GB……
You could "purge" the huge repository from time to time in the same way: Your "git clone"'s will be faster.
您可以不时以相同的方式“清除”庞大的存储库:您的“git clone”会更快。
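A minimal reconstruction of that write-tree/commit-tree squash in a throwaway repository (these are not the author's exact commands, just the procedure described above):

```shell
# Squash all history into a single parentless commit, then prune the old one.
set -e
repo=$(mktemp -d); cd "$repo"
git init -q -b master
git config user.email demo@example.com
git config user.name demo

# Fake some history: two commits touching the same file
echo one > track.mp3; git add track.mp3; git commit -qm 'first'
echo two > track.mp3; git add track.mp3; git commit -qm 'second'

# Snapshot the index as a tree, wrap it in a commit with no ancestors
tree=$(git write-tree)
root=$(echo 'collection, squashed' | git commit-tree "$tree")

# New branch at the root commit; delete the old branch and its reflogs
git checkout -q -b squashed "$root"
git branch -q -D master
git reflog expire --expire=now --all
git prune

git rev-list --count HEAD    # prints 1
```

After the prune, the old commits are unreferenced and their objects can be reclaimed, which is where the space saving comes from.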
回答 by claf
In my opinion, if you're likely to often modify those large files, or if you intend to do a lot of `git clone` or `git checkout`, then you should seriously consider using another Git repository (or maybe another way to access those files).
在我看来,如果您可能经常修改那些大文件,或者打算进行大量的 `git clone` 或 `git checkout` 操作,那么您应该认真考虑使用另一个 Git 存储库(或者另一种访问这些文件的方式)。
But if you work like we do, and if your binary files are not often modified, then the first clone/checkout will be long, but after that it should be as fast as you want (considering your users keep using the first cloned repository they had).
但如果您像我们一样工作,并且二进制文件不经常修改,那么第一次克隆/检出会很慢,但之后应该和您期望的一样快(前提是用户继续使用他们最初克隆的存储库)。
回答 by Adam Kurkiewicz
The solution I'd like to propose is based on orphan branches and a slight abuse of the tag mechanism, henceforth referred to as Orphan Tags Binary Storage (OTABS).
我想提出的解决方案基于孤立分支和对标签机制的轻微滥用,以下简称孤立标签二进制存储(OTABS)。
TL;DR 12-01-2017: If you can use GitHub's LFS or some other 3rd party, by all means you should. If you can't, then read on. Be warned, this solution is a hack and should be treated as such.
TL;DR 12-01-2017:如果您可以使用 GitHub 的 LFS 或其他第三方服务,那就尽管使用。如果不能,请继续阅读。请注意,此解决方案是一种 hack,应当按 hack 来对待。
Desirable properties of OTABS
OTABS 的理想特性
- it is a pure git and git-only solution -- it gets the job done without any 3rd party software (like git-annex) or 3rd party infrastructure (like github's LFS).
- it stores the binary files efficiently, i.e. it doesn't bloat the history of your repository.
- `git pull` and `git fetch`, including `git fetch --all`, are still bandwidth efficient, i.e. not all large binaries are pulled from the remote by default.
- it works on Windows.
- it stores everything in a single git repository.
- it allows for deletion of outdated binaries (unlike bup).
- 它是一个纯 git、仅用 git 的解决方案——无需任何第三方软件(如 git-annex)或第三方基础设施(如 github 的 LFS)即可完成工作。
- 它能高效地存储二进制文件,即不会使存储库的历史记录膨胀。
- `git pull` 和 `git fetch`(包括 `git fetch --all`)仍然节省带宽,即默认情况下不会从远程拉取所有大型二进制文件。
- 它适用于 Windows。
- 它将所有内容存储在单个 git 存储库中。
- 它允许删除过时的二进制文件(与 bup 不同)。
Undesirable properties of OTABS
OTABS 的不良特性
- it makes `git clone` potentially inefficient (but not necessarily, depending on your usage). If you deploy this solution you might have to advise your colleagues to use `git clone -b master --single-branch <url>` instead of `git clone`. This is because git clone by default literally clones the entire repository, including things you wouldn't normally want to waste your bandwidth on, like unreferenced commits. Taken from SO 4811434.
- it makes `git fetch <remote> --tags` bandwidth inefficient, but not necessarily storage inefficient. You can always advise your colleagues not to use it.
- you'll have to periodically use a `git gc` trick to clean your repository of any files you don't want any more.
- it is not as efficient as bup or git-bigfiles. But it's respectively more suitable for what you're trying to do and more off-the-shelf. You are likely to run into trouble with hundreds of thousands of small files or with files in the range of gigabytes, but read on for workarounds.
- 它可能使 `git clone` 效率降低(但不一定,取决于您的用法)。如果您部署此方案,可能需要建议同事使用 `git clone -b master --single-branch <url>` 而不是 `git clone`。这是因为 git clone 默认会克隆整个存储库,包括您通常不想浪费带宽的内容,例如未被引用的提交。取自 SO 4811434。
- 它会使 `git fetch <remote> --tags` 的带宽效率降低,但不一定降低存储效率。您可以随时建议同事不要使用它。
- 您必须定期使用 `git gc` 技巧,从存储库中清除不再需要的文件。
- 它不如 bup 或 git-bigfiles 高效。但它更贴合您要做的事情,也更加开箱即用。处理数十万个小文件或千兆字节级的文件时可能会遇到麻烦,请继续阅读以了解解决方法。
Adding the Binary Files
添加二进制文件
Before you start make sure that you've committed all your changes, your working tree is up to date and your index doesn't contain any uncommitted changes. It might be a good idea to push all your local branches to your remote (github etc.) in case any disaster should happen.
在开始之前,请确保您已提交所有更改,您的工作树是最新的,并且您的索引不包含任何未提交的更改。将所有本地分支推送到远程(github 等)可能是一个好主意,以防发生任何灾难。
- Create a new orphan branch. `git checkout --orphan binaryStuff` will do the trick. This produces a branch that is entirely disconnected from any other branch, and the first commit you'll make in this branch will have no parent, which will make it a root commit.
- Clean your index using `git rm --cached * .gitignore`.
- Take a deep breath and delete the entire working tree using `rm -fr * .gitignore`. The internal `.git` directory will stay untouched, because the `*` wildcard doesn't match it.
- Copy in your VeryBigBinary.exe, or your VeryHeavyDirectory/.
- Add it && commit it.
- Now it becomes tricky -- if you push it into the remote as a branch all your developers will download it the next time they invoke `git fetch`, clogging their connection. You can avoid this by pushing a tag instead of a branch. This can still impact your colleague's bandwidth and filesystem storage if they have a habit of typing `git fetch <remote> --tags`, but read on for a workaround. Go ahead and `git tag 1.0.0bin`.
- Push your orphan tag: `git push <remote> 1.0.0bin`.
- Just so you never push your binary branch by accident, you can delete it: `git branch -D binaryStuff`. Your commit will not be marked for garbage collection, because an orphan tag pointing at it (`1.0.0bin`) is enough to keep it alive.
- 创建一个新的孤立分支。`git checkout --orphan binaryStuff` 即可。这会生成一个与任何其他分支完全断开的分支,并且您在此分支上的第一次提交将没有父提交,从而成为根提交。
- 使用 `git rm --cached * .gitignore` 清理索引。
- 深呼吸,然后使用 `rm -fr * .gitignore` 删除整个工作树。内部的 `.git` 目录将保持不变,因为 `*` 通配符不会匹配它。
- 把您的 VeryBigBinary.exe 或 VeryHeavyDirectory/ 复制进来。
- 添加并提交它。
- 现在是棘手的部分——如果您将它作为分支推送到远程,所有开发人员下次执行 `git fetch` 时都会下载它,占用他们的连接。您可以通过推送标签而不是分支来避免这种情况。如果您的同事习惯键入 `git fetch <remote> --tags`,这仍然会影响他们的带宽和文件系统存储,但请继续阅读以了解解决方法。接下来执行 `git tag 1.0.0bin`。
- 推送您的孤立标签:`git push <remote> 1.0.0bin`。
- 为了避免意外推送二进制分支,您可以删除它:`git branch -D binaryStuff`。您的提交不会被垃圾回收,因为指向它的孤立标签 `1.0.0bin` 足以使其保持存活。
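The eight steps above can be condensed into a runnable sketch. This is an illustrative reconstruction, not the answer's exact commands: it uses throwaway repositories under `mktemp`, a local bare repository standing in for `<remote>`, and `git rm --cached -r .` in place of the answer's `git rm --cached * .gitignore`:

```shell
# Demonstrate OTABS end to end in throwaway repos (names are illustrative).
set -e
work=$(mktemp -d); cd "$work"
git init -q --bare origin.git              # stands in for <remote>
git init -q -b master repo && cd repo
git config user.email demo@example.com
git config user.name demo
git remote add origin ../origin.git
echo 'int main(){}' > main.c
git add main.c && git commit -qm 'source code'
git push -q origin master

# Steps 1-3: orphan branch, clean index, clean working tree
git checkout -q --orphan binaryStuff
git rm -q --cached -r .
rm -f main.c

# Steps 4-5: copy in the binary, add it && commit it
printf 'BIGBINARY' > VeryBigBinary.exe
git add VeryBigBinary.exe
git commit -qm 'binary payload'

# Steps 6-7: tag the orphan commit and push only the tag
git tag 1.0.0bin
git push -q origin 1.0.0bin

# Step 8: drop the local branch; the tag keeps the commit alive
git checkout -q master
git branch -q -D binaryStuff

# Later: restore the file from the tag without any binary branch
git checkout -q 1.0.0bin -- VeryBigBinary.exe
cat VeryBigBinary.exe    # prints BIGBINARY
```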
Checking out the Binary File
检出二进制文件
- How do I (or my colleagues) get the VeryBigBinary.exe checked out into the current working tree? If your current working branch is for example master you can simply `git checkout 1.0.0bin -- VeryBigBinary.exe`.
- This will fail if you don't have the orphan tag `1.0.0bin` downloaded, in which case you'll have to `git fetch <remote> 1.0.0bin` beforehand.
- You can add the `VeryBigBinary.exe` into your master's `.gitignore`, so that no-one on your team will pollute the main history of the project with the binary by accident.
- 我(或我的同事)如何把 VeryBigBinary.exe 检出到当前工作树?例如,如果您当前的工作分支是 master,只需执行 `git checkout 1.0.0bin -- VeryBigBinary.exe`。
- 如果您尚未下载孤立标签 `1.0.0bin`,此操作会失败,这种情况下您必须先执行 `git fetch <remote> 1.0.0bin`。
- 您可以把 `VeryBigBinary.exe` 加入 master 分支的 `.gitignore`,这样团队中就不会有人意外地用该二进制文件污染项目的主历史记录。
Completely Deleting the Binary File
彻底删除二进制文件
If you decide to completely purge VeryBigBinary.exe from your local repository, your remote repository and your colleague's repositories you can just:
如果您决定从本地存储库、远程存储库和同事的存储库中完全清除 VeryBigBinary.exe,您只需:
- Delete the orphan tag on the remote: `git push <remote> :refs/tags/1.0.0bin`
- Delete the orphan tag locally (deletes all other unreferenced tags): `git tag -l | xargs git tag -d && git fetch --tags`. Taken from SO 1841341 with slight modification.
- Use a git gc trick to delete your now unreferenced commit locally: `git -c gc.reflogExpire=0 -c gc.reflogExpireUnreachable=0 -c gc.rerereresolved=0 -c gc.rerereunresolved=0 -c gc.pruneExpire=now gc "$@"`. It will also delete all other unreferenced commits. Taken from SO 1904860.
- If possible, repeat the git gc trick on the remote. It is possible if you're self-hosting your repository, and might not be possible with some git providers, like github, or in some corporate environments. If you're hosting with a provider that doesn't give you ssh access to the remote, just let it be. It is possible that your provider's infrastructure will clean your unreferenced commit in their own sweet time. If you're in a corporate environment you can advise your IT to run a cron job garbage collecting your remote once per week or so. Whether they do or don't will not have any impact on your team in terms of bandwidth and storage, as long as you advise your colleagues to always `git clone -b master --single-branch <url>` instead of `git clone`.
- All your colleagues who want to get rid of outdated orphan tags need only to apply steps 2-3.
- You can then repeat the steps 1-8 of Adding the Binary Files to create a new orphan tag `2.0.0bin`. If you're worried about your colleagues typing `git fetch <remote> --tags` you can actually name it `1.0.0bin` again. This will make sure that the next time they fetch all the tags the old `1.0.0bin` will be unreferenced and marked for subsequent garbage collection (using step 3). When you try to overwrite a tag on the remote you have to use `-f`, like this: `git push -f <remote> <tagname>`.
- 删除远程上的孤立标签:`git push <remote> :refs/tags/1.0.0bin`
- 在本地删除孤立标签(会删除所有其他未被引用的标签):`git tag -l | xargs git tag -d && git fetch --tags`。取自 SO 1841341,稍加修改。
- 使用 git gc 技巧在本地删除现已无引用的提交:`git -c gc.reflogExpire=0 -c gc.reflogExpireUnreachable=0 -c gc.rerereresolved=0 -c gc.rerereunresolved=0 -c gc.pruneExpire=now gc "$@"`。它还会删除所有其他未被引用的提交。取自 SO 1904860。
- 如果可能,在远程上重复 git gc 技巧。如果您自行托管存储库,这是可行的;但对于某些 git 提供商(如 github)或某些企业环境,则可能行不通。如果托管提供商不提供对远程的 ssh 访问权限,那就顺其自然——提供商的基础设施可能会在某个时候自行清理您的未引用提交。如果您在企业环境中,可以建议 IT 部门运行一个 cron 作业,每周左右对远程做一次垃圾回收。只要您建议同事始终使用 `git clone -b master --single-branch <url>` 而不是 `git clone`,他们做不做这件事都不会在带宽和存储方面对团队产生影响。
- 所有想清理过时孤立标签的同事只需执行第 2-3 步。
- 然后,您可以重复“添加二进制文件”的第 1-8 步来创建新的孤立标签 `2.0.0bin`。如果担心同事键入 `git fetch <remote> --tags`,您实际上可以再次将其命名为 `1.0.0bin`。这将确保他们下次获取所有标签时,旧的 `1.0.0bin` 将失去引用并被标记为待垃圾回收(使用第 3 步)。当您尝试覆盖远程上的标签时,必须使用 `-f`,如:`git push -f <remote> <tagname>`。
Afterword
后记
OTABS doesn't touch your master or any other source code/development branches. The commit hashes, all of the history, and small size of these branches are unaffected. If you've already bloated your source code history with binary files you'll have to clean it up as a separate piece of work. This script might be useful.

Confirmed to work on Windows with git-bash.

It is a good idea to apply a set of standard tricks to make storage of binary files more efficient. Frequent running of `git gc` (without any additional arguments) makes git optimise the underlying storage of your files by using binary deltas. However, if your files are unlikely to stay similar from commit to commit you can switch off binary deltas altogether. Additionally, because it makes no sense to compress already compressed or encrypted files, like .zip, .jpg or .crypt, git allows you to switch off compression of the underlying storage. Unfortunately it's an all-or-nothing setting affecting your source code as well.

You might want to script up parts of OTABS to allow for quicker usage. In particular, scripting steps 2-3 from Completely Deleting the Binary File into an `update` git hook could give a compelling but perhaps dangerous semantics to git fetch ("fetch and delete everything that is out of date").

You might want to skip step 4 of Completely Deleting the Binary File to keep a full history of all binary changes on the remote, at the cost of central repository bloat. Local repositories will stay lean over time.

In the Java world it is possible to combine this solution with `maven --offline` to create a reproducible offline build stored entirely in your version control (it's easier with maven than with gradle). In the Golang world it is feasible to build on this solution to manage your GOPATH instead of `go get`. In the python world it is possible to combine this with virtualenv to produce a self-contained development environment without relying on PyPi servers for every build from scratch.

If your binary files change very often, like build artifacts, it might be a good idea to script a solution which stores the 5 most recent versions of the artifacts in the orphan tags `monday_bin`, `tuesday_bin`, ..., `friday_bin`, and also an orphan tag for each release, `1.7.8bin`, `2.0.0bin`, etc. You can rotate the `weekday_bin` tags and delete old binaries daily. This way you get the best of two worlds: you keep the entire history of your source code but only the relevant history of your binary dependencies. It is also very easy to get the binary files for a given tag without getting the entire source code with all its history: `git init && git remote add <name> <url> && git fetch <name> <tag>` should do it for you.
OTABS 不会触及您的 master 或任何其他源代码/开发分支。这些分支的提交哈希、全部历史记录和小体积都不受影响。如果您的源代码历史记录已经被二进制文件撑大,则必须把清理工作当作单独的任务来做。这个脚本可能有用。

已确认可在 Windows 上配合 git-bash 使用。

最好应用一组标准技巧来提高二进制文件的存储效率。经常运行 `git gc`(不带任何附加参数)可以让 git 通过二进制增量来优化文件的底层存储。但是,如果您的文件在各次提交之间不太可能保持相似,您可以完全关闭二进制增量。此外,由于压缩已经压缩或加密的文件(如 .zip、.jpg 或 .crypt)没有意义,git 允许您关闭底层存储的压缩。不幸的是,这是一个全有或全无的设置,也会影响您的源代码。

您可能希望把 OTABS 的部分步骤写成脚本以便更快地使用。特别是,将“彻底删除二进制文件”的第 2-3 步写进 `update` git 钩子,可以为 git fetch 赋予一种吸引人但也可能危险的语义(“获取并删除所有过时的内容”)。

您可能希望跳过“彻底删除二进制文件”的第 4 步,以便在远程保留所有二进制变更的完整历史记录,代价是中央存储库膨胀。本地存储库会随着时间的推移保持精简。

在 Java 世界中,可以将此方案与 `maven --offline` 结合,创建完全存放在版本控制中的可重现离线构建(用 maven 比用 gradle 更容易)。在 Golang 世界中,可以在此方案的基础上管理您的 GOPATH,而不是使用 `go get`。在 python 世界中,可以将它与 virtualenv 结合,生成一个自包含的开发环境,而无需每次从头构建时都依赖 PyPi 服务器。

如果您的二进制文件变化非常频繁(例如构建产物),不妨编写一个脚本方案,把最近 5 个版本的产物存放在孤立标签 `monday_bin`、`tuesday_bin`、...、`friday_bin` 中,并为每个发行版创建一个孤立标签,如 `1.7.8bin`、`2.0.0bin` 等。您可以每天轮换 `weekday_bin` 并删除旧的二进制文件。这样就可以两全其美:保留源代码的完整历史记录,同时只保留二进制依赖的相关历史记录。获取某个标签的二进制文件而不拉取全部源代码历史也很容易:`git init && git remote add <name> <url> && git fetch <name> <tag>` 即可。
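For the binary-delta point in the list above, a sketch of the per-path switch (the patterns are examples): the `-delta` attribute in `.gitattributes` tells Git not to attempt delta compression for matching files, while the zlib compression of loose objects is the repository-wide `core.compression` setting the answer calls all-or-nothing:

```
# .gitattributes: skip delta compression for already-compressed formats
*.zip   -delta
*.jpg   -delta
*.crypt -delta
```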
回答 by Tony Diep
SVN seems to handle binary deltas more efficiently than Git.
SVN 似乎比 Git 更有效地处理二进制增量。
I had to decide on a versioning system for documentation (JPEG files, PDF files, and .odt files). I just tested adding a JPEG file and rotating it 90 degrees four times (to check effectiveness of binary deltas). Git's repository grew 400%. SVN's repository grew by only 11%.
我必须决定文档(JPEG 文件、PDF 文件和 .odt 文件)的版本控制系统。我刚刚测试了添加一个 JPEG 文件并将其旋转 90 度四次(以检查二进制增量的有效性)。Git 的存储库增长了 400%。SVN 的存储库仅增长了 11%。
So it looks like SVN is much more efficient with binary files.
所以看起来 SVN 对二进制文件的效率要高得多。
So my choice is Git for source code and SVN for binary files like documentation.
所以我的选择是:源代码用 Git,文档等二进制文件用 SVN。
回答 by Josh Habdas
I am looking for opinions of how to handle large binary files on which my source code (web application) is dependent. What are your experiences/thoughts regarding this?
我正在寻找有关如何处理我的源代码(Web 应用程序)所依赖的大型二进制文件的意见。您对此有何经验/想法?
I personally have run into synchronisation failures with Git with some of my cloud hosts once my web application's binary data notched above the 3 GB mark. I considered BFG Repo-Cleaner at the time, but it felt like a hack. Since then I've begun to just keep files outside of Git's purview, instead leveraging purpose-built tools such as Amazon S3 for managing files, versioning and back-up.
一旦我的 Web 应用程序的二进制数据超过 3 GB,我个人就遇到过 Git 在某些云主机上同步失败的情况。我当时考虑过 BFG Repo-Cleaner,但感觉像是一种权宜之计。从那以后,我开始把这类文件放在 Git 管辖之外,转而利用 Amazon S3 等专用工具来管理文件、版本控制和备份。
Does anybody have experience with multiple Git repositories and managing them in one project?
有没有人有使用多个 Git 存储库并在一个项目中管理它们的经验?
Yes. Hugo themes are primarily managed this way. It's a little kludgy, but it gets the job done.
是的。Hugo 主题主要以这种方式管理。这有点笨拙,但它完成了工作。
My suggestion is to choose the right tool for the job. If it's for a company and you're managing your codeline on GitHub, pay the money and use Git-LFS. Otherwise you could explore more creative options such as decentralized, encrypted file storage using blockchain.
我的建议是为工作选择合适的工具。如果它是为一家公司准备的,并且您在 GitHub 上管理您的代码行,请支付费用并使用 Git-LFS。否则,您可以探索更多创意选项,例如使用区块链的分散式加密文件存储。