Managing large binary files with Git

Disclaimer: this page is a rendering of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/540535/


Managing large binary files with Git

git, version-control, large-files, binary-files

Asked by pi.

I am looking for opinions of how to handle large binary files on which my source code (web application) is dependent. We are currently discussing several alternatives:


  1. Copy the binary files by hand.
    • Pro: Not sure.
    • Contra: I am strongly against this, as it increases the likelihood of errors when setting up a new site/migrating the old one. It builds up another hurdle to take.
  2. Manage them all with Git.
    • Pro: Removes the possibility of 'forgetting' to copy an important file.
    • Contra: Bloats the repository and decreases flexibility to manage the code-base, and checkouts, clones, etc. will take quite a while.
  3. Separate repositories.
    • Pro: Checking out/cloning the source code is as fast as ever, and the images are properly archived in their own repository.
    • Contra: Removes the simplicity of having the one and only Git repository on the project. It surely introduces some other things I haven't thought about.

What are your experiences/thoughts regarding this?


Also: Does anybody have experience with multiple Git repositories and managing them in one project?


The files are images for a program which generates PDFs with those files in it. The files will not change very often (as in years), but they are essential to the program; it will not work without them.


Accepted answer by Pat Notz

If the program won't work without the files, splitting them into a separate repo seems like a bad idea. We have large test suites that we break into a separate repo, but those are truly "auxiliary" files.


However, you may be able to manage the files in a separate repo and then use git-submodule to pull them into your project in a sane way. So, you'd still have the full history of all your source but, as I understand it, you'd only have the one relevant revision of your images submodule. The git-submodule facility should help you keep the correct version of the code in line with the correct version of the images.


Here's a good introduction to submodules from the Git Book.
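As a rough sketch of that workflow, using throwaway local repositories in place of real remotes (all paths and names below are illustrative):

```shell
# Illustrative sketch: local stand-ins replace your real repository URLs.
set -e
tmp=$(mktemp -d); cd "$tmp"

# A stand-in for the separate images repository.
git init -q images
( cd images \
  && git config user.email you@example.com && git config user.name you \
  && touch logo.png && git add logo.png && git commit -qm 'add logo' )

# The main project mounts the images repo as a submodule.
git init -q project; cd project
git config user.email you@example.com; git config user.name you
git commit -q --allow-empty -m 'initial'
git -c protocol.file.allow=always submodule --quiet add "$tmp/images" images
git commit -qm 'track images as a submodule'

# A fresh clone then gets the sources fast and pulls the images on demand:
#   git clone <project-url> && cd project && git submodule update --init
git submodule status images
```

The recorded submodule commit is what pins "the correct version of the images" to each revision of the source.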


Answer by rafak

I recently discovered git-annex, which I find awesome. It was designed for managing large files efficiently. I use it for my photo/music (etc.) collections. The development of git-annex is very active. The content of the files can be removed from the Git repository; only the tree hierarchy is tracked by Git (through symlinks). However, to get the content of a file, a second step is necessary after pulling/pushing, e.g.:


$ git annex add mybigfile
$ git commit -m 'add mybigfile'
$ git push myremote
$ git annex copy --to myremote mybigfile ## This command copies the actual content to myremote
$ git annex drop mybigfile ## Remove content from local repo
...
$ git annex get mybigfile ## Retrieve the content
## or to specify the remote from which to get:
$ git annex copy --from myremote mybigfile

There are many commands available, and there is great documentation on the website. A package is available in Debian.


Answer by VonC

Another solution, since April 2015, is Git Large File Storage (LFS) (by GitHub).


It uses git-lfs (see git-lfs.github.com) and is tested with a server supporting it: lfs-test-server.
You store only the metadata in the git repo, and the large files elsewhere.
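A minimal usage sketch, assuming git-lfs is installed (the *.psd pattern is just an example):

```shell
# git-lfs setup sketch; skips cleanly if the tool isn't available.
set -e
if ! command -v git-lfs >/dev/null 2>&1; then
  echo "git-lfs is not installed; skipping"; exit 0
fi
tmp=$(mktemp -d); cd "$tmp"
git init -q repo; cd repo
git config user.email you@example.com; git config user.name you
git lfs install --local          # enable the LFS filters for this repo
git lfs track "*.psd"            # records the pattern in .gitattributes
git add .gitattributes
cat .gitattributes               # *.psd filter=lfs diff=lfs merge=lfs -text
```

From then on, matching files committed with a plain `git add` are stored as small pointer files in the repo, with the real content on the LFS server.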


https://cloud.githubusercontent.com/assets/1319791/7051226/c4570828-ddf4-11e4-87eb-8fc165e5ece4.gif


Answer by sehe

Have a look at git bup, which is a Git extension to smartly store large binaries in a Git repository.


You'd want to have it as a submodule, but you won't have to worry about the repository getting hard to handle. One of their sample use cases is storing VM images in Git.


I haven't actually seen better compression rates, but my repositories don't have really large binaries in them.


Your mileage may vary.


Answer by Carl

You can also use git-fat. I like that it depends only on stock Python and rsync. It also supports the usual Git workflow, with these self-explanatory commands:


git fat init
git fat push
git fat pull

In addition, you need to check a .gitfat file into your repository and modify your .gitattributes to specify the file extensions you want git fat to manage.

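For illustration, a minimal configuration might look like this (the remote host, path, and extensions are hypothetical):

```ini
# .gitfat -- where rsync stores the real file content
[rsync]
remote = storage.example.com:/srv/git-fat-store

# .gitattributes -- which files git-fat should intercept
*.png filter=fat -crlf
*.zip filter=fat -crlf
```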

You add a binary using the normal git add, which in turn invokes git fat based on your .gitattributes rules.


Finally, it has the advantage that the location where your binaries are actually stored can be shared across repositories and users, and supports anything rsync does.


UPDATE: Do not use git-fat if you're using a Git-SVN bridge. It will end up removing the binary files from your Subversion repository. However, if you're using a pure Git repository, it works beautifully.


Answer by Daniel Fanjul

I would use submodules (as Pat Notz suggests) or two distinct repositories. If you modify your binary files too often, then I would try to minimize the impact of the huge repository by cleaning its history:


I had a very similar problem several months ago: ~21 GB of MP3 files, unclassified (bad names, bad id3's, don't know if I like that MP3 file or not...), and replicated on three computers.


I used an external hard disk drive with the main Git repository, and I cloned it into each computer. Then, I started to classify them in the habitual way (pushing, pulling, merging... deleting and renaming many times).


At the end, I had only ~6 GB of MP3 files and ~83 GB in the .git directory. I used git-write-tree and git-commit-tree to create a new commit, without commit ancestors, and started a new branch pointing to that commit. The "git log" for that branch only showed one commit.


Then, I deleted the old branch, kept only the new branch, deleted the ref-logs, and ran "git prune": after that, my .git folder weighed only ~6 GB...


You could "purge" the huge repository from time to time in the same way: Your "git clone"'s will be faster.
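That purge can be sketched on a throwaway repository like so (file names illustrative; the write-tree/commit-tree step is the one described above):

```shell
# Collapse the current branch's history into one parentless commit,
# then prune the old objects.
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q repo; cd repo
git config user.email you@example.com; git config user.name you
echo v1 > song.mp3; git add .; git commit -qm 'v1'
echo v2 > song.mp3; git add .; git commit -qm 'v2'
old=$(git symbolic-ref --short HEAD)

# New root commit holding the current tree, on a fresh branch.
tree=$(git write-tree)
commit=$(git commit-tree -m 'fresh start' "$tree")
git branch fresh "$commit"
git checkout -q fresh
git branch -q -D "$old"

# Drop reflogs and unreachable objects so the old history really goes away.
git reflog expire --expire=now --all
git prune
git log --oneline
```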


Answer by claf

In my opinion, if you're likely to modify those large files often, or if you intend to do a lot of git clone or git checkout operations, then you should seriously consider using another Git repository (or maybe another way to access those files).


But if you work like we do, and if your binary files are not modified often, then the first clone/checkout will be long, but after that it should be as fast as you want (provided your users keep using the first repository they cloned).


Answer by Adam Kurkiewicz

The solution I'd like to propose is based on orphan branches and a slight abuse of the tag mechanism, henceforth referred to as Orphan Tags Binary Storage (OTABS).


TL;DR 12-01-2017: If you can use GitHub's LFS or some other 3rd party, by all means you should. If you can't, then read on. Be warned: this solution is a hack and should be treated as such.


Desirable properties of OTABS


  • it is a pure git and git-only solution -- it gets the job done without any 3rd-party software (like git-annex) or 3rd-party infrastructure (like github's LFS).
  • it stores the binary files efficiently, i.e. it doesn't bloat the history of your repository.
  • git pull and git fetch, including git fetch --all, are still bandwidth efficient, i.e. not all large binaries are pulled from the remote by default.
  • it works on Windows.
  • it stores everything in a single git repository.
  • it allows for deletion of outdated binaries (unlike bup).

Undesirable properties of OTABS


  • it makes git clone potentially inefficient (but not necessarily, depending on your usage). If you deploy this solution you might have to advise your colleagues to use git clone -b master --single-branch <url> instead of git clone. This is because git clone by default literally clones the entire repository, including things you wouldn't normally want to waste your bandwidth on, like unreferenced commits. Taken from SO 4811434.
  • it makes git fetch <remote> --tags bandwidth inefficient, but not necessarily storage inefficient. You can always advise your colleagues not to use it.
  • you'll have to periodically use a git gc trick to clean your repository of any files you don't want any more.
  • it is not as efficient as bup or git-bigfiles. But it's respectively more suitable for what you're trying to do and more off-the-shelf. You are likely to run into trouble with hundreds of thousands of small files or with files in the range of gigabytes, but read on for workarounds.

Adding the Binary Files


Before you start make sure that you've committed all your changes, your working tree is up to date and your index doesn't contain any uncommitted changes. It might be a good idea to push all your local branches to your remote (github etc.) in case any disaster should happen.


  1. Create a new orphan branch. git checkout --orphan binaryStuff will do the trick. This produces a branch that is entirely disconnected from any other branch, and the first commit you make in this branch will have no parent, which makes it a root commit.
  2. Clean your index using git rm --cached * .gitignore.
  3. Take a deep breath and delete the entire working tree using rm -fr * .gitignore. The internal .git directory will stay untouched, because the * wildcard doesn't match it.
  4. Copy in your VeryBigBinary.exe, or your VeryHeavyDirectory/.
  5. Add it && commit it.
  6. Now it becomes tricky -- if you push it to the remote as a branch, all your developers will download it the next time they invoke git fetch, clogging their connection. You can avoid this by pushing a tag instead of a branch. This can still impact your colleagues' bandwidth and filesystem storage if they have a habit of typing git fetch <remote> --tags, but read on for a workaround. Go ahead and git tag 1.0.0bin.
  7. Push your orphan tag: git push <remote> 1.0.0bin.
  8. Just so you never push your binary branch by accident, you can delete it: git branch -D binaryStuff. Your commit will not be marked for garbage collection, because an orphan tag pointing at it (1.0.0bin) is enough to keep it alive.
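The eight steps above can be condensed into a runnable sketch against a local stand-in remote (file, tag, and branch names as in the text; the binary is a small placeholder):

```shell
# OTABS "Adding the Binary Files", end to end, on throwaway repos.
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q --bare remote.git
git init -q work; cd work
git config user.email you@example.com; git config user.name you
git commit -q --allow-empty -m 'initial'
git remote add origin "$tmp/remote.git"
git push -q origin HEAD
src=$(git symbolic-ref --short HEAD)

git checkout -q --orphan binaryStuff              # step 1
git rm -rq --cached . 2>/dev/null || true         # step 2
dd if=/dev/zero of=VeryBigBinary.exe bs=1024 count=1 2>/dev/null  # step 4 stand-in
git add VeryBigBinary.exe; git commit -qm 'binary payload'        # step 5
git tag 1.0.0bin                                  # step 6
git push -q origin 1.0.0bin                       # step 7
git checkout -q "$src"                            # back to the source branch
git branch -q -D binaryStuff                      # step 8: the tag keeps the commit alive
git ls-remote --tags origin                       # the orphan tag is on the remote
```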

Checking out the Binary File


  1. How do I (or my colleagues) get the VeryBigBinary.exe checked out into the current working tree? If your current working branch is for example master, you can simply git checkout 1.0.0bin -- VeryBigBinary.exe.
  2. This will fail if you don't have the orphan tag 1.0.0bin downloaded, in which case you'll have to git fetch <remote> 1.0.0bin beforehand.
  3. You can add the VeryBigBinary.exe to your master's .gitignore, so that no one on your team will pollute the main history of the project with the binary by accident.
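A sketch of that checkout flow, again with local stand-in repositories (the seeding block just recreates a remote that already carries the orphan tag):

```shell
# OTABS "Checking out the Binary File": fetch the tag, check the file out of it.
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q --bare remote.git

# Seed the remote with a tag holding the binary (as in "Adding").
git init -q seed
( cd seed \
  && git config user.email you@example.com && git config user.name you \
  && echo payload > VeryBigBinary.exe && git add VeryBigBinary.exe \
  && git commit -qm 'binary payload' && git tag 1.0.0bin \
  && git push -q "$tmp/remote.git" 1.0.0bin )

# A colleague's repo: fetch just the orphan tag, then extract the file.
git init -q colleague; cd colleague
git remote add origin "$tmp/remote.git"
git fetch -q origin tag 1.0.0bin
git checkout 1.0.0bin -- VeryBigBinary.exe
cat VeryBigBinary.exe
```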

Completely Deleting the Binary File


If you decide to completely purge VeryBigBinary.exe from your local repository, your remote repository and your colleague's repositories you can just:


  1. Delete the orphan tag on the remote: git push <remote> :refs/tags/1.0.0bin
  2. Delete the orphan tag locally (this deletes all other unreferenced tags): git tag -l | xargs git tag -d && git fetch --tags. Taken from SO 1841341 with slight modification.
  3. Use a git gc trick to delete your now-unreferenced commit locally: git -c gc.reflogExpire=0 -c gc.reflogExpireUnreachable=0 -c gc.rerereresolved=0 -c gc.rerereunresolved=0 -c gc.pruneExpire=now gc "$@". It will also delete all other unreferenced commits. Taken from SO 1904860.
  4. If possible, repeat the git gc trick on the remote. This is possible if you're self-hosting your repository, but might not be possible with some git providers, like github, or in some corporate environments. If you're hosting with a provider that doesn't give you ssh access to the remote, just let it be. It is possible that your provider's infrastructure will clean your unreferenced commit in their own sweet time. If you're in a corporate environment you can advise your IT to run a cron job garbage-collecting your remote once per week or so. Whether they do or don't will not have any impact on your team in terms of bandwidth and storage, as long as you advise your colleagues to always git clone -b master --single-branch <url> instead of git clone.
  5. All your colleagues who want to get rid of outdated orphan tags need only apply steps 2-3.
  6. You can then repeat steps 1-8 of Adding the Binary Files to create a new orphan tag, 2.0.0bin. If you're worried about your colleagues typing git fetch <remote> --tags, you can actually name it 1.0.0bin again. This will make sure that the next time they fetch all the tags, the old 1.0.0bin will be unreferenced and marked for subsequent garbage collection (using step 3). When you try to overwrite a tag on the remote, you have to use -f, like this: git push -f <remote> <tagname>
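Steps 1-3 can be sketched end-to-end on local repositories (the remote gc of step 4 is omitted; the first block just seeds a repo with an orphan tag, as in "Adding"):

```shell
# OTABS "Completely Deleting the Binary File", steps 1-3.
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q --bare remote.git
git clone -q remote.git work 2>/dev/null; cd work
git config user.email you@example.com; git config user.name you
git commit -q --allow-empty -m 'initial'; git push -q origin HEAD
src=$(git symbolic-ref --short HEAD)
git checkout -q --orphan binaryStuff
echo payload > big.bin; git add big.bin; git commit -qm 'binary payload'
git tag 1.0.0bin; git push -q origin 1.0.0bin
git checkout -q "$src"; git branch -q -D binaryStuff

# Now purge the binary everywhere:
git push -q origin :refs/tags/1.0.0bin            # step 1: remote tag gone
git tag -l | xargs -r git tag -d >/dev/null       # step 2: local tags gone
git fetch -q --tags
git -c gc.reflogExpire=0 -c gc.reflogExpireUnreachable=0 \
    -c gc.rerereresolved=0 -c gc.rerereunresolved=0 \
    -c gc.pruneExpire=now gc --quiet              # step 3: objects pruned
git ls-remote --tags origin                       # prints nothing
```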

Afterword


  • OTABS doesn't touch your master or any other source code/development branches. The commit hashes, all of the history, and the small size of these branches are unaffected. If you've already bloated your source code history with binary files, you'll have to clean it up as a separate piece of work. This script might be useful.

  • Confirmed to work on Windows with git-bash.

  • It is a good idea to apply a set of standard tricks to make storage of binary files more efficient. Frequent running of git gc (without any additional arguments) makes git optimise the underlying storage of your files by using binary deltas. However, if your files are unlikely to stay similar from commit to commit, you can switch off binary deltas altogether. Additionally, because it makes no sense to compress already compressed or encrypted files, like .zip, .jpg or .crypt, git allows you to switch off compression of the underlying storage. Unfortunately it's an all-or-nothing setting affecting your source code as well.

  • You might want to script up parts of OTABS to allow for quicker usage. In particular, scripting steps 2-3 from Completely Deleting Binary Files into an update git hook could give a compelling but perhaps dangerous semantics to git fetch ("fetch and delete everything that is out of date").

  • You might want to skip step 4 of Completely Deleting Binary Files to keep a full history of all binary changes on the remote, at the cost of central repository bloat. Local repositories will stay lean over time.

  • In the Java world it is possible to combine this solution with maven --offline to create a reproducible offline build stored entirely in your version control (it's easier with maven than with gradle). In the Golang world it is feasible to build on this solution to manage your GOPATH instead of go get. In the python world it is possible to combine this with virtualenv to produce a self-contained development environment without relying on PyPi servers for every build from scratch.

  • If your binary files change very often, like build artifacts, it might be a good idea to script a solution which stores the 5 most recent versions of the artifacts in the orphan tags monday_bin, tuesday_bin, ..., friday_bin, and also an orphan tag for each release: 1.7.8bin, 2.0.0bin, etc. You can rotate the weekday_bin tags and delete old binaries daily. This way you get the best of both worlds: you keep the entire history of your source code, but only the relevant history of your binary dependencies. It is also very easy to get the binary files for a given tag without getting the entire source code with all its history: git init && git remote add <name> <url> && git fetch <name> <tag> should do it for you.


Answer by Tony Diep

SVN seems to handle binary deltas more efficiently than Git.


I had to decide on a versioning system for documentation (JPEG files, PDF files, and .odt files). I just tested adding a JPEG file and rotating it 90 degrees four times (to check effectiveness of binary deltas). Git's repository grew 400%. SVN's repository grew by only 11%.


So it looks like SVN is much more efficient with binary files.


So my choice is Git for source code and SVN for binary files like documentation.


Answer by Josh Habdas

I am looking for opinions of how to handle large binary files on which my source code (web application) is dependent. What are your experiences/thoughts regarding this?


I personally have run into synchronisation failures with Git with some of my cloud hosts once my web application's binary data notched above the 3 GB mark. I considered BFG Repo-Cleaner at the time, but it felt like a hack. Since then I've begun to just keep files outside of Git's purview, instead leveraging purpose-built tools such as Amazon S3 for managing files, versioning and back-up.


Does anybody have experience with multiple Git repositories and managing them in one project?


Yes. Hugo themes are primarily managed this way. It's a little kludgy, but it gets the job done.




My suggestion is to choose the right tool for the job. If it's for a company and you're managing your codeline on GitHub, pay the money and use Git-LFS. Otherwise you could explore more creative options such as decentralized, encrypted file storage using blockchain.


Additional options to consider include Minio and s3cmd.
