How do I download a large Git Repository?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, keep the original address, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/34389446/
Asked by Sebastian Gray
I have a Git repository on BitBucket which is more than 4GB.
I can't clone the repository using the normal Git command, as it fails (it looks like it's working for a long time but then rolls back).
I also can't download the repository as a zip from the BitBucket interface, as:
Feature unavailable: This repository is too large for us to generate a download.
Is there any way to download a GIT repository incrementally?
Accepted answer by Sebastian Gray
I got it to work by using the method described in "fatal: early EOF fatal: index-pack failed".
But only after I set up SSL - this method still didn't work over HTTP.
The support at BitBucket was really helpful and pointed me in this direction.
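The answer doesn't show the exact commands; as a rough sketch (assuming an SSH key is already registered with BitBucket, and username/reponame is a placeholder), re-pointing an existing clone from HTTP to SSH might look like:

    # switch the existing remote from its HTTP URL to SSH
    git remote set-url origin git@bitbucket.org:username/reponame.git
    # confirm the change, then retry the fetch
    git remote -v
    git fetch origin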
Answered by Puddler
If you don't need to pull the whole history, you can specify the number of revisions to clone:
git clone <repo_url> --depth=1
Of course, this might not help if you have a particularly large file in your repository.
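If depth=1 turns out to be too little history, Git (2.11+) lets you deepen a shallow clone incrementally instead of re-cloning; a small sketch (the numbers are arbitrary):

    # fetch 50 more commits of history on top of the shallow clone
    git fetch --deepen=50
    # or ask for an absolute history depth instead
    git fetch --depth=100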
Answered by Mateusz Sączkowski
For me, what worked perfectly is the approach described in this answer: https://stackoverflow.com/a/22317479/6332374, but with one small improvement because of the big repo:
First:
git config --global core.compression 0
then, clone just a part of your repo:
git clone --depth 1 <repo_URI>
and now "the rest"
现在“剩下的”
git fetch --unshallow
But here is the trick: when you have a big repo, sometimes you must perform that step multiple times. So... again,
git fetch --unshallow
and so on.
Try it multiple times. You will probably see that each time you perform 'unshallow' you get more and more objects before the error.
And at the end, just to be sure:
git pull --all
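Putting the whole recipe together, a rough shell sketch that simply retries the unshallow step until it succeeds (<repo_URI> and the directory name are placeholders; the answer above suggests each retry gets further before failing):

    git config --global core.compression 0
    git clone --depth 1 <repo_URI> repo
    cd repo
    # keep retrying until the full history arrives
    until git fetch --unshallow; do
        echo "fetch failed, retrying..."
        sleep 5
    done
    git pull --all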
Answered by JerryGoyal
1) You can initially download a single branch with only the latest commit revision (depth=1); this will significantly reduce the size of the repo to download and still let you work on the code base:
git clone --depth <Number> <repository> --branch <branch name> --single-branch
Example: git clone --depth 1 https://github.com/dundermifflin/dwightsecrets.git --branch scranton --single-branch
2) Later you can get all the commits (after this your repo will be in the same state as after a git clone):
git fetch --unshallow
or, if that's still too much, get only the last 25 commits:
git fetch --depth=25
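Note that --single-branch also narrows the remote's fetch refspec to that one branch; if you later want the other branches too, you can widen it again (a sketch, assuming the remote is named origin):

    # allow fetching all branches again, not just the one that was cloned
    git config remote.origin.fetch "+refs/heads/*:refs/remotes/origin/*"
    git fetch origin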
Other way: git clone is not resumable, but you can first git clone on a third-party server and then download the complete repo over http/ftp, which is actually resumable.
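The answer doesn't name a mechanism for that third-party-server step; one hedged way to do it is git bundle, which packs the whole repository into a single file that any resumable HTTP/FTP client can download:

    # on the intermediate server, where the clone succeeded:
    git clone --mirror <repo_url> repo.git
    git -C repo.git bundle create repo.bundle HEAD --all
    # download repo.bundle with a resumable client, then locally:
    git clone repo.bundle myrepo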
Answered by James Jones
One potential technique is just to clone a single branch; you can then pull in more later. Do: git clone [url_of_remote] --branch [branch_name] --single-branch
Large repositories seem to be a major weakness with git. You can read about that at http://www.sitepoint.com/managing-huge-repositories-with-git/. This article mentions a git extension called git-annex that can help with large files; check it out at https://git-annex.branchable.com/. It helps by allowing git to manage files without checking the files into git. Disclaimer: I've never tried it myself.
Some of the solutions at "How do I clone a large Git repository on an unreliable connection?" may also help.
EDIT: Since you just want the files, you may be able to try git archive. You'd use syntax something like:
git archive --remote=ssh://git@bitbucket.org/username/reponame.git --format=tar --output="file.tar" master
I tried to test on a repo in my AWS CodeCommit account, but it doesn't seem to allow it; someone on BitBucket may be able to test. Note that on Windows you'd want to use zip rather than tar, and this all has to be done over an ssh connection, not https.
Read more about git archive at http://git-scm.com/docs/git-archive
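For the Windows/zip case mentioned above, the same command with the zip format would look something like this (untested against BitBucket, per the caveat above):

    git archive --remote=ssh://git@bitbucket.org/username/reponame.git --format=zip --output="file.zip" master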
Answered by VonC
BitBucket should have a way to build an archive even for a large repo with Git 2.13.x/2.14 (Q3 2017).
See commit 867e40f (30 Apr 2017), commit ebdfa29 (27 Apr 2017), commit 4cdf3f9, commit af95749, commit 3c78fd8, commit c061a14, and commit 758c1f9, by René Scharfe.
(Merged by Junio C Hamano -- gitster -- in commit f085834, 16 May 2017)
archive-zip: support files bigger than 4GB

Write a zip64 extended information extra field for big files as part of their local headers and as part of their central directory headers. Also write a zip64 version of the data descriptor in that case.

If we're streaming then we don't know the compressed size at the time we write the header. Deflate can end up making a file bigger instead of smaller if we're unlucky. Write a local zip64 header already for files with a size of 2GB or more in this case to be on the safe side.

Both sizes need to be included in the local zip64 header, but the extra field for the directory must only contain 64-bit equivalents for 32-bit values of 0xffffffff.