Disclaimer: this page is a Chinese/English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/19352894/

Date: 2020-09-10 17:08:17  Source: igfitidea

How to git fetch efficiently from a shallow clone

Tags: git, github, shallow-clone

Question by hendry

We use git to distribute an operating system and keep it up to date. We can't distribute the full repository since it's too large (>2GB), so we have been using shallow clones (~300M). However, recently when fetching from a shallow clone, it now inefficiently fetches the entire >2GB repository. This is an untenable waste of bandwidth for deployments.

The git documentation says you cannot fetch from a shallow repository, though that's strictly not true. Are there any workarounds to make a git clone --depth 1 able to fetch just what's changed from it? Or some other strategy to keep the distribution size as small as possible whilst having all the bits git needs to do an update?

I have unsuccessfully tried cloning with --depth 20 to see if it would upgrade more efficiently; that didn't work. I also looked into http://git-scm.com/docs/git-bundle, but that seems to create huge bundles.

Accepted answer by jthill

--depth is a git fetch option. I see the doc doesn't really highlight that git clone does a fetch.
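
To illustrate that clone is built on fetch, here is a minimal shell sketch (not part of the answer) that builds a throwaway upstream in a temp directory and compares `git clone --depth 1` with doing `git init`, `git remote add` and `git fetch --depth 1` by hand. The repo names, the branch `main` and the helper `commit` are assumptions for the demo; `file://` URLs are used because `--depth` is ignored for plain local-path clones.

```shell
#!/bin/sh
# Sketch: "git clone --depth 1" vs. init + remote add + fetch --depth 1.
# Assumptions: throwaway repos in a temp dir, branch "main", demo identity.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

commit() {  # hypothetical helper: commit <repo> <msg> rewrites "file" and commits it
  echo "$2" > "$1/file"
  git -C "$1" add file
  git -C "$1" commit -q -m "$2"
}

# Hypothetical upstream with five commits on "main".
git init -q "$work/up"
git -C "$work/up" symbolic-ref HEAD refs/heads/main
for i in 1 2 3 4 5; do commit "$work/up" "c$i"; done
url="file://$work/up"

# 1) The usual shallow clone.
git clone -q --depth 1 "$url" "$work/clone"

# 2) Roughly the same thing by hand: init, then a depth-limited fetch, then checkout.
git init -q "$work/manual"
git -C "$work/manual" remote add origin "$url"
git -C "$work/manual" fetch -q --depth 1 origin main
git -C "$work/manual" checkout -q -b main origin/main

echo "clone:  $(git -C "$work/clone" rev-list --count HEAD) commit(s)"
echo "manual: $(git -C "$work/manual" rev-list --count HEAD) commit(s)"
```

Both repos should end up with the same single commit, and both record the cut-off point in `.git/shallow`.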

When you fetch, the two repos swap info on who has what by starting from the remote's heads and searching backward for the most recent shared commit in the fetched refs' histories, then filling in all the missing objects to complete just the new commits between the most recent shared commits and the newly fetched ones.

A --depth=1 fetch just gets the branch tips and no prior history. Further fetches of those histories will fetch everything new by the above procedure, but if the previously-fetched commits aren't in the newly fetched history, fetch will retrieve all of it, unless you limit the fetch with --depth.
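
The behaviour described above can be observed with a small shell sketch (illustrative only; the temp-dir repos, the branch `main` and the helper `commit` are assumptions). Two identical `--depth 1` clones are taken, upstream gains two commits, then one clone runs a plain `git fetch` and the other `git fetch --depth 1`. The plain fetch should stop at the existing shallow boundary (two new commits plus the old tip); the depth-limited fetch should keep only the new tip.

```shell
#!/bin/sh
# Sketch: follow-up fetch in a shallow clone, with and without --depth.
# Assumptions: throwaway repos in a temp dir, branch "main", demo identity.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

commit() {  # hypothetical helper: commit <repo> <msg>
  echo "$2" > "$1/file"
  git -C "$1" add file
  git -C "$1" commit -q -m "$2"
}

git init -q "$work/up"
git -C "$work/up" symbolic-ref HEAD refs/heads/main
for i in 1 2 3 4 5 6 7 8 9 10; do commit "$work/up" "c$i"; done

# Two identical depth-1 clones: both only have c10.
git clone -q --depth 1 "file://$work/up" "$work/a"
git clone -q --depth 1 "file://$work/up" "$work/b"

# Upstream moves on by two commits.
commit "$work/up" c11
commit "$work/up" c12

# Plain fetch: walks back until it meets c10, the existing shallow boundary.
git -C "$work/a" fetch -q origin
# Depth-limited fetch: only the new tip, which becomes a new boundary.
git -C "$work/b" fetch -q --depth 1 origin

echo "plain fetch:     $(git -C "$work/a" rev-list --count origin/main) commits"
echo "--depth 1 fetch: $(git -C "$work/b" rev-list --count origin/main) commits"
```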

Your client did a depth=1 fetch from one repo and then switched URLs to a different repo. At least one long ancestry path in this new repo's refs apparently shares no commits with anything currently in your repo. That might be worth investigating, but either way, unless there's some particular reason, your clients can just do every fetch with --depth=1.
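
A hedged reproduction of the situation diagnosed here: a shallow clone whose URL is switched to a repo with no commits in common. The repos `old` (3 commits) and `new` (20 commits) and the helper `mkrepo` are hypothetical. Without `--depth`, the fetch has no shared commit to stop at and should pull the whole history of `new`; with `--depth 1` it should pull just the tip.

```shell
#!/bin/sh
# Sketch: a shallow clone pointed at an unrelated repo, with and without --depth.
# Assumptions: throwaway repos in a temp dir, branch "main", demo identity.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkrepo() {  # hypothetical helper: mkrepo <dir> <n> creates "main" with n commits
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/main
  i=1
  while [ "$i" -le "$2" ]; do
    echo "$1 $i" > "$1/file"
    git -C "$1" add file
    git -C "$1" commit -q -m "commit $i"
    i=$((i + 1))
  done
}

mkrepo "$work/old" 3   # where the clients were originally cloned from
mkrepo "$work/new" 20  # unrelated history: nothing shared with "old"

for c in plain limited; do
  git clone -q --depth 1 "file://$work/old" "$work/$c"
  git -C "$work/$c" remote set-url origin "file://$work/new"
done

git -C "$work/plain" fetch -q origin              # no common commit: full history
git -C "$work/limited" fetch -q --depth 1 origin  # just the tip

echo "plain:   $(git -C "$work/plain" rev-list --count origin/main) commits"
echo "limited: $(git -C "$work/limited" rev-list --count origin/main) commits"
```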

Answer by Waterlink

I just ran git clone github.com:torvalds/linux and it took so long that I aborted it with Ctrl+C.

Then I ran git clone github.com:torvalds/linux --depth 1 and it cloned quite fast, and git log shows only one commit.

So clone --depth 1 should work. If you need to update an existing repository, use git fetch origin remoteBranch:localBranch --depth 1. That works too; it fetches only one commit.

Summing up:

Initial clone:

git clone git_url --depth 1

Code update:

git fetch origin remoteBranch:localBranch --depth 1
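
One caveat worth hedging: if localBranch is the branch currently checked out, git fetch refuses to update it directly (the --update-head-ok flag disables that check, but the git-fetch documentation reserves it for internal use by git pull). The sketch below, with a throwaway upstream, the branch `main` and the helper `commit` as assumptions, shows the refusal and one possible alternative that is not from the answer: fetch with `--depth 1` into the remote-tracking branch, then move the checked-out branch with `git reset --hard`.

```shell
#!/bin/sh
# Sketch: updating a shallow clone whose "main" is the checked-out branch.
# Assumptions: throwaway repos in a temp dir, branch "main", demo identity.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

commit() {  # hypothetical helper: commit <repo> <msg>
  echo "$2" > "$1/file"
  git -C "$1" add file
  git -C "$1" commit -q -m "$2"
}

git init -q "$work/up"
git -C "$work/up" symbolic-ref HEAD refs/heads/main
for i in 1 2 3; do commit "$work/up" "c$i"; done
git clone -q --depth 1 "file://$work/up" "$work/client"

commit "$work/up" c4   # upstream moves on

# The remoteBranch:localBranch form is refused while "main" is checked out.
if git -C "$work/client" fetch -q origin main:main --depth 1 2>/dev/null; then
  echo "fetch into main succeeded"
else
  echo "fetch into checked-out main was refused"
fi

# Alternative (an assumption, not from the answer): update origin/main, then reset.
git -C "$work/client" fetch -q --depth 1 origin main
git -C "$work/client" reset -q --hard origin/main
echo "HEAD is now $(git -C "$work/client" log -1 --format=%s)"
```

Note that `reset --hard` discards local changes, which is usually acceptable for a deployment checkout but not for a working copy.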

Answer by VonC

Note that Git 1.9/2.0 (Q1 2014) could be more efficient in fetching for a shallow clone.
See commit 82fba2b, from Nguyễn Thái Ngọc Duy (pclouds):

Now that git supports data transfer from or to a shallow clone, these limitations are not true anymore.

All the details are in "shallow.c: the 8 steps to select new commits for .git/shallow".

You can see the consequence in commits like 0d7d285, f2c681c, and c29a7b8, which support clone and send-pack/receive-pack with/from shallow clones.
smart-http now supports shallow fetch/clone too.
You can even clone from a shallow repo.
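
To see this newer behaviour locally, here is a hedged shell sketch (repos in a temp dir, the branch `main` and the helper `commit` are assumptions): it makes a shallow clone, clones again from that shallow clone, and pushes a new commit from the shallow clone back to a bare upstream, all of which the quoted commit says were limitations before.

```shell
#!/bin/sh
# Sketch: cloning from, and pushing from, a shallow clone (Git 1.9+).
# Assumptions: throwaway repos in a temp dir, branch "main", demo identity.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

commit() {  # hypothetical helper: commit <repo> <msg>
  echo "$2" > "$1/file"
  git -C "$1" add file
  git -C "$1" commit -q -m "$2"
}

# Bare upstream, seeded from an ordinary repo with five commits.
git init -q --bare "$work/up.git"
git -C "$work/up.git" symbolic-ref HEAD refs/heads/main
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/main
for i in 1 2 3 4 5; do commit "$work/seed" "c$i"; done
git -C "$work/seed" push -q "$work/up.git" main

# A shallow clone, then a clone of that shallow clone (itself shallow).
git clone -q --depth 1 "file://$work/up.git" "$work/shallow1"
git clone -q "file://$work/shallow1" "$work/shallow2"

# Push a new commit from the shallow clone back to the full upstream.
commit "$work/shallow1" "from shallow"
git -C "$work/shallow1" push -q origin main

echo "shallow2 has $(git -C "$work/shallow2" rev-list --count HEAD) commit(s)"
echo "upstream tip: $(git -C "$work/up.git" log -1 --format=%s main)"
```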

Update 2015: git 2.5+ (Q2 2015) will even allow for a single commit fetch! See "Pull a specific commit from a remote git repository".
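
As a rough illustration of fetching one specific commit into a shallow clone, here is a sketch under stated assumptions (throwaway repos, the branch `main`, the helper `commit`). The serving side must accept requests for commits that are not ref tips; the sketch sets `uploadpack.allowReachableSHA1InWant` on the upstream for that. Hosting services may be configured differently, so treat this as a local demo only.

```shell
#!/bin/sh
# Sketch: fetch one specific (non-tip) commit into a shallow clone.
# Assumptions: throwaway repos in a temp dir, branch "main", demo identity.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

commit() {  # hypothetical helper: commit <repo> <msg>
  echo "$2" > "$1/file"
  git -C "$1" add file
  git -C "$1" commit -q -m "$2"
}

git init -q "$work/up"
git -C "$work/up" symbolic-ref HEAD refs/heads/main
for i in 1 2 3 4 5; do commit "$work/up" "c$i"; done
# Serving side: allow requests for any commit reachable from a ref.
git -C "$work/up" config uploadpack.allowReachableSHA1InWant true
sha=$(git -C "$work/up" rev-parse HEAD~2)   # commit "c3", not a ref tip

git clone -q --depth 1 "file://$work/up" "$work/client"
# Fetch exactly that commit by its full id, depth 1; the result is in FETCH_HEAD.
git -C "$work/client" fetch -q --depth 1 origin "$sha"
git -C "$work/client" log -1 --format=%s FETCH_HEAD
```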

Update 2016 (Oct.): git 2.11+ (Q4 2016) allows for fetching:

Answer by Martin Tapp

If you can select a specific branch, it can be even faster. Here's an example using the Spark master branch and the latest tag:

Initial clone

git clone git@github.com:apache/spark.git --branch master --single-branch --depth 1

Update to specific tag

git fetch --depth 1 origin tags/v1.6.0

It becomes very fast to switch tags/branches this way.
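
A hedged local reproduction of this recipe, with the Spark URL replaced by a throwaway `file://` repo (the tag `v1.0`, the extra branch `other` and the helper `commit` are assumptions). `--single-branch` narrows the clone's `remote.origin.fetch` to the one branch, and `tags/v1.0` resolves to `refs/tags/v1.0` and lands in `FETCH_HEAD`, which can then be checked out as a detached HEAD.

```shell
#!/bin/sh
# Sketch: single-branch shallow clone, then a depth-1 fetch of one tag.
# Assumptions: throwaway repos in a temp dir, branch "main", demo identity.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

commit() {  # hypothetical helper: commit <repo> <msg>
  echo "$2" > "$1/file"
  git -C "$1" add file
  git -C "$1" commit -q -m "$2"
}

git init -q "$work/up"
git -C "$work/up" symbolic-ref HEAD refs/heads/main
for i in 1 2 3; do commit "$work/up" "c$i"; done
git -C "$work/up" tag v1.0 HEAD~1    # hypothetical lightweight tag on c2
git -C "$work/up" branch other       # a branch the client should not track

git clone -q --branch main --single-branch --depth 1 "file://$work/up" "$work/client"
git -C "$work/client" config --get-all remote.origin.fetch

# Fetch just the tagged commit and switch to it.
git -C "$work/client" fetch -q --depth 1 origin tags/v1.0
git -C "$work/client" checkout -q --detach FETCH_HEAD
git -C "$work/client" log -1 --format=%s
```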

Answer by Rajish

I don't know if it suits your set-up, but what I use is a full clone of the repo in a separate directory. Then I do a shallow clone from the remote repository with reference to the local one.

git clone --depth 1 --reference /path/to/local/clone [email protected]/group/repo.git 

That way only the differences between the reference repository and the remote are actually fetched. To make it even quicker you can use the --shared option, but be sure to read about the restrictions in the git documentation (it can be dangerous).
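
A hedged local sketch of this setup (the `file://` upstream, the `full` reference directory and the helper `commit` are assumptions). The shallow clone borrows objects from the full clone through `.git/objects/info/alternates`, so the transfer only needs what the reference lacks. Per the git-clone documentation, a clone made with `--reference` (without `--dissociate`) keeps depending on the reference repository's objects.

```shell
#!/bin/sh
# Sketch: shallow clone that borrows objects from a full local reference clone.
# Assumptions: throwaway repos in a temp dir, branch "main", demo identity.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

commit() {  # hypothetical helper: commit <repo> <msg>
  echo "$2" > "$1/file"
  git -C "$1" add file
  git -C "$1" commit -q -m "$2"
}

git init -q "$work/up"
git -C "$work/up" symbolic-ref HEAD refs/heads/main
for i in 1 2 3 4 5 6 7 8 9 10; do commit "$work/up" "c$i"; done

# Full local clone kept around as the reference repository.
git clone -q "file://$work/up" "$work/full"
commit "$work/up" c11   # upstream moves on after the reference was made

git clone -q --depth 1 --reference "$work/full" "file://$work/up" "$work/client"
cat "$work/client/.git/objects/info/alternates"   # path into the reference's objects
git -C "$work/client" log -1 --format=%s
```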

I also found that in some circumstances, when the remote has changed a lot, the clone starts fetching too much data. It is good to interrupt it then and update the reference repo (which, strangely, takes much less bandwidth than it took in the first place), and then start the clone again.
