Note: this page mirrors a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you reuse or share it, you must follow the same CC BY-SA terms and attribute it to the original authors (not me): StackOverflow.
Original question: http://stackoverflow.com/questions/8180525/
What is the fastest way to clone a git repository over a fast network connection?
Asked by Thorbjørn Ravn Andersen
I have a relatively large git repository located on an elderly, slow host on my local network, where it takes quite a while to do the initial clone.
ravn@bamboo:~/git$ git clone gitosis@gitbox:git00
Initialized empty Git repository in /home/ravn/git/git00/.git/
remote: Counting objects: 89973, done.
remote: Compressing objects: 100% (26745/26745), done.
remote: Total 89973 (delta 50970), reused 85013 (delta 47798)
Receiving objects: 100% (89973/89973), 349.86 MiB | 2.25 MiB/s, done.
Resolving deltas: 100% (50970/50970), done.
Checking out files: 100% (11722/11722), done.
ravn@bamboo:~/git$
There are no git-specific configuration changes in gitosis.
Is there any way of speeding up the receiving bit to what the network is capable of?
EDIT: I need the new repositories to be properly connected with the upstream repository. To my understanding this requires git to do the cloning, and thus raw bit copying outside of git will not work.
Accepted answer by Thorbjørn Ravn Andersen
After realizing that the upper limit on the data transfer speed is the ssh connection, which is established "outside" of git, I did some experiments and found that the upper limit of using pscp (PuTTY scp) was 3.0 MB/s when the blowfish encryption scheme was chosen. A control experiment with raw ftp showed a transfer speed of 3.1 MB/s, indicating that this was the upper bound of the network.
This runs inside a VMware hypervisor, and as the process doing network I/O used almost 100% CPU, it indicated that the bottleneck was the Ubuntu network card driver. I then found that even though vmware tools were installed, for some reason the kernel still used the vlance driver (emulating a 10 MBps network card with IRQs and all) instead of the vmxnet driver (which speaks directly to the hypervisor). This now awaits a service window to be changed.
In other words, the problem was not with git but the underlying "hardware".
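If you want to reproduce this kind of diagnosis, a rough sketch follows (the host and repo names are the ones from the question; the cipher is just an example that your OpenSSH build may or may not offer):

# Measure raw ssh throughput independently of git; if this matches
# git's receive rate, the bottleneck is the network stack, not git.
dd if=/dev/zero bs=1M count=300 | ssh gitosis@gitbox 'cat > /dev/null'

# If the cipher is the limiting factor, point git at a cheaper one (Git 2.10+):
git -c core.sshCommand="ssh -c aes128-ctr" clone gitosis@gitbox:git00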
Answer by sehe
PS. Fair warning: `git` is generally considered blazingly fast. You should try cloning a full repo from darcs, bazaar, hg (god forbid: TFS or subversion...). Also, if you routinely clone full repos from scratch, you'd be doing something wrong anyway. You can always just `git remote update` and get incremental changes.

For various other ways to keep full repos in sync see, e.g.

- "fetch --all" in a git bare repository doesn't synchronize local branches to the remote ones
- How to update a git clone --mirror?

(These contain links to other relevant SO posts.)
Dumb copy
As mentioned, you could just copy a repository with 'dumb' file transfer.
This will certainly not waste time compressing, repacking, deltifying and/or filtering.
Plus, you will get:
- hooks
- config (remotes, push branches, settings (whitespace, merge, aliases, user details, etc.))
- stashes (see also Can I fetch a stash from a remote repo into a local branch?)
- rerere cache
- reflogs
- backups (from filter-branch, e.g.) and various other things (intermediate state from rebase, bisect, etc.)
This may or may not be what you require, but it is nice to be aware of the fact.
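A minimal sketch of such a dumb copy (assuming rsync is available on the old host and that the clone URL's path maps to a real directory there):

# Copy the repository wholesale, bypassing git's pack/delta machinery;
# on a fast LAN you may also want to skip rsync's own compression (-z):
rsync -a --progress gitosis@gitbox:git00/ git00/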
Bundle
Git clone by default optimizes for bandwidth. Since git clone, by default, does not mirror all branches (see `--mirror`), it would not make sense to just dump the pack-files as-is (because that would possibly send way more than required).
When distributing to a truly big number of clients, consider using bundles.
If you want a fast clone without the server-side cost, the git way is `bundle create`. You can now distribute the bundle without the server even being involved. If you mean that `bundle ... --all` includes more than a simple `git clone`, consider e.g. `bundle ... master` to reduce the volume.
git bundle create snapshot.bundle --all # (or mention specific ref names instead of --all)
and distribute the snapshot bundle instead. That's the best of both worlds, while of course you won't get the items from the bullet list above. On the receiving end, just
git clone snapshot.bundle myclonedir/
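One caveat for the question's "properly connected" requirement: after cloning from a bundle, origin points at the bundle file. Re-pointing it at the live server (URL taken from the question) restores a normal upstream:

cd myclonedir/
# origin currently refers to snapshot.bundle; aim it at the real server:
git remote set-url origin gitosis@gitbox:git00
git fetch origin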
Compression configs
You can look at lowering the server load by reducing/removing compression. Have a look at these config settings (I assume `pack.compression` may help you lower the server load):
core.compression
An integer -1..9, indicating a default compression level. -1 is the zlib default. 0 means no compression, and 1..9 are various speed/size tradeoffs, 9 being slowest. If set, this provides a default to other compression variables, such as core.loosecompression and pack.compression.
core.loosecompression
An integer -1..9, indicating the compression level for objects that are not in a pack file. -1 is the zlib default. 0 means no compression, and 1..9 are various speed/size tradeoffs, 9 being slowest. If not set, defaults to core.compression. If that is not set, defaults to 1 (best speed).
pack.compression
An integer -1..9, indicating the compression level for objects in a pack file. -1 is the zlib default. 0 means no compression, and 1..9 are various speed/size tradeoffs, 9 being slowest. If not set, defaults to core.compression. If that is not set, defaults to -1, the zlib default, which is "a default compromise between speed and compression (currently equivalent to level 6)."
Note that changing the compression level will not automatically recompress all existing objects. You can force recompression by passing the -F option to git-repack(1).
Given ample network bandwidth, this will in fact result in faster clones. Don't forget about `git-repack -F` when you decide to benchmark that!
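As a sketch, tuning the server repository down to no compression could look like this (illustrative, not a recipe):

# In the server repository: store pack data uncompressed,
# trading disk space and bandwidth for CPU time:
git config pack.compression 0
# Recompress the existing objects so the setting actually takes effect:
git repack -a -d -F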
Answer by northtree
Use the `--depth` option to create a shallow clone:
git clone --depth 1 <repository>
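Note that a shallow clone truncates history (and, with `--depth`, fetches only a single branch by default). If you later need the full history, the clone can be converted in place, assuming the remote is still reachable:

git fetch --unshallow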
Answer by VonC
The `git clone --depth=1 ...` suggested in 2014 will become faster in Q2 2019 with Git 2.22.

That is because, during an initial "`git clone --depth=...`" partial clone, it is pointless to spend cycles on a large portion of the connectivity check that enumerates and skips promisor objects (which by definition is all objects fetched from the other side).
This has been optimized out.
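(For reference, a partial clone in this sense is one made with the `--filter` option. A hypothetical blobless clone of the question's repository, assuming the server is new enough to allow filtering, would be:)

# Fetch commits and trees up front; blobs are fetched lazily on demand:
git clone --filter=blob:none gitosis@gitbox:git00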
`clone`: do faster object check for partial clones

For partial clones, doing a full connectivity check is wasteful; we skip promisor objects (which, for a partial clone, is all known objects), and enumerating them all to exclude them from the connectivity check can take a significant amount of time on large repos.
At most, we want to make sure that we get the objects referred to by any wanted refs.
For partial clones, just check that these objects were transferred.
Result:
Test dfa33a2^ dfa33a2
-------------------------------------------------------------------------
5600.2: clone without blobs 18.41(22.72+1.09) 6.83(11.65+0.50) -62.9%
5600.3: checkout of result 1.82(3.24+0.26) 1.84(3.24+0.26) +1.1%
62% faster!
With Git 2.26 (Q1 2020), an unneeded connectivity check is now disabled in a partial clone when fetching into it.
See commit 2df1aa2, commit 5003377 (12 Jan 2020) by Jonathan Tan (`jhowtan`).
(Merged by Junio C Hamano -- `gitster` -- in commit 8fb3945, 14 Feb 2020)
`connected`: verify promisor-ness of partial clone

Signed-off-by: Jonathan Tan
Reviewed-by: Jonathan Nieder

Commit dfa33a298d ("`clone`: do faster object check for partial clones", 2019-04-21, Git v2.22.0-rc0 -- merge) optimized the connectivity check done when cloning with `--filter` to check only the existence of objects directly pointed to by refs.
But this is not sufficient: they also need to be promisor objects.
Make this check more robust by instead checking that these objects are promisor objects, that is, they appear in a promisor pack.
And:
`fetch`: forgo full connectivity check if `--filter`

Signed-off-by: Jonathan Tan
Reviewed-by: Jonathan Nieder

If a filter is specified, we do not need a full connectivity check on the contents of the packfile we just fetched; we only need to check that the objects referenced are promisor objects.
This significantly speeds up fetches into repositories that have many promisor objects, because during the connectivity check, all promisor objects are enumerated (to mark them UNINTERESTING), and that takes a significant amount of time.
And, still with Git 2.26 (Q1 2020), the object reachability bitmap machinery and the partial cloning machinery were not prepared to work well together, because some object-filtering criteria that partial clones use inherently rely on object traversal, but the bitmap machinery is an optimization to bypass that object traversal.
There are, however, some cases where they can work together, and they were taught about them.
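(As background, reachability bitmaps are produced on the server at repack time. A minimal sketch of enabling them in a bare server repository, assuming nothing has disabled them:)

# Enable bitmap generation and build one with the next full repack:
git config repack.writeBitmaps true
git repack -a -d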
See commit 20a5fd8 (18 Feb 2020) by Junio C Hamano (`gitster`).
See commit 3ab3185, commit 84243da, commit 4f3bd56, commit cc4aa28, commit 2aaeb9a, commit 6663ae0, commit 4eb707e, commit ea047a8, commit 608d9c9, commit 55cb10f, commit 792f811, commit d90fe06 (14 Feb 2020), and commit e03f928, commit acac50d, commit 551cf8b (13 Feb 2020) by Jeff King (`peff`).
(Merged by Junio C Hamano -- `gitster` -- in commit 0df82d9, 02 Mar 2020)
`pack-bitmap`: implement `BLOB_LIMIT` filtering

Signed-off-by: Jeff King

Just as the previous commit implemented `BLOB_NONE`, we can support `BLOB_LIMIT` filters by looking at the sizes of any blobs in the result and unsetting their bits as appropriate.
This is slightly more expensive than `BLOB_NONE`, but still produces a noticeable speedup (these results are on git.git):

Test                                         HEAD~2            HEAD
------------------------------------------------------------------------------------
5310.9:  rev-list count with blob:none       1.80(1.77+0.02)   0.22(0.20+0.02) -87.8%
5310.10: rev-list count with blob:limit=1k   1.99(1.96+0.03)   0.29(0.25+0.03) -85.4%

The implementation is similar to the `BLOB_NONE` one, with the exception that we have to go object-by-object while walking the blob-type bitmap (since we can't mask out the matches, but must look up the size individually for each blob).
The trick with using `ctz64()` is taken from `show_objects_for_type()`, which likewise needs to find individual bits (but wants to quickly skip over big chunks without blobs).
Git 2.27 (Q2 2020) will simplify the commit ancestry connectedness check in a partial clone repository in which "promised" objects are assumed to be obtainable lazily on-demand from promisor remote repositories.
See commit 2b98478 (20 Mar 2020) by Jonathan Tan (`jhowtan`).
(Merged by Junio C Hamano -- `gitster` -- in commit 0c60105, 22 Apr 2020)
`connected`: always use partial clone optimization

Signed-off-by: Jonathan Tan
Reviewed-by: Josh Steadmon

With 50033772d5 ("`connected`: verify promisor-ness of partial clone", 2020-01-30, Git v2.26.0-rc0 -- merge listed in batch #5), the fast path (checking promisor packs) in `check_connected()` now passes a subset of the slow path (rev-list) - if all objects to be checked are found in promisor packs, both the fast path and the slow path will pass; otherwise, the fast path will definitely not pass.

This means that we can always attempt the fast path whenever we need to do the slow path.
The fast path is currently guarded by a flag; therefore, remove that flag.
Also, make the fast path fall back to the slow path - if the fast path fails, the failing OID and all remaining OIDs will be passed to rev-list.

The main user-visible benefit is the performance of fetch from a partial clone - specifically, the speedup of the connectivity check done before the fetch.
In particular, a no-op fetch into a partial clone on my computer was sped up from 7 seconds to 0.01 seconds. This is a complement to the work in 2df1aa239c ("`fetch`: forgo full connectivity check if --filter", 2020-01-30, Git v2.26.0-rc0 -- merge listed in batch #5), which is the child of the aforementioned 50033772d5. In that commit, the connectivity check after the fetch was sped up.

The addition of the fast path might cause performance reductions in these cases:

- If a partial clone or a fetch into a partial clone fails, Git will fruitlessly run rev-list (it is expected that everything fetched would go into promisor packs, so if that didn't happen, it is most likely that rev-list will fail too).
- Any connectivity checks done by receive-pack, in the (in my opinion, unlikely) event that a partial clone serves receive-pack.

I think that these cases are rare enough, and the performance reduction in this case minor enough (additional object DB access), that the benefit of avoiding a flag outweighs these.
With Git 2.27 (Q2 2020), the object walk with object filter "`--filter=tree:0`" can now take advantage of the pack bitmap when available.
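(In plain commands, the walk being sped up corresponds roughly to the following; the use of `time` and the choice of repository are just for illustration:)

# Object walk that skips all trees and blobs; with Git 2.27 and bitmaps
# this no longer needs a full traversal:
time git rev-list --objects --filter=tree:0 --all > /dev/null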
See commit 9639474, commit 5bf7f1e (04 May 2020) by Jeff King (`peff`).
See commit b0a8d48, commit 856e12c (04 May 2020) by Taylor Blau (`ttaylorr`).
(Merged by Junio C Hamano -- `gitster` -- in commit 69ae8ff, 13 May 2020)
`pack-bitmap.c`: support 'tree:0' filtering

Signed-off-by: Taylor Blau

In the previous patch, we made it easy to define other filters that exclude all objects of a certain type. Use that in order to implement bitmap-level filtering for the '`--filter=tree:<n>`' filter when '`n`' is equal to `0`.

The general case is not helped by bitmaps, since for values of '`n > 0`', the object filtering machinery requires a full-blown tree traversal in order to determine the depth of a given tree.
Caching this is non-obvious, too, since the same tree object can have a different depth depending on the context (e.g., a tree was moved up in the directory hierarchy between two commits).

But, the '`n = 0`' case can be helped, and this patch does so.
Running `p5310.11` in this tree and on master with the kernel, we can see that this case is helped substantially:

Test                                   master              this tree
--------------------------------------------------------------------------------
5310.11: rev-list count with tree:0    10.68(10.39+0.27)   0.06(0.04+0.01) -99.4%
And:
See commit 9639474, commit 5bf7f1e (04 May 2020) by Jeff King (`peff`).
See commit b0a8d48, commit 856e12c (04 May 2020) by Taylor Blau (`ttaylorr`).
(Merged by Junio C Hamano -- `gitster` -- in commit 69ae8ff, 13 May 2020)
`pack-bitmap`: pass object filter to fill-in traversal

Signed-off-by: Jeff King
Signed-off-by: Taylor Blau

Sometimes a bitmap traversal still has to walk some commits manually, because those commits aren't included in the bitmap packfile (e.g., due to a push or commit since the last full repack).
If we're given an object filter, we don't pass it down to this traversal.
It's not necessary for correctness because the bitmap code has its own filters to post-process the bitmap result (which it must, to filter out the objects that are mentioned in the bitmapped packfile).

And with blob filters, there was no performance reason to pass along those filters, either. The fill-in traversal could omit them from the result, but it wouldn't save us any time to do so, since we'd still have to walk each tree entry to see if it's a blob or not.

But now that we support tree filters, there's opportunity for savings. A `tree:depth=0` filter means we can avoid accessing trees entirely, since we know we won't need them (or any of the subtrees or blobs they point to).
The new test in `p5310` shows this off (the "partial bitmap" state is one where `HEAD~100` and its ancestors are all in a bitmapped pack, but `HEAD~100..HEAD` are not).

Here are the results (run against `linux.git`):

Test                                                  HEAD^             HEAD
-------------------------------------------------------------------------------------------------
[...]
5310.16: rev-list with tree filter (partial bitmap)   0.19(0.17+0.02)   0.03(0.02+0.01) -84.2%

The absolute number of savings isn't huge, but keep in mind that we only omitted 100 first-parent links (in the version of `linux.git` here, that's 894 actual commits).

In a more pathological case, we might have a much larger proportion of non-bitmapped commits. I didn't bother creating such a case in the perf script because the setup is expensive, and this is plenty to show the savings as a percentage.
Answer by dwa.kang
I'm benchmarking git clone.

It can be faster with the `--jobs` option if the project includes submodules, e.g.:
git clone --recursive --shallow-submodules --depth 1 --branch "your tag or branch" --jobs 5 -- "your remote repo"
Answer by idanzalz
From the log it seems you have already finished the clone. If your problem is that you need to do this process multiple times on different machines, you can just copy the repository directory from one machine to the other. This way preserves the relationship (remotes) between each copy and the repository you cloned from.
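A minimal sketch of that, with invented host and path names:

# Ship the finished clone, remotes, config and hooks included, to a second machine:
tar -C ~/git -cf - git00 | ssh otherhost 'tar -C ~/git -xf -'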