Note: this page mirrors a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you reuse or share it, you must follow the same CC BY-SA terms and attribute it to the original authors (not me): StackOverflow.
Original question: http://stackoverflow.com/questions/8180525/
What is the fastest way to clone a git repository over a fast network connection?
Asked by Thorbjørn Ravn Andersen
I have a relatively large git repository located on an elderly, slow host on my local network, where it takes quite a while to do the initial clone.
ravn@bamboo:~/git$ git clone gitosis@gitbox:git00
Initialized empty Git repository in /home/ravn/git/git00/.git/
remote: Counting objects: 89973, done.
remote: Compressing objects: 100% (26745/26745), done.
remote: Total 89973 (delta 50970), reused 85013 (delta 47798)
Receiving objects: 100% (89973/89973), 349.86 MiB | 2.25 MiB/s, done.
Resolving deltas: 100% (50970/50970), done.
Checking out files: 100% (11722/11722), done.
ravn@bamboo:~/git$
There are no git-specific configuration changes in gitosis.
Is there any way of speeding up the receiving bit to what the network is capable of?
EDIT: I need the new repositories to be properly connected with the upstream repository. To my understanding this requires git to do the cloning, and thus raw bit copying outside of git will not work.
Accepted answer by Thorbjørn Ravn Andersen
After realizing that the upper limit on the data transfer speed is the ssh connection, which is established "outside" of git, I did some experiments and found that the upper limit of using pscp (PuTTY scp) was 3.0 MB/s when the blowfish encryption scheme was chosen. A control experiment with raw ftp showed a transfer speed of 3.1 MB/s, indicating that this was the upper bound of the network.
This runs inside a VMware hypervisor, and as the process doing network I/O used almost 100% CPU, it indicated that the bottleneck was the Ubuntu network card driver. I then found that even though vmware tools were installed, for some reason the kernel still used the vlance driver (emulating a 10 MBps network card with IRQs and all) instead of the vmxnet driver (which speaks directly to the hypervisor). This now awaits a service window to be changed.
In other words, the problem was not with git but the underlying "hardware".
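If you want to reproduce this kind of diagnosis, a rough sketch follows (the host and repo names are the ones from the question; the cipher is just an example that your OpenSSH build may or may not offer):

# Measure raw ssh throughput independently of git; if this matches
# git's receive rate, the bottleneck is the network stack, not git.
dd if=/dev/zero bs=1M count=300 | ssh gitosis@gitbox 'cat > /dev/null'

# If the cipher is the limiting factor, point git at a cheaper one (Git 2.10+):
git -c core.sshCommand="ssh -c aes128-ctr" clone gitosis@gitbox:git00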
Answer by sehe
PS. Fair warning: `git` is generally considered blazingly fast. You should try cloning a full repo from darcs, bazaar, hg (god forbid: TFS or subversion...). Also, if you routinely clone full repos from scratch, you'd be doing something wrong anyway. You can always just `git remote update` and get incremental changes.

For various other ways to keep full repos in sync see, e.g.

- "fetch --all" in a git bare repository doesn't synchronize local branches to the remote ones
- How to update a git clone --mirror?

(These contain links to other relevant SO posts.)
Dumb copy
As mentioned, you could just copy a repository with 'dumb' file transfer.
This will certainly not waste time compressing, repacking, deltifying and/or filtering.
Plus, you will get:
- hooks
- config (remotes, push branches, settings (whitespace, merge, aliases, user details, etc.))
- stashes (see also Can I fetch a stash from a remote repo into a local branch?)
- rerere cache
- reflogs
- backups (from filter-branch, e.g.) and various other things (intermediate state from rebase, bisect, etc.)
This may or may not be what you require, but it is nice to be aware of the fact.
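A minimal sketch of such a dumb copy (assuming rsync is available on the old host and that the clone URL's path maps to a real directory there):

# Copy the repository wholesale, bypassing git's pack/delta machinery;
# on a fast LAN you may also want to skip rsync's own compression (-z):
rsync -a --progress gitosis@gitbox:git00/ git00/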
Bundle
Git clone by default optimizes for bandwidth. Since git clone, by default, does not mirror all branches (see `--mirror`), it would not make sense to just dump the pack-files as-is (because that would possibly send way more than required).
When distributing to a truly big number of clients, consider using bundles.
If you want a fast clone without the server-side cost, the git way is `bundle create`. You can now distribute the bundle without the server even being involved. If you mean that `bundle ... --all` includes more than a simple `git clone`, consider e.g. `bundle ... master` to reduce the volume.
git bundle create snapshot.bundle --all # (or mention specific ref names instead of --all)
and distribute the snapshot bundle instead. That's the best of both worlds, while of course you won't get the items from the bullet list above. On the receiving end, just
git clone snapshot.bundle myclonedir/
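One caveat for the question's "properly connected" requirement: after cloning from a bundle, origin points at the bundle file. Re-pointing it at the live server (URL taken from the question) restores a normal upstream:

cd myclonedir/
# origin currently refers to snapshot.bundle; aim it at the real server:
git remote set-url origin gitosis@gitbox:git00
git fetch origin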
Compression configs
You can look at lowering the server load by reducing/removing compression. Have a look at these config settings (I assume `pack.compression` may help you lower the server load):
core.compression
An integer -1..9, indicating a default compression level. -1 is the zlib default. 0 means no compression, and 1..9 are various speed/size tradeoffs, 9 being slowest. If set, this provides a default to other compression variables, such as core.loosecompression and pack.compression.
core.loosecompression
An integer -1..9, indicating the compression level for objects that are not in a pack file. -1 is the zlib default. 0 means no compression, and 1..9 are various speed/size tradeoffs, 9 being slowest. If not set, defaults to core.compression. If that is not set, defaults to 1 (best speed).
pack.compression
An integer -1..9, indicating the compression level for objects in a pack file. -1 is the zlib default. 0 means no compression, and 1..9 are various speed/size tradeoffs, 9 being slowest. If not set, defaults to core.compression. If that is not set, defaults to -1, the zlib default, which is "a default compromise between speed and compression (currently equivalent to level 6)."
Note that changing the compression level will not automatically recompress all existing objects. You can force recompression by passing the -F option to git-repack(1).
Given ample network bandwidth, this will in fact result in faster clones. Don't forget about `git-repack -F` when you decide to benchmark that!
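As a sketch, tuning the server repository down to no compression could look like this (illustrative, not a recipe):

# In the server repository: store pack data uncompressed,
# trading disk space and bandwidth for CPU time:
git config pack.compression 0
# Recompress the existing objects so the setting actually takes effect:
git repack -a -d -F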
Answer by northtree
Use the `--depth` option to create a shallow clone:
git clone --depth 1 <repository>
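Note that a shallow clone truncates history (and, with `--depth`, fetches only a single branch by default). If you later need the full history, the clone can be converted in place, assuming the remote is still reachable:

git fetch --unshallow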
Answer by VonC
The `git clone --depth=1 ...` suggested in 2014 will become faster in Q2 2019 with Git 2.22.

That is because, during an initial "`git clone --depth=...`" partial clone, it is pointless to spend cycles on a large portion of the connectivity check that enumerates and skips promisor objects (which by definition is all objects fetched from the other side).
This has been optimized out.
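(For reference, a partial clone in this sense is one made with the `--filter` option. A hypothetical blobless clone of the question's repository, assuming the server is new enough to allow filtering, would be:)

# Fetch commits and trees up front; blobs are fetched lazily on demand:
git clone --filter=blob:none gitosis@gitbox:git00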
`clone`: do faster object check for partial clones

For partial clones, doing a full connectivity check is wasteful; we skip promisor objects (which, for a partial clone, is all known objects), and enumerating them all to exclude them from the connectivity check can take a significant amount of time on large repos.
At most, we want to make sure that we get the objects referred to by any wanted refs.
For partial clones, just check that these objects were transferred.
Result:
Test dfa33a2^ dfa33a2
-------------------------------------------------------------------------
5600.2: clone without blobs 18.41(22.72+1.09) 6.83(11.65+0.50) -62.9%
5600.3: checkout of result 1.82(3.24+0.26) 1.84(3.24+0.26) +1.1%
62% faster!
With Git 2.26 (Q1 2020), an unneeded connectivity check is now disabled in a partial clone when fetching into it.
See commit 2df1aa2, commit 5003377 (12 Jan 2020) by Jonathan Tan (`jhowtan`).
(Merged by Junio C Hamano -- `gitster` -- in commit 8fb3945, 14 Feb 2020)
`connected`: verify promisor-ness of partial clone

Signed-off-by: Jonathan Tan
Reviewed-by: Jonathan Nieder

Commit dfa33a298d ("`clone`: do faster object check for partial clones", 2019-04-21, Git v2.22.0-rc0 -- merge) optimized the connectivity check done when cloning with `--filter` to check only the existence of objects directly pointed to by refs.
But this is not sufficient: they also need to be promisor objects.
Make this check more robust by instead checking that these objects are promisor objects, that is, they appear in a promisor pack.
And:
`fetch`: forgo full connectivity check if `--filter`

Signed-off-by: Jonathan Tan
Reviewed-by: Jonathan Nieder

If a filter is specified, we do not need a full connectivity check on the contents of the packfile we just fetched; we only need to check that the objects referenced are promisor objects.
This significantly speeds up fetches into repositories that have many promisor objects, because during the connectivity check, all promisor objects are enumerated (to mark them UNINTERESTING), and that takes a significant amount of time.
And, still with Git 2.26 (Q1 2020), the object reachability bitmap machinery and the partial cloning machinery were not prepared to work well together, because some object-filtering criteria that partial clones use inherently rely on object traversal, but the bitmap machinery is an optimization to bypass that object traversal.
There are, however, some cases where they can work together, and they were taught about them.
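(As background, reachability bitmaps are produced on the server at repack time. A minimal sketch of enabling them in a bare server repository, assuming nothing has disabled them:)

# Enable bitmap generation and build one with the next full repack:
git config repack.writeBitmaps true
git repack -a -d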
See commit 20a5fd8 (18 Feb 2020) by Junio C Hamano (`gitster`).
See commit 3ab3185, commit 84243da, commit 4f3bd56, commit cc4aa28, commit 2aaeb9a, commit 6663ae0, commit 4eb707e, commit ea047a8, commit 608d9c9, commit 55cb10f, commit 792f811, commit d90fe06 (14 Feb 2020), and commit e03f928, commit acac50d, commit 551cf8b (13 Feb 2020) by Jeff King (`peff`).
(Merged by Junio C Hamano -- `gitster` -- in commit 0df82d9, 02 Mar 2020)
`pack-bitmap`: implement `BLOB_LIMIT` filtering

Signed-off-by: Jeff King

Just as the previous commit implemented `BLOB_NONE`, we can support `BLOB_LIMIT` filters by looking at the sizes of any blobs in the result and unsetting their bits as appropriate.
This is slightly more expensive than `BLOB_NONE`, but still produces a noticeable speedup (these results are on git.git):

Test                                         HEAD~2            HEAD
------------------------------------------------------------------------------------
5310.9:  rev-list count with blob:none       1.80(1.77+0.02)   0.22(0.20+0.02) -87.8%
5310.10: rev-list count with blob:limit=1k   1.99(1.96+0.03)   0.29(0.25+0.03) -85.4%

The implementation is similar to the `BLOB_NONE` one, with the exception that we have to go object-by-object while walking the blob-type bitmap (since we can't mask out the matches, but must look up the size individually for each blob).
The trick with using `ctz64()` is taken from `show_objects_for_type()`, which likewise needs to find individual bits (but wants to quickly skip over big chunks without blobs).
Git 2.27 (Q2 2020) will simplify the commit ancestry connectedness check in a partial clone repository in which "promised" objects are assumed to be obtainable lazily on-demand from promisor remote repositories.
See commit 2b98478 (20 Mar 2020) by Jonathan Tan (`jhowtan`).
(Merged by Junio C Hamano -- `gitster` -- in commit 0c60105, 22 Apr 2020)
`connected`: always use partial clone optimization

Signed-off-by: Jonathan Tan
Reviewed-by: Josh Steadmon

With 50033772d5 ("`connected`: verify promisor-ness of partial clone", 2020-01-30, Git v2.26.0-rc0 -- merge listed in batch #5), the fast path (checking promisor packs) in `check_connected()` now passes a subset of the slow path (rev-list) - if all objects to be checked are found in promisor packs, both the fast path and the slow path will pass; otherwise, the fast path will definitely not pass.

This means that we can always attempt the fast path whenever we need to do the slow path.
The fast path is currently guarded by a flag; therefore, remove that flag.
Also, make the fast path fall back to the slow path - if the fast path fails, the failing OID and all remaining OIDs will be passed to rev-list.

The main user-visible benefit is the performance of fetch from a partial clone - specifically, the speedup of the connectivity check done before the fetch.
In particular, a no-op fetch into a partial clone on my computer was sped up from 7 seconds to 0.01 seconds. This is a complement to the work in 2df1aa239c ("`fetch`: forgo full connectivity check if --filter", 2020-01-30, Git v2.26.0-rc0 -- merge listed in batch #5), which is the child of the aforementioned 50033772d5. In that commit, the connectivity check after the fetch was sped up.

The addition of the fast path might cause performance reductions in these cases:

- If a partial clone or a fetch into a partial clone fails, Git will fruitlessly run rev-list (it is expected that everything fetched would go into promisor packs, so if that didn't happen, it is most likely that rev-list will fail too).
- Any connectivity checks done by receive-pack, in the (in my opinion, unlikely) event that a partial clone serves receive-pack.

I think that these cases are rare enough, and the performance reduction in this case minor enough (additional object DB access), that the benefit of avoiding a flag outweighs these.
With Git 2.27 (Q2 2020), the object walk with object filter "`--filter=tree:0`" can now take advantage of the pack bitmap when available.
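(In plain commands, the walk being sped up corresponds roughly to the following; the use of `time` and the choice of repository are just for illustration:)

# Object walk that skips all trees and blobs; with Git 2.27 and bitmaps
# this no longer needs a full traversal:
time git rev-list --objects --filter=tree:0 --all > /dev/null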
See commit 9639474, commit 5bf7f1e (04 May 2020) by Jeff King (`peff`).
See commit b0a8d48, commit 856e12c (04 May 2020) by Taylor Blau (`ttaylorr`).
(Merged by Junio C Hamano -- `gitster` -- in commit 69ae8ff, 13 May 2020)
`pack-bitmap.c`: support 'tree:0' filtering

Signed-off-by: Taylor Blau

In the previous patch, we made it easy to define other filters that exclude all objects of a certain type. Use that in order to implement bitmap-level filtering for the '`--filter=tree:<n>`' filter when '`n`' is equal to `0`.

The general case is not helped by bitmaps, since for values of '`n > 0`', the object filtering machinery requires a full-blown tree traversal in order to determine the depth of a given tree.
Caching this is non-obvious, too, since the same tree object can have a different depth depending on the context (e.g., a tree was moved up in the directory hierarchy between two commits).

But, the '`n = 0`' case can be helped, and this patch does so.
Running `p5310.11` in this tree and on master with the kernel, we can see that this case is helped substantially:

Test                                   master              this tree
--------------------------------------------------------------------------------
5310.11: rev-list count with tree:0    10.68(10.39+0.27)   0.06(0.04+0.01) -99.4%
And:
See commit 9639474, commit 5bf7f1e (04 May 2020) by Jeff King (`peff`).
See commit b0a8d48, commit 856e12c (04 May 2020) by Taylor Blau (`ttaylorr`).
(Merged by Junio C Hamano -- `gitster` -- in commit 69ae8ff, 13 May 2020)
`pack-bitmap`: pass object filter to fill-in traversal

Signed-off-by: Jeff King
Signed-off-by: Taylor Blau

Sometimes a bitmap traversal still has to walk some commits manually, because those commits aren't included in the bitmap packfile (e.g., due to a push or commit since the last full repack).
If we're given an object filter, we don't pass it down to this traversal.
It's not necessary for correctness because the bitmap code has its own filters to post-process the bitmap result (which it must, to filter out the objects that are mentioned in the bitmapped packfile).

And with blob filters, there was no performance reason to pass along those filters, either. The fill-in traversal could omit them from the result, but it wouldn't save us any time to do so, since we'd still have to walk each tree entry to see if it's a blob or not.

But now that we support tree filters, there's opportunity for savings. A `tree:depth=0` filter means we can avoid accessing trees entirely, since we know we won't need them (or any of the subtrees or blobs they point to).
The new test in `p5310` shows this off (the "partial bitmap" state is one where `HEAD~100` and its ancestors are all in a bitmapped pack, but `HEAD~100..HEAD` are not).

Here are the results (run against `linux.git`):

Test                                                  HEAD^             HEAD
-------------------------------------------------------------------------------------------------
[...]
5310.16: rev-list with tree filter (partial bitmap)   0.19(0.17+0.02)   0.03(0.02+0.01) -84.2%

The absolute number of savings isn't huge, but keep in mind that we only omitted 100 first-parent links (in the version of `linux.git` here, that's 894 actual commits).

In a more pathological case, we might have a much larger proportion of non-bitmapped commits. I didn't bother creating such a case in the perf script because the setup is expensive, and this is plenty to show the savings as a percentage.
Answer by dwa.kang
I'm benchmarking git clone.

It can be faster with the `--jobs` option if the project includes submodules, e.g.:
git clone --recursive --shallow-submodules --depth 1 --branch "your tag or branch" --jobs 5 -- "your remote repo"
Answer by idanzalz
From the log it seems you have already finished the clone. If your problem is that you need to do this process multiple times on different machines, you can just copy the repository directory from one machine to the other. This way preserves the relationship (remotes) between each copy and the repository you cloned from.
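A minimal sketch of that, with invented host and path names:

# Ship the finished clone, remotes, config and hooks included, to a second machine:
tar -C ~/git -cf - git00 | ssh otherhost 'tar -C ~/git -xf -'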