Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/3919962/

Time: 2020-09-19 04:38:49  Source: igfitidea

Speeding up the initial git-svn fetch

Tags: svn, git, git-svn

Asked by MrEvil

I have a big repository, 100,000+ revisions with a very high branching factor. The initial fetch of the full SVN repository using git-svn has been running for around 2 months and it's only up to revision 60,000. Is there any way to speed this thing up?

I'm already regularly killing and restarting the fetch because git-svn leaks memory like a sieve. The transfer is occurring over the local LAN, so link speed shouldn't be an issue. The repository is on a dedicated machine backed by dedicated fiber channel arrays, so the server should have plenty of oomph. The only other thing that I can think of is to do the clone from a local copy of the SVN repository.

What have other people done in similar circumstances?

Accepted answer by MrEvil

Apparently there is no good answer. Some work is being done on git-fast-import but it isn't ready for prime time yet. They are still trying to figure out how to detect and represent 'svn cp' actions. The one bright spot is that someone on the list came up with an optimization for git-svn that seems to have made a big impact.

http://permalink.gmane.org/gmane.comp.version-control.git/168718

Answer by Ben Hymanson

At work I use git-svn against a ~170000 revision SVN repo. What I did was use git-svn init + git-svn fetch -r... to limit my initial fetch to a reasonable number of revisions. You must be careful to choose a revision that is actually in the branch you want. Everything is fully functional even with truncated history except git-blame, which obviously attributes all the lines older than your starting rev to the first rev.

You can further speed this up with ignore-paths to prune out subtrees that you don't want.
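A minimal sketch of the truncated-history setup described above, assuming a standard trunk/branches/tags layout (hence `--stdlayout`, which the answer does not mention). The function name `limited_fetch`, the `DRY_RUN` switch, and all argument values are hypothetical; the starting revision is whatever you choose, since the answer leaves it elided. With `DRY_RUN` set, the commands are printed rather than executed.

```shell
# Sketch only: initialise a git-svn repo and fetch a truncated history.
# Arguments (all hypothetical): SVN URL, target directory,
# first revision to fetch, regex of paths to skip.
limited_fetch() {
    url=$1
    dir=$2
    start_rev=$3
    ignore_regex=$4

    # With DRY_RUN set, each command is echoed instead of executed.
    ${DRY_RUN:+echo} git svn init --stdlayout --ignore-paths="$ignore_regex" "$url" "$dir" || return 1

    # Fetch from the chosen starting revision up to the newest one.
    ${DRY_RUN:+echo} git -C "$dir" svn fetch -r "$start_rev:HEAD"
}
```

For example, `DRY_RUN=1 limited_fetch https://svn.example.invalid/repo work 150000 '^vendor/'` shows the two commands that would run, which is a cheap way to check the revision and the regex before committing to a long fetch.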

You can add more revisions later, but it will be painful. You will have to reset the rev-map (sadly I even wrote git-svn reset and I can't say offhand if it will remove all revisions, so it may have to be done by hand). Then git-svn fetch more revisions and use git-filter-branch to reparent your old root to the new tree. That will rewrite every commit but it won't affect the source blobs themselves. You have to do similar surgery when people undertake big reorgs of the svn repo.

If you actually need all of the revisions (for example for a migration) then you should be looking at some flavor of svn-fast-export + git-fast-import. There may be one that adds rev tags to match git-svn, in which case you could fast-import and then just graft in the svn remote. Even if the existing svn-fast-export options don't have that feature, you can probably add it before your original clone completes!

Answer by Tobias Tobiasen

In a repository with 20k commits I had similar problems. In my case it turned out that there were a few strange tags in subversion that caused problems. There were tags that copied / instead of /trunk. That caused git svn fetch to go into an infinite loop. I fixed it by converting in chunks.

git svn fetch -r0:1000
git svn fetch -r0:2000
git svn fetch -r0:3000

Watch the output, and if you don't see a new r... once in a while then something is wrong. Use git log --all to see how far the conversion got. Let's say you got to 1565. Then continue the fetch like this.

git svn fetch -r1567:2000

It was very tedious but it got the job done.
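The chunked fetches could be scripted roughly as follows. This is a sketch, not the author's script: the function name `chunked_fetch`, the `DRY_RUN` switch, and the default step of 1000 are assumptions. It mirrors the cumulative `-r0:N` ranges shown above and does not detect a stalled fetch, so the output still needs watching.

```shell
# Sketch only: run cumulative chunked fetches (-r0:1000, -r0:2000, ...).
# $1 = last revision to fetch, $2 = chunk size (defaults to 1000).
chunked_fetch() {
    last_rev=$1
    step=${2:-1000}
    end=$step

    while [ "$end" -lt "$last_rev" ]; do
        # With DRY_RUN set, the command is echoed instead of executed.
        ${DRY_RUN:+echo} git svn fetch "-r0:$end" || return 1
        end=$((end + step))
    done

    # Final chunk, capped at the last revision.
    ${DRY_RUN:+echo} git svn fetch "-r0:$last_rev"
}
```

If a chunk hangs on a bad revision, the manual recovery described above (checking git log --all and restarting just past the problem revision) still applies.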

Answer by bengineerd

If you can find a server with enough RAM, do the whole clone operation on a ramdisk. On Linux systems you can use /dev/shm, which is backed by RAM.

> svnadmin hotcopy /path/to/svn/repo /dev/shm/svn-repo

> git svn clone file:///dev/shm/svn-repo /dev/shm/git-repo
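Before running the commands above, it may be worth checking that the repository actually fits in the RAM-backed filesystem. The helper below is a hypothetical sketch (the name `fits_on` is not from the answer); it compares the size reported by `du` with the free space reported by `df`.

```shell
# Sketch only: succeed if the directory tree at $1 is smaller than the
# free space on the filesystem that holds $2.
fits_on() {
    src=$1
    target=$2

    # Size of the source tree in kilobytes.
    need_kb=$(du -sk "$src" | awk '{print $1}')

    # Available kilobytes on the target filesystem (POSIX output format).
    free_kb=$(df -Pk "$target" | awk 'NR==2 {print $4}')

    [ "$need_kb" -lt "$free_kb" ]
}
```

Usage might look like `fits_on /path/to/svn/repo /dev/shm && svnadmin hotcopy /path/to/svn/repo /dev/shm/svn-repo`. Note that the resulting git repository also lands in /dev/shm here, so the check on the SVN repository alone is only a lower bound.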

Once that's done, you can point the git repo back to your real svn repo instead as described here: https://git.wiki.kernel.org/index.php/GitSvnSwitch

  • Edit the svn-remote url in .git/config to point to the new domain name
  • Run git svn fetch - This needs to fetch at least one new revision from svn!
  • Change svn-remote url back to the original url
  • Run git svn rebase -l to do a local rebase (with the changes that came in with the last fetch operation)
  • Change svn-remote url back to the new url
  • git svn rebase should now work again!

This will only work if the git svn fetch step actually fetches anything! (Took me a while to discover that... I had to put in a dummy revision to our svn repository to make it happen!)
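The steps above can be sketched as a shell function. This assumes the svn-remote section uses git-svn's default name, `svn`, so the URL lives under the config key `svn-remote.svn.url`; the function name `switch_svn_url` and the `DRY_RUN` switch are hypothetical. Editing .git/config by hand, as the list describes, has the same effect as the `git config` calls.

```shell
# Sketch only: repoint a git-svn clone at a different SVN URL.
# $1 = new URL (the real server), $2 = old URL (e.g. the local copy).
switch_svn_url() {
    new_url=$1
    old_url=$2
    key=svn-remote.svn.url   # assumes the default remote name "svn"

    # With DRY_RUN set, each command is echoed instead of executed.
    ${DRY_RUN:+echo} git config "$key" "$new_url" || return 1
    # This fetch must bring in at least one new revision.
    ${DRY_RUN:+echo} git svn fetch || return 1
    ${DRY_RUN:+echo} git config "$key" "$old_url" || return 1
    # Local rebase onto what the fetch just brought in.
    ${DRY_RUN:+echo} git svn rebase -l || return 1
    ${DRY_RUN:+echo} git config "$key" "$new_url" || return 1
    ${DRY_RUN:+echo} git svn rebase
}
```

Run it from inside the git repository; with `DRY_RUN=1` it only lists the six commands in order.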

I just did this and was able to clone a 4.7G 12000 revision svn repo to git in about 3 hours.

Answer by kevpie

I think you are on the right track.

Local file access could give you a speedup of 1 to 2 orders of magnitude.

Not sure whether running git svn against a bdb or a file-based svn backend would be faster.

Answer by Daniel Stutzbach

I've downloaded a close-to-100,000-revision SVN repository using git-svn before. It took around 48 hours and was not over a local LAN. Admittedly, you did say that your repository has a high branching factor, while the repository I downloaded did not (although it did have several dozen branches).

I'd suggest working on figuring out where the bottleneck lies. Are git-svn and its subprocesses using 100% CPU? Are the disk lights on the client or the SVN server constantly lit? How much bandwidth is being used? Once you know what the limiting factor is, you can work on figuring out how to fix it.

Answer by wollow

I have a repo with 8k+ revisions and around 240 tags. I estimated that my initial git svn clone on Windows would have taken months, simply doing

git svn clone --stdlayout --no-metadata --authors-file=users.txt https://link.to.repo

The clone was taking 5 seconds to import 1 revision on average. Note that whenever a tag is encountered, the clone restarts from rev 1, so potentially there are 8k * 240 operations; at 5 seconds each, that is about 111 days.

Summary of all the steps I took to speed up the process:

  1. The Linux and OS X implementations are much faster than Cygwin on Windows. I used a Linux virtual machine. Please check https://stackoverflow.com/a/21599759/1448276

  2. I copied the entire svn repo to my machine with svnrdump

svnrdump dump https://link.to.repo > repos.dump

  3. I created a local svn repo

    svnadmin create svnrepo

    svnadmin load svnrepo < repos.dump

as in https://stackoverflow.com/a/10407464/1448276

  4. I created and mounted a RAM-based disk

    svnadmin hotcopy svnrepo/ /dev/shm/svnrepo

as above, https://stackoverflow.com/a/39030862/1448276

  5. And finally ran the clone

    git svn clone --stdlayout --no-metadata --prefix=origin/ --authors-file=users.txt file:///dev/shm/svnrepo

Here the clone is processing on average 12.5 revisions per second, so I expect it will take less than 2 days. I'll post an update once the clone is complete.
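The two estimates in this answer can be reproduced with shell integer arithmetic. This is a back-of-the-envelope sketch: it rounds "8k+" down to 8000 revisions and takes the worst case stated earlier, where every one of the 240 tags replays the whole history. The variable names are made up for illustration.

```shell
# Sketch only: reproduce the time estimates with integer arithmetic.
revisions=8000    # "8k+" revisions, rounded down (assumption)
tags=240
ops=$((revisions * tags))              # worst case: full replay per tag

# Original setup: 5 seconds per revision, 86400 seconds per day.
slow_days=$((ops * 5 / 86400))

# RAM disk setup: 12.5 revisions per second, i.e. 25 revisions per 2 seconds.
# Integer division truncates, so this is rounded down to whole hours.
fast_hours=$((ops * 2 / 25 / 3600))

echo "operations: $ops"
echo "at 5 s/revision: about $slow_days days"
echo "at 12.5 revisions/s: about $fast_hours hours"
```

Both figures line up with the numbers quoted in the answer: roughly 111 days before the changes, and under 2 days after.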

Answer by timB33

2017 calling in. I'm migrating a 45k revision repo and I'm finding git-svn on Linux working about 10x faster than git-svn on my Windows box. The VM is on the same Hyper-V host as my svn repo, so it could be that.
