Notice: this page is a bilingual (Chinese-English) rendering of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/999744/

Time: 2020-09-19 03:42:52  Source: igfitidea

Is Git recommended for large (>250GB) content repositories?

Tags: git, version-control, perforce

Asked by kaychaks

The web application is a custom-built CMS with several sub-applications, each of which has code and content residing in the same directory structure. Due to the application framework's architecture, the code and content are intertwined (content depends on the code for its display and other functionality) and hence are inseparable. The content is not stored as BLOBs; it is stored as files, and the underlying DB is used to link them. The size of the sub-applications ranges from 20GB to 250GB and more (this is the killer).

The web application will undergo code enhancements (new sub-applications, bug fixes, etc.) while users add and update content through the already live system. Hence, a deployment/release process is required and, most importantly, a version control system needs to be chosen for both code and content.

Git comes into the picture for several reasons: it is open source and free, branching and merging are easy, and it is not centralized and hence has no single point of failure.

BUT after some initial research on the web, I found some disappointing facts that apply to our application: using Git for large systems like ours is painful (checkout, clone, merge, push, pull), and the commands are complicated ("geeky" would be more appropriate) for a developer base that is unfamiliar with DVCS and mostly on Windows.

I am not fixed on Git, but if I have to go for a centralized approach (in the really WORST case), what should it be (apart from CVS and SVN)? I have read that Perforce is a stable one and is also used at Google (I expect some brashes here!!).

Please share your views, guidance and comments. I really need them.

Accepted answer by pgs

I just happened to be reading this blog post not one minute ago. It's a bit of a rant about the scalability of git.

Edit: Eight years later, Git has Large File Storage (LFS), and Microsoft is open-sourcing Git Virtual File System (GVFS) so they can use git to develop Windows.
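
As a rough illustration of what LFS tracking involves: Git LFS decides which files to manage from patterns in `.gitattributes`, each of the form `<pattern> filter=lfs diff=lfs merge=lfs -text`. The Python sketch below walks a working tree and proposes such lines for every extension that has at least one file above a size threshold. The helper name `suggest_lfs_patterns` and the 10 MB default are assumptions made for illustration; they are not part of Git or Git LFS.

```python
import os
from collections import defaultdict


def suggest_lfs_patterns(root, min_bytes=10 * 1024 * 1024):
    """Return .gitattributes lines for extensions that have large files.

    Hypothetical helper: the name and the 10 MB threshold are
    illustrative assumptions, not part of Git or Git LFS.
    """
    largest = defaultdict(int)  # extension -> largest file size seen
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git's own metadata directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if not ext:
                continue  # no extension, so no simple glob pattern
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # ignore symlinks (they may be dangling)
            largest[ext] = max(largest[ext], os.path.getsize(path))
    # One line per extension whose largest file meets the threshold.
    return [
        f"*{ext} filter=lfs diff=lfs merge=lfs -text"
        for ext, size in sorted(largest.items())
        if size >= min_bytes
    ]
```

Calling `suggest_lfs_patterns(".")` from the top of a working tree returns candidate lines to review before adding them to `.gitattributes`; tracking by extension is only one possible policy, and a content-heavy CMS may prefer tracking whole directories instead.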

Answer by Matthew Flaschen

First, I don't agree that Git is inappropriate for non-technical users. Yes, there are certain features that newbies won't use (e.g. git-send-email). But there are also GUIs like TortoiseGit to make simple things simple.

However, I think you're approaching things the wrong way. Basically, you have content that will change frequently and needs to be editable very easily by Joe Bloggs, and code that will be modified less frequently by coders. The traditional solution is to use a real CMS (e.g. Alfresco, SugarCRM, Drupal, etc.) or a wiki (MediaWiki, MoinMoin, etc.), with optional plug-ins. Keep in mind that wikis (and most CMSes) allow versioning of content in a "user-friendly" way.

Even if you must keep your in-house code, I think you should still want to extricate the content so that code and content can be treated separately. Once you have the code and content separate, your repository will be a more reasonable size. Then you can use whatever VCS you want (though I'm not really sure you're right that Git is inherently bad for large repos).

Answer by Jared Oberhaus

git does not scale for large repositories. It's not the space, it's the number of files. Please read my blog article that I wrote a while back about this.
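
To see which of the two dimensions dominates in a given tree, a minimal sketch like the following can report both the file count and the total size. The function name `repo_footprint` is hypothetical, and the script only measures a working tree; it says nothing about how any particular VCS will perform on it.

```python
import os


def repo_footprint(root):
    """Return (file_count, total_bytes) for the tree under root.

    Hypothetical helper for illustration; it skips the .git directory
    and symlinks so that only working-tree files are counted.
    """
    file_count = 0
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune Git metadata so it does not inflate the numbers.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            file_count += 1
            total_bytes += os.path.getsize(path)
    return file_count, total_bytes
```

For example, `count, size = repo_footprint("/path/to/sub-application")` gives the two figures for one sub-application (the path here is a placeholder).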

In my experience, if you want a scalable, fast, centralized source control system, P4 is the way to go.

Answer by si618

Is SVN really such a bad option?

PROS:

  • Can handle large repositories, e.g. many Linux distros use it, as do Apache and SourceForge
  • Has a nice GUI front end in TortoiseSVN to keep your Windows users happy
  • Can be used with Windows integrated authentication to keep admins happy
  • Many different backup strategies can be adopted based on your requirements (svnadmin hotcopy or dump, svnsync, post-commit hooks) to help ease your single-point-of-failure concern.
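
One of the backup strategies named above, `svnadmin hotcopy`, could be scripted roughly as follows. This is a minimal sketch that assumes `svnadmin` is on the PATH; the function names, the timestamped directory naming, and the example paths are hypothetical choices for illustration, not a recommended backup policy.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def hotcopy_command(repo_path, backup_root, stamp=None):
    """Build the `svnadmin hotcopy` argument list for a timestamped copy.

    Hypothetical helper: the <repo-name>-<timestamp> naming scheme is an
    assumption made for this sketch.
    """
    stamp = stamp or datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{Path(repo_path).name}-{stamp}"
    # svnadmin hotcopy REPOS_PATH NEW_REPOS_PATH
    return ["svnadmin", "hotcopy", str(repo_path), str(dest)]


def run_hotcopy(repo_path, backup_root):
    """Run the backup; raises CalledProcessError if svnadmin fails."""
    cmd = hotcopy_command(repo_path, backup_root)
    subprocess.run(cmd, check=True)
    return cmd[-1]  # path of the new backup copy
```

A scheduled job could call `run_hotcopy("/srv/svn/cms", "/backups")` (both paths are placeholders); `svnadmin dump` or `svnsync` would be alternatives when an incremental or off-site copy is needed.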

CONS:

  • Centralised VCS
Disclaimer: I've never used Perforce and have been a happy SVN admin and user for ~6 years (since v0.29).

Answer by Mike Caron

There's a utility script called git-split that chops up a git repo to make it more efficient.

Answer by indusBull

Microsoft just released Git Virtual File System (GVFS) specifically to handle large code bases with git. More details are available on MSDN.

Also, Microsoft hosts the Windows source in a monstrous 300GB Git repository.

I do not have any experience using GVFS.

Answer by Macarse

I used git only once, for a school project (a PHP site with Zend Framework).

We used git but the teacher needed to have the final release on a svn repo.

Comparing the checkout size:

The git checkout was half the size (in MB) of the svn checkout.

My two cents.
