What are the file limits in Git (number and size)?

Disclaimer: this page is a translation of a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/984707/


git

Asked by Alexandre Rademaker

Does anyone know what the Git limits are for the number of files and the size of files?


Accepted answer by VonC

This message from Linus himself can help you with some other limits:


[...] CVS, ie it really ends up being pretty much oriented to a "one file at a time" model.

Which is nice in that you can have a million files, and then only check out a few of them - you'll never even see the impact of the other 999,995 files.

Git fundamentally never really looks at less than the whole repo. Even if you limit things a bit (ie check out just a portion, or have the history go back just a bit), git ends up still always caring about the whole thing, and carrying the knowledge around.

So git scales really badly if you force it to look at everything as one huge repository. I don't think that part is really fixable, although we can probably improve on it.

And yes, then there's the "big file" issues. I really don't know what to do about huge files. We suck at them, I know.


See more in my other answer: the limit with Git is that each repository must represent a "coherent set of files", the "all system" in itself (you cannot tag "part of a repository").
If your system is made of autonomous (but inter-dependent) parts, you must use submodules, as sketched below.
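For instance, a minimal submodule setup could look like this (the URLs and paths here are hypothetical):

$ git submodule add https://example.com/libfoo.git libs/libfoo
$ git commit -m "Add libfoo as a submodule"
$ git clone --recurse-submodules https://example.com/superproject.git   # a fresh clone pulls the submodules too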

As illustrated by Talljoe's answer, the limit can be a system one (large number of files), but if you do understand the nature of Git (about data coherency represented by its SHA-1 keys), you will realize the true "limit" is a usage one: i.e., you should not try to store everything in a Git repository, unless you are prepared to always get or tag everything back. For some large projects, it would make no sense.




For a more in-depth look at git limits, see "git with large files"
(which mentions git-lfs: a solution to store large files outside the git repo. GitHub, April 2015)
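As a rough sketch of that git-lfs workflow (the tracked pattern and the file name are just examples):

$ git lfs install
$ git lfs track "*.psd"
$ git add .gitattributes design.psd
$ git commit -m "Track PSD files via LFS"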

The three issues that limit a git repo:

  • huge files (the xdelta for packfiles is in memory only, which isn't good with large files)
  • huge number of files, which means one file per blob, and slow git gc to generate one packfile at a time (a quick check is sketched after this list).
  • huge packfiles, with a packfile index that is inefficient at retrieving data from the (huge) packfile.
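
To get a feel for whether a repo is drifting into this territory, the loose object and packfile counts are easy to inspect (a quick check, not a fix):

$ git count-objects -v     # loose objects, packs, and their sizes
$ git gc                   # repack; on huge repos this is where the time goes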


A more recent thread (Feb. 2015) illustrates the limiting factors for a Git repo:

Will a few simultaneous clones from the central server also slow down other concurrent operations for other users?

There are no locks in the server when cloning, so in theory cloning does not affect other operations. Cloning can use lots of memory though (and a lot of CPU, unless you turn on the reachability bitmap feature, which you should).

Will 'git pull' be slow?

If we exclude the server side, the size of your tree is the main factor, but your 25k files should be fine (linux has 48k files).

'git push'?

This one is not affected by how deep your repo's history is, or how wide your tree is, so it should be quick.

Ah, the number of refs may affect both git-push and git-pull.
I think Stefan knows better than I in this area.

'git commit'? (It is listed as slow in reference 3.) 'git status'? (Slow again in reference 3 though I don't see it.)
(also git-add)

Again, the size of your tree. At your repo's size, I don't think you need to worry about it.

Some operations might not seem to be day-to-day but if they are called frequently by the web front-end to GitLab/Stash/GitHub etc then they can become bottlenecks. (e.g. 'git branch --contains' seems terribly adversely affected by large numbers of branches.)

git-blame could be slow when a file is modified a lot.
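
If you administer the server, enabling the reachability bitmaps mentioned above might look like this (assuming git 2.0 or later; older versions spelled the setting pack.writeBitmaps), and counting refs is a one-liner:

$ git config repack.writeBitmaps true
$ git repack -Ad               # a full repack writes the bitmap index
$ git for-each-ref | wc -l     # lots of refs can slow push/pull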


Answered by Talljoe

There is no real limit -- everything is named with a 160-bit name. The size of a file must be representable in a 64-bit number, so there is no real limit there either.
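
The 160-bit naming is easy to see for yourself: every object name is a 40-hex-digit SHA-1:

$ echo 'hello' | git hash-object --stdin
ce013625030ba8dba906f756967f9e9ca394464a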

There is a practical limit, though. I have a repository that's ~8GB with >880,000 files and git gc takes a while. The working tree is rather large so operations that inspect the entire working directory take quite a while. This repo is only used for data storage, though, so it's just a bunch of automated tools that handle it. Pulling changes from the repo is much, much faster than rsyncing the same data.

%find . -type f | wc -l
791887
%time git add .
git add .  6.48s user 13.53s system 55% cpu 36.121 total
%time git status
# On branch master
nothing to commit (working directory clean)
git status  0.00s user 0.01s system 0% cpu 47.169 total
%du -sh .
29G     .
%cd .git
%du -sh .
7.9G    .

Answered by Brian Carlton

If you add files that are too large (GBs in my case, Cygwin, XP, 3 GB RAM), expect this.

fatal: Out of memory, malloc failed

More details here
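
On a memory-starved 32-bit setup like that, a common workaround is to shrink git's packing memory appetite before retrying; whether these settings are enough depends on the file sizes, and the values below are guesses to tune:

$ git config core.bigFileThreshold 100m   # skip delta compression for big files
$ git config pack.windowMemory 256m
$ git config pack.packSizeLimit 256m
$ git config pack.threads 1               # each thread multiplies memory use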

Update 3/2/11: Saw similar in Windows 7 x64 with Tortoise Git. Tons of memory used, very very slow system response.

Answered by CharlesB

Back in Feb 2012, there was a very interesting thread on the Git mailing list from Joshua Redstone, a Facebook software engineer testing Git on a huge test repository:

The test repo has 4 million commits, linear history and about 1.3 million files.

Tests that were run show that for such a repo Git is unusable (cold operations lasting minutes), but this may change in the future. Basically the performance is penalized by the number of stat() calls to the kernel FS module, so it will depend on the number of files in the repo and the FS caching efficiency. See also this Gist for further discussion.
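
Later git releases grew knobs aimed exactly at this stat() cost; a sketch, assuming git 2.24 or newer:

$ git config feature.manyFiles true    # enables index v4 and the untracked cache
$ git update-index --index-version 4
$ git update-index --untracked-cache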

Answered by Dustin

It depends on what you mean. There are practical size limits (if you have a lot of big files, it can get boringly slow). If you have a lot of files, scans can also get slow.

There aren't really inherent limits to the model, though. You can certainly use it poorly and be miserable.

Answered by Kim Sullivan

As of 2018-04-20, Git for Windows has a bug which effectively limits the file size to 4GB max using that particular implementation (this bug propagates to lfs as well).

Answered by Kzqai

I think it's good to avoid committing large files as part of the repository (e.g. a database dump might be better off elsewhere). But considering the size of the kernel in its repository, you can probably expect to work comfortably with anything smaller and less complex than that.

Answered by Kasisnu

I found this while trying to store a massive number of files (350k+) in a repo. Yes, store. Laughs.

$ time git add . 
git add . 333.67s user 244.26s system 14% cpu 1:06:48.63 total

The following extracts from the Bitbucket documentation are quite interesting.

When you work with a DVCS repository, cloning or pushing, you are working with the entire repository and all of its history. In practice, once your repository gets larger than 500MB, you might start seeing issues.

... 94% of Bitbucket customers have repositories that are under 500MB. Both the Linux Kernel and Android are under 900MB.

The recommended solution on that page is to split your project into smaller chunks.
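
One way to carve out such a chunk is git subtree (shipped with git, though packaged as a contrib command in some distributions); the directory and repo names below are made up:

$ git subtree split --prefix=services/billing -b billing-only
$ mkdir ../billing && cd ../billing && git init
$ git pull ../bigrepo billing-only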

Answered by funwhilelost

I have a generous amount of data stored in my repo as individual JSON fragments. There are about 75,000 files sitting under a few directories, and it's not really detrimental to performance.

Checking them in the first time was, obviously, a little slow.

Answered by Michael Hu

git has a 4G (32-bit) limit per repo.

http://code.google.com/p/support/wiki/GitFAQ
