Maximum number of files/directories on Linux?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license, cite the original address, and attribute it to the original authors (not me): StackOverFlow
Original question: http://stackoverflow.com/questions/8238860/
Asked by CodeVirtuoso
I'm developing a LAMP online store, which will allow admins to upload multiple images for each item.
My concern is that right off the bat there will be 20,000 items, meaning roughly 60,000 images.
Questions:
What is the maximum number of files and/or directories on Linux?
What is the usual way of handling this situation (best practice)?
My idea was to make a directory for each item, based on its unique ID, but then I'd still have 20000 directories in a main uploads directory, and it will grow indefinitely as old items won't be removed.
Thanks for any help.
Accepted answer by bdonlan
ext[234] filesystems have a fixed maximum number of inodes; every file or directory requires one inode. You can see the current count and limits with df -i. For example, on a 15GB ext3 filesystem, created with the default settings:
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/xvda 1933312 134815 1798497 7% /
There's no limit on directories in particular beyond this; keep in mind that every file or directory requires at least one filesystem block (typically 4KB), though, even if it's a directory with only a single item in it.
As you can see, though, 80,000 inodes is unlikely to be a problem. And with the dir_index option (which can be enabled with tune2fs), lookups in large directories aren't too much of a big deal. However, note that many administrative tools (such as ls or rm) can have a hard time dealing with directories that contain too many files. As such, it's recommended to split your files up so that you don't have more than a few hundred to a thousand items in any given directory. An easy way to do this is to hash whatever ID you're using, and use the first few hex digits as intermediate directories.
For example, say you have item ID 12345, and it hashes to 'DEADBEEF02842.......'. You might store your files under /storage/root/d/e/12345. You've now cut the number of files in each directory to 1/256th of what it would otherwise be.
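For the LAMP stack mentioned in the question, a minimal PHP sketch of this scheme might look like the following; the base directory, function name, and two-level layout are assumptions for illustration, not part of the original answer:

<?php
// Hash the item ID and use the first two hex digits as intermediate
// directories, giving 16 * 16 = 256 buckets.
function imageDirForItem(int $itemId, string $baseDir = '/storage/root'): string
{
    $hash = md5((string) $itemId);   // e.g. md5('12345') = '827ccb0eea8a706c4c34a16891f84e7b'
    return sprintf('%s/%s/%s/%d', $baseDir, $hash[0], $hash[1], $itemId);
}

$dir = imageDirForItem(12345);       // e.g. /storage/root/8/2/12345
if (!is_dir($dir)) {
    mkdir($dir, 0755, true);         // create the intermediate directories as needed
}
// Uploaded images for item 12345 are then written into $dir.

With 20,000 item IDs spread over 256 buckets, each bucket ends up holding on the order of 80 item directories.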
Answer by sarnold
If your server's filesystem has the dir_index feature turned on (see tune2fs(8) for details on checking and turning on the feature) then you can reasonably store upwards of 100,000 files in a directory before the performance degrades. (dir_index has been the default for new filesystems for most of the distributions for several years now, so it would only be an old filesystem that doesn't have the feature on by default.)
That said, adding another directory level to reduce the number of files in a directory by a factor of 16 or 256 would drastically improve the chances of things like ls * working without over-running the kernel's maximum argv size.
Typically, this is done by something like:
/a/a1111
/a/a1112
...
/b/b1111
...
/c/c6565
...
i.e., prepending a letter or digit to the path, based on some feature you can compute off the name. (The first two characters of the md5sum or sha1sum of the file name is one common approach, but if you have unique object IDs, then 'a' + id % 16 is an easy enough mechanism to determine which directory to use.)
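In PHP (assumed here because the question describes a LAMP stack), the 'a' + id % 16 idea needs chr()/ord() rather than bare character arithmetic; a minimal sketch, with the /uploads prefix purely hypothetical:

// Map an object ID to one of 16 single-letter directories, 'a' through 'p'.
$id = 12345;
$bucket = chr(ord('a') + ($id % 16));                        // 12345 % 16 = 9  =>  'j'
$path = sprintf('/uploads/%s/%s%d', $bucket, $bucket, $id);  // mirrors the /a/a1111 layout: /uploads/j/j12345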
Answer by glglgl
60000 is nothing, and 20000 as well. But you should group these 20000 by some means in order to speed up access to them. Maybe in groups of 100 or 1000, by taking the number of the directory and dividing it by 100, 500, 1000, whatever.
E.g., I have a project where the files have numbers. I group them in 1000s, so I have
id/1/1332
id/3/3256
id/12/12334
id/350/350934
You actually might have a hard limit - some systems have 32-bit inode numbers, so you are limited to 2^32 files per file system.
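A sketch of that grouping in PHP, assuming purely numeric IDs and a bucket size of 1000:

// Bucket files by floor(id / 1000), matching the id/12/12334 layout above.
$id = 12334;
$bucket = intdiv($id, 1000);                  // 12334 / 1000 = 12
$path = sprintf('id/%d/%d', $bucket, $id);    // => 'id/12/12334'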
Answer by Basile Starynkevitch
In addition to the general answers (basically "don't bother that much", "tune your filesystem", and "organize your directory with subdirectories containing a few thousand files each"):
If the individual images are small (e.g. less than a few kilobytes), instead of putting them in a folder, you could also put them in a database (e.g. in MySQL as a BLOB) or perhaps inside a GDBM-indexed file. Then each small item won't consume an inode (on many filesystems, each inode wants at least some kilobytes). You could also do that below some threshold (e.g. put images bigger than 4 kilobytes in individual files, and smaller ones in a database or GDBM file). Of course, don't forget to back up your data (and define a backup strategy).
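A hedged sketch of that threshold idea in PHP with PDO; the item_images table, its columns, and the 4 KB cutoff are assumptions for illustration, not anything specified in the answer:

<?php
const BLOB_THRESHOLD = 4096;   // hypothetical cutoff: roughly 4 KB

function storeImage(PDO $db, int $itemId, string $imageData, string $diskDir): void
{
    if (strlen($imageData) < BLOB_THRESHOLD) {
        // Small image: keep it in the database as a BLOB, so it consumes no inode.
        $stmt = $db->prepare('INSERT INTO item_images (item_id, data) VALUES (?, ?)');
        $stmt->execute([$itemId, $imageData]);
    } else {
        // Larger image: write it out as an ordinary file on disk.
        if (!is_dir($diskDir)) {
            mkdir($diskDir, 0755, true);
        }
        file_put_contents($diskDir . '/' . $itemId . '.jpg', $imageData);
    }
}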
Answer by Abhishek Dujari
The year is 2014. I come back in time to add this answer. Lots of big/small files? You can use Amazon S3 and other alternatives based on Ceph like DreamObjects, where there are no directory limits to worry about.
I hope this helps someone decide from all the alternatives.
Answer by gibz
// md5($id) ==> e.g. "0123456789ABCDEF..." (hex digest)
$hash = strtoupper(md5($id));
// Nest 3-character chunks as directories: items/012/345/678/9AB/CDE/F.jpg
$file_path = 'items/' . implode('/', str_split(substr($hash, 0, 15), 3)) . '/' . $hash[15] . '.jpg';
// 1 node = 16^3 = 4096 subnodes (fast)