Storing & accessing up to 10 million files in Linux
Original source: http://stackoverflow.com/questions/5019371/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
Asked by Matt
I'm writing an app that needs to store a large number of files, up to approximately 10 million.
They are currently named with a UUID and will be around 4 MB each, always the same size. Reads and writes to these files will always be sequential.
I am seeking answers to two main questions:
1) Which filesystem would be best for this: XFS or ext4?
2) Would it be necessary to store the files in subdirectories in order to reduce the number of files within a single directory?
For question 2, I note that people have tried to find the XFS limit on the number of files in a single directory and have not hit one, even with millions of files. They noted no performance problems. What about ext4?
While searching for people doing similar things, I found suggestions to store the inode number, instead of the filename, as the link to the file for performance (this would be in a database index, which I'm also using). However, I don't see a usable API for opening a file by inode number. That seemed to be more of a suggestion for improving performance under ext3, which I am not intending to use, by the way.
What are the ext4 and XFS limits? What performance benefits does one have over the other, and can you see a reason to use ext4 over XFS in my case?
Accepted answer by Zan Lynx
You should definitely store the files in subdirectories.
EXT4 and XFS both use efficient lookup methods for file names, but if you ever need to run tools such as ls or find over the directories, you will be very glad to have the files in manageable chunks of 1,000 to 10,000 files.
The inode number suggestion is meant to improve sequential access performance on the EXT filesystems. The metadata is stored in inodes, and if you access these inodes out of order, the metadata accesses are randomized. By reading your files in inode order, you make the metadata access sequential too.
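As a rough illustration of reading files in inode order, here is a minimal Python sketch. The function name `paths_in_inode_order` is made up for this example and is not from the original answer; it only assumes that the files of interest live in one directory and that sorting by the inode number reported by `os.scandir` is a reasonable stand-in for "inode order".

```python
import os


def paths_in_inode_order(directory):
    """Return the regular files in `directory`, sorted by inode number.

    Hypothetical helper for illustration: the inode number comes from the
    directory entry, so no per-file stat() call is needed to build the order.
    """
    with os.scandir(directory) as entries:
        files = [
            (entry.inode(), entry.path)
            for entry in entries
            if entry.is_file(follow_symlinks=False)
        ]
    files.sort()  # ascending inode number
    return [path for _, path in files]


def read_in_inode_order(directory):
    """Yield (path, contents) pairs, visiting files in inode order."""
    for path in paths_in_inode_order(directory):
        with open(path, "rb") as handle:
            yield path, handle.read()
```

Whether this helps in practice depends on the filesystem and workload, so it is worth measuring before relying on it.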
Answer by MarkR
Modern filesystems will let you store 10 million files all in the same directory if you like. But tools (ls and its friends) will not work well.
I'd recommend using a single level of directories with a fixed number, perhaps 1,000, and putting the files in there (10,000 files per directory is tolerable to the shell and to "ls").
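One way to get a fixed, single-level layout like this is to derive the directory from the file's UUID. The sketch below is only an illustration: `NUM_SHARDS`, `shard_path`, and `store` are hypothetical names, and hashing the UUID bytes with CRC32 is an assumption made here so that the spread does not depend on which UUID version is in use.

```python
import uuid
import zlib
from pathlib import Path

NUM_SHARDS = 1000  # assumption: the "perhaps 1,000 directories" suggested above


def shard_path(root, file_id, num_shards=NUM_SHARDS):
    """Map a UUID to root/<shard>/<uuid>, using one directory level."""
    # CRC32 of the raw UUID bytes is stable across runs and machines.
    shard = zlib.crc32(file_id.bytes) % num_shards
    return Path(root) / f"{shard:03d}" / str(file_id)


def store(root, file_id, data):
    """Write `data` under the shard directory for `file_id`; return the path."""
    path = shard_path(root, file_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path
```

With 10 million files and 1,000 shards, each directory ends up holding roughly 10,000 files, which matches the figure mentioned above. Because the shard is computed from the UUID, the path can be rebuilt from the UUID alone without a lookup table.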
I've seen systems that create many levels of directories. This is truly unnecessary; it increases inode consumption and makes traversal slower.
10M files should not really be a problem either, unless you need to do bulk operations on them.
I expect you will need to prune old files, but something like "tmpwatch" will probably work just fine with 10M files.
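If a dedicated tool is not available, pruning can also be sketched in a few lines of Python. This is not how tmpwatch itself works; `prune_old_files` is a hypothetical helper that assumes a single level of subdirectories under `root` and uses the modification time as the age criterion, which may not be the right choice for every application.

```python
import os
import time


def prune_old_files(root, max_age_seconds, now=None):
    """Delete regular files in root/<subdir>/ older than max_age_seconds.

    Returns the number of files removed. Age is judged by st_mtime
    (an assumption of this sketch).
    """
    cutoff = (time.time() if now is None else now) - max_age_seconds
    removed = 0
    with os.scandir(root) as subdirs:
        for subdir in subdirs:
            if not subdir.is_dir(follow_symlinks=False):
                continue
            # Collect candidates first so nothing is deleted mid-iteration.
            with os.scandir(subdir.path) as entries:
                stale = [
                    entry.path
                    for entry in entries
                    if entry.is_file(follow_symlinks=False)
                    and entry.stat(follow_symlinks=False).st_mtime < cutoff
                ]
            for path in stale:
                os.unlink(path)
                removed += 1
    return removed
```

Processing one subdirectory at a time keeps memory use bounded by the size of a single directory rather than by the full set of files.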