Windows NTFS performance with large volumes of files and directories

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/197162/


NTFS performance and large volumes of files and directories

Tags: windows, performance, filesystems, ntfs

Asked by James Newton-King

How does Windows with NTFS perform with large volumes of files and directories?

Is there any guidance around limits of files or directories you can place in a single directory before you run into performance problems or other issues?

E.g. is having a folder with 100,000 folders inside of it an OK thing to do?

Answered by MrB

Here's some advice from someone working in an environment where we have folders containing tens of millions of files.

  1. A folder stores the index information (links to child files & child folders) in an index file. This file will get very large when you have a lot of children. Note that it doesn't distinguish between a child that's a folder and a child that's a file. The only real difference is that the content of that child is either the child's folder index or the child's file data. Note: I am simplifying this somewhat, but this gets the point across.
  2. The index file will get fragmented. When it gets too fragmented, you will be unable to add files to that folder. This is because there is a limit on the # of fragments that's allowed. It's by design. I've confirmed it with Microsoft in a support incident call. So although the theoretical limit to the number of files that you can have in a folder is several billion, good luck when you start hitting tens of millions of files, as you will hit the fragmentation limitation first.
  3. It's not all bad however. You can use the tool contig.exe to defragment this index. It will not reduce the size of the index (which can reach up to several GB for tens of millions of files), but you can reduce the # of fragments. Note: The Disk Defragmenter tool will NOT defrag the folder's index. It will defrag file data. Only the contig.exe tool will defrag the index. FYI: You can also use it to defrag an individual file's data.
  4. If you DO defrag, don't wait until you hit the max # of fragments limit. I have a folder where I cannot defrag because I've waited until it's too late. My next test is to try to move some files out of that folder into another folder to see if I could defrag it then. If this fails, then what I would have to do is 1) create a new folder. 2) move a batch of files to the new folder. 3) defrag the new folder. Repeat #2 & #3 until this is done and then 4) remove the old folder and rename the new folder to match the old (see the sketch after this list).
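
A rough C# sketch of the batch-move procedure from point 4 above (folder naming, batch size and the call to Sysinternals contig.exe are illustrative assumptions; contig.exe is a separate download and its exact behaviour should be checked against your version):

using System.Diagnostics;
using System.IO;

class FolderMigration
{
    // Point 4: move files out of an over-fragmented folder in batches,
    // defragmenting the new folder's index as we go, then swap the folders.
    static void MigrateFolder(string oldDir, string contigExePath, int batchSize = 10000)
    {
        string newDir = oldDir + ".new";            // hypothetical temporary name
        Directory.CreateDirectory(newDir);

        // Snapshot the file list first so we don't enumerate a directory we're emptying.
        string[] files = Directory.GetFiles(oldDir);
        for (int i = 0; i < files.Length; i++)
        {
            File.Move(files[i], Path.Combine(newDir, Path.GetFileName(files[i])));
            if ((i + 1) % batchSize == 0)
                DefragIndex(contigExePath, newDir); // step 3: defrag the new folder's index
        }
        DefragIndex(contigExePath, newDir);

        // Step 4: remove the old folder and rename the new one to take its place.
        Directory.Delete(oldDir, recursive: true);
        Directory.Move(newDir, oldDir);
    }

    static void DefragIndex(string contigExePath, string dir)
    {
        // Assumes Sysinternals contig.exe; pointing it at a directory defragments
        // that directory's index (check the flags for your contig version).
        using (var p = Process.Start(contigExePath, "\"" + dir + "\""))
            p.WaitForExit();
    }
}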

To answer your question more directly: If you're looking at 100K entries, no worries. Go knock yourself out. If you're looking at tens of millions of entries, then either:

a) Make plans to sub-divide them into sub-folders (e.g., let's say you have 100M files. It's better to store them in 1000 folders, so that you only have 100,000 files per folder, than to store them in 1 big folder. This will create 1000 folder indices instead of a single big one that's more likely to hit the max # of fragments limit; see the sketch below), or

b) Make plans to run contig.exe on a regular basis to keep your big folder's index defragmented.

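For option (a), here is a minimal C# sketch of one way to spread files across a fixed number of sub-folders by hashing the file name (the 1000-bucket count, folder naming, and hash choice are illustrative assumptions, not from the original answer):

using System.IO;

static class FileBuckets
{
    // Map a file name to one of N sub-folders so that no single folder index grows too large.
    public static string GetBucketedPath(string rootDir, string fileName, int bucketCount = 1000)
    {
        // Simple deterministic FNV-1a hash; String.GetHashCode can differ between processes.
        uint hash = 2166136261;
        foreach (char c in fileName)
            hash = (hash ^ c) * 16777619;

        int bucket = (int)(hash % (uint)bucketCount);
        string dir = Path.Combine(rootDir, bucket.ToString("D4"));
        Directory.CreateDirectory(dir);             // no-op if the folder already exists
        return Path.Combine(dir, fileName);
    }
}

With 100M files and 1000 buckets, each folder ends up with roughly 100,000 entries, which the advice above treats as unproblematic.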

Read below only if you're bored.

The actual limit isn't on the # of fragments, but on the number of records of the data segment that stores the pointers to the fragments.

So what you have is a data segment that stores pointers to the fragments of the directory data. The directory data stores information about the sub-directories & sub-files that the directory supposedly stores. Actually, a directory doesn't "store" anything. It's just a tracking and presentation feature that presents the illusion of hierarchy to the user, since the storage medium itself is linear.

Answered by Tony Lee

Short file name creation can also slow things down. Microsoft recommends turning off short filename creation if you have more than 300k files in a folder [1]. The less unique the first 6 characters are, the more of a problem this is.

[1] How NTFS Works, from http://technet.microsoft.com; search for "300,000"

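Short-name creation is controlled by the NtfsDisable8dot3NameCreation setting; it can be turned off with fsutil (fsutil behavior set disable8dot3 1, run as administrator), and the current value can be read from C# as in this small sketch (the registry path is the standard NTFS location, but verify the meanings against your Windows version):

using System;
using Microsoft.Win32;

static class ShortNameCheck
{
    static void Main()
    {
        // 0 = 8.3 names created on all volumes, 1 = disabled;
        // newer Windows versions also use 2 and 3 for per-volume behaviour.
        object value = Registry.GetValue(
            @"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem",
            "NtfsDisable8dot3NameCreation",
            null);
        Console.WriteLine("NtfsDisable8dot3NameCreation = {0}", value ?? "(not set)");
    }
}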

Answered by Spoc

I am building a File-Structure to host up to 2 billion (2^32) files and performed the following tests that show a sharp drop in Navigate + Read Performance at about 250 Files or 120 Directories per NTFS Directory on a Solid State Drive (SSD):

  • The File Performance drops by 50% between 250 and 1000 Files.
  • The Directory Performance drops by 60% between 120 and 1000 Directories.
  • Values for Numbers > 1000 remain relatively stable

Interestingly, the number of directories and the number of files do NOT significantly interfere with each other.

So the Lessons are:

  • File Numbers above 250 cost a Factor of 2
  • Directories above 120 cost a Factor of 2.5
  • The File-Explorer in Windows 7 can handle large #Files or #Dirs, but Usability is still bad.
  • Introducing Sub-Directories is not expensive

This is the Data (2 Measurements for each File and Directory):

(FOPS = File Operations per Second)
(DOPS = Directory Operations per Second)

#Files  lg(#)   FOPS    FOPS2   DOPS    DOPS2
   10   1.00    16692   16692   16421   16312
  100   2.00    16425   15943   15738   16031
  120   2.08    15716   16024   15878   16122
  130   2.11    15883   16124   14328   14347
  160   2.20    15978   16184   11325   11128
  200   2.30    16364   16052   9866    9678
  210   2.32    16143   15977   9348    9547
  220   2.34    16290   15909   9094    9038
  230   2.36    16048   15930   9010    9094
  240   2.38    15096   15725   8654    9143
  250   2.40    15453   15548   8872    8472
  260   2.41    14454   15053   8577    8720
  300   2.48    12565   13245   8368    8361
  400   2.60    11159   11462   7671    7574
  500   2.70    10536   10560   7149    7331
 1000   3.00    9092    9509    6569    6693
 2000   3.30    8797    8810    6375    6292
10000   4.00    8084    8228    6210    6194
20000   4.30    8049    8343    5536    6100
50000   4.70    7468    7607    5364    5365

And this is the Test Code:

// Requires (in the enclosing test class file): using System; using System.Collections.Generic;
// using System.Diagnostics; using System.IO; and NUnit.Framework for [TestCase].
[TestCase(50000, false, Result = 50000)]
[TestCase(50000, true, Result = 50000)]
public static int TestDirPerformance(int numFilesInDir, bool testDirs) {
    var files = new List<string>();
    var dir = Path.GetTempPath() + @"\Sub\" + Guid.NewGuid() + @"\";
    Directory.CreateDirectory(dir);
    Console.WriteLine("prepare...");
    const string FILE_NAME = @"\file.txt";
    for (int i = 0; i < numFilesInDir; i++) {
        string filename = dir + Guid.NewGuid();
        if (testDirs) {
            var dirName = filename + "D";
            Directory.CreateDirectory(dirName);
            using (File.Create(dirName + FILE_NAME)) { }
        } else {
            using (File.Create(filename)) { }
        }
        files.Add(filename);
    }
    //Adding 1000 Directories didn't change File Performance
    /*for (int i = 0; i < 1000; i++) {
        string filename = dir + Guid.NewGuid();
        Directory.CreateDirectory(filename + "D");
    }*/
    Console.WriteLine("measure...");
    var r = new Random();
    var sw = new Stopwatch();
    sw.Start();
    int len = 0;
    int count = 0;
    while (sw.ElapsedMilliseconds < 5000) {
        string filename = files[r.Next(files.Count)];
        string text = File.ReadAllText(testDirs ? filename + "D" + FILE_NAME : filename);
        len += text.Length;
        count++;
    }
    Console.WriteLine("{0} File Ops/sec ", count / 5);
    return numFilesInDir; 
}

Answered by Oli

100,000 should be fine.

I have (anecdotally) seen people having problems with many millions of files and I have had problems myself with Explorer just not having a clue how to count past 60-something thousand files, but NTFS should be good for the volumes you're talking.

In case you're wondering, the technical (and I hope theoretical) maximum number of files is: 4,294,967,295

Answered by Brian Knoblauch

For local access, large numbers of directories/files don't seem to be an issue. However, if you're accessing it across a network, there's a noticeable performance hit after a few hundred entries (especially when accessed from Vista machines; XP to Windows Server with NTFS seemed to run much faster in that regard).

Answered by Constantin

When you create a folder with N entries, you create a list of N items at file-system level. This list is a system-wide shared data structure. If you then start modifying this list continuously by adding/removing entries, I expect at least some lock contention over shared data. This contention - theoretically- can negatively affect performance.

For read-only scenarios I can't imagine any reason for performance degradation of directories with large number of entries.

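A small C# sketch of how one might probe that theory: create files from several parallel tasks, first all into one shared folder and then into one folder per task, and compare the timings (task and file counts are arbitrary; this is not a benchmark from the original answer):

using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

static class DirContentionProbe
{
    static void Main()
    {
        string root = Path.Combine(Path.GetTempPath(), "contention-" + Guid.NewGuid());
        Console.WriteLine("shared folder:   {0} ms", Run(root + "-shared", sharedFolder: true));
        Console.WriteLine("folder per task: {0} ms", Run(root + "-split", sharedFolder: false));
    }

    static long Run(string root, bool sharedFolder, int tasks = 8, int filesPerTask = 5000)
    {
        var sw = Stopwatch.StartNew();
        Parallel.For(0, tasks, t =>
        {
            // Either every task writes into the same directory, or each task gets its own.
            string dir = sharedFolder ? Path.Combine(root, "shared")
                                      : Path.Combine(root, "task" + t);
            Directory.CreateDirectory(dir);
            for (int i = 0; i < filesPerTask; i++)
                using (File.Create(Path.Combine(dir, t + "_" + i + ".tmp"))) { }
        });
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}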

Answered by ximik

I had real-world experience with about 100 000 files (each several MB) in one NTFS directory while copying an online library.

It took about 15 minutes to open the directory with Explorer or 7-Zip.

Writing a site copy with winhttrack would always get stuck after some time. It also dealt with a directory containing about 1 000 000 files. I think the worst thing is that the MFT can only be traversed sequentially.

Opening the same directory under ext2fsd on ext3 gave almost the same timing. Moving to reiserfs (not reiser4fs) can probably help.

Trying to avoid this situation is probably best.

For your own programs, using blobs without any filesystem could be beneficial. That's the way Facebook stores photos.

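A minimal C# sketch of that idea: append every blob to one big container file and keep an index of (offset, length) per key, so reads never touch a huge directory. This is only an illustration of the concept, not Facebook's actual photo store:

using System.Collections.Generic;
using System.IO;

class BlobStore
{
    private struct Entry { public long Offset; public int Length; }

    private readonly FileStream _data;
    private readonly Dictionary<string, Entry> _index = new Dictionary<string, Entry>();

    public BlobStore(string path)
    {
        _data = new FileStream(path, FileMode.OpenOrCreate, FileAccess.ReadWrite);
    }

    // Append the blob to the single container file and remember where it lives.
    public void Put(string key, byte[] blob)
    {
        _data.Seek(0, SeekOrigin.End);
        _index[key] = new Entry { Offset = _data.Position, Length = blob.Length };
        _data.Write(blob, 0, blob.Length);
    }

    // Read a blob back by seeking straight to its recorded offset; no directory lookup involved.
    public byte[] Get(string key)
    {
        Entry e = _index[key];
        var buffer = new byte[e.Length];
        _data.Seek(e.Offset, SeekOrigin.Begin);
        _data.Read(buffer, 0, e.Length);
        return buffer;
    }
}

In practice the index would also have to be persisted and the store made thread-safe; the point is only that a single container file sidesteps the per-directory limits discussed above.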