A more efficient method of getting directory size in C#

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/9831641/

More efficient method of getting Directory size

Tags: c#, search, recursion, directory

Asked by ikathegreat

I've already built a recursive function to get the directory size of a folder path. It works; however, with the growing number of directories I have to search through (and the number of files in each respective folder), this is a very slow, inefficient method.

static string GetDirectorySize(string parentDir)
{
    long totalFileSize = 0;

    string[] dirFiles = Directory.GetFiles(parentDir, "*.*", 
                            System.IO.SearchOption.AllDirectories);

    foreach (string fileName in dirFiles)
    {
        // Use FileInfo to get length of each file.
        FileInfo info = new FileInfo(fileName);
        totalFileSize = totalFileSize + info.Length;
    }
    // FileSizeFormatProvider is a custom formatter of the asker's (not part of .NET).
    return String.Format(new FileSizeFormatProvider(), "{0:fs}", totalFileSize);
}

This searches all subdirectories of the argument path, so the dirFiles array gets quite large. Is there a better method to accomplish this? I've searched around but haven't found anything yet.

Another idea that crossed my mind was caching the results and, when the function is called again, finding the differences and re-searching only the folders that have changed. Not sure if that's a good idea either...

Accepted answer by usr

You are first scanning the tree to get a list of all files. Then you are reopening every file to get its size. This amounts to scanning twice.

I suggest you use DirectoryInfo.GetFiles, which will hand you FileInfo objects directly. These objects are pre-filled with their length.

In .NET 4 you can also use the EnumerateFiles method, which will return a lazy IEnumerable.
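
As an illustration (not code from the answer), a minimal one-pass sketch using DirectoryInfo.EnumerateFiles might look like this; it assumes .NET 4+ and System.Linq:

    using System.IO;
    using System.Linq;

    static long GetDirectorySize(string parentDir)
    {
        // The tree is walked once: EnumerateFiles streams FileInfo objects lazily,
        // and each FileInfo already carries its Length, so no file is reopened.
        return new DirectoryInfo(parentDir)
            .EnumerateFiles("*", SearchOption.AllDirectories)
            .Sum(fi => fi.Length);
    }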

Answer by Adriano Repetti

You may speed up your function a little by using EnumerateFiles() instead of GetFiles(). At least you won't load the full list into memory.

If that's not enough, you should make your function more complex using threads (one thread per directory is too many, but there is no general rule).
You may use a fixed number of threads that take directories from a queue; each thread calculates the size of a directory and adds it to the total. Something like this:

  • Get the list of all directories (not files).
  • Create N threads (one per core, for example).
  • Each thread takes a directory from the queue and calculates its size.
  • If there is no other directory in the queue, the thread ends.
  • If there is another directory in the queue, it calculates its size, and so on.
  • The function finishes when all threads terminate.

You may improve the algorithm a lot by spreading the directory search across all threads (for example, when a thread parses a directory, it adds that directory's subfolders to the queue). It's up to you to make it more complicated if you find it's too slow (this task was used by Microsoft as an example for the new Task Parallel Library). A rough sketch of the basic version follows.
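
Here is a minimal sketch of the queue-based plan in the list above (illustrative only; the class and method names are placeholders, and it assumes .NET 4.5+ for Task.Run):

    using System;
    using System.Collections.Concurrent;
    using System.IO;
    using System.Threading;
    using System.Threading.Tasks;

    static class DirectorySizer
    {
        public static long GetDirectorySize(string root)
        {
            // Step 1: queue the root plus every subdirectory (directories, not files).
            var queue = new ConcurrentQueue<string>(
                Directory.EnumerateDirectories(root, "*", SearchOption.AllDirectories));
            queue.Enqueue(root);

            long total = 0;
            int workerCount = Environment.ProcessorCount; // one thread per core

            var workers = new Task[workerCount];
            for (int i = 0; i < workerCount; i++)
            {
                workers[i] = Task.Run(() =>
                {
                    // Each worker drains directories until the queue is empty, then ends.
                    while (queue.TryDequeue(out string dir))
                    {
                        long localSum = 0;
                        // Top-level files only; subdirectories are separate queue items.
                        // (Error handling for inaccessible folders is omitted here.)
                        foreach (string file in Directory.EnumerateFiles(dir))
                            localSum += new FileInfo(file).Length;
                        Interlocked.Add(ref total, localSum);
                    }
                });
            }
            Task.WaitAll(workers);
            return total;
        }
    }

If you also spread the directory enumeration itself across the workers, as the answer suggests, completion detection gets trickier: the queue can be momentarily empty while another worker is still adding subfolders, so a pending-work counter would be needed.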

Answer by MrFox

This is more cryptic, but it took about 2 seconds for 10k executions.

    // Requires: using System.IO; using System.Linq;
    public static long GetDirectorySize(string parentDirectory)
    {
        return new DirectoryInfo(parentDirectory)
            .GetFiles("*.*", SearchOption.AllDirectories)
            .Sum(file => file.Length);
    }

Answer by paparazzo

Try

        // Requires: using System; using System.Diagnostics; using System.IO;
        DirectoryInfo DirInfo = new DirectoryInfo(@"C:\DataLoad\");
        Stopwatch sw = new Stopwatch();
        try
        {
            sw.Start();
            Int64 ttl = 0;
            Int32 fileCount = 0;
            foreach (FileInfo fi in DirInfo.EnumerateFiles("*", SearchOption.AllDirectories))
            {
                ttl += fi.Length;
                fileCount++;
            }
            sw.Stop();
            Debug.WriteLine(sw.ElapsedMilliseconds.ToString() + " " + fileCount.ToString());
        }
        catch (Exception Ex)
        {
            Debug.WriteLine(Ex.ToString());
        }

This did 700,000 files in 70 seconds on a desktop non-RAID P4, so about 10,000 files a second. A server-class machine should easily reach 100,000+ per second.

As usr (+1) said, EnumerateFiles returns FileInfo objects that are pre-filled with their length.

Answer by Shai Segev

    // Requires: using System.IO; using System.Linq;
    long length = Directory.GetFiles(@"MainFolderPath", "*", SearchOption.AllDirectories)
                           .Sum(t => new FileInfo(t).Length);