Note: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/714101/

Time: 2020-08-04 22:12:44  Source: igfitidea

Quickest way in C# to find a file in a directory with over 20,000 files

c# .net file-io

Asked by adeel825

I have a job that runs every night to pull xml files from a directory that has over 20,000 subfolders under the root. Here is what the structure looks like:

rootFolder/someFolder/someSubFolder/xml/myFile.xml
rootFolder/someFolder/someSubFolder1/xml/myFile1.xml
rootFolder/someFolder/someSubFolderN/xml/myFile2.xml
rootFolder/someFolder1
rootFolder/someFolderN

So, looking at the above, the structure is always the same: a root folder, then two subfolders, then an xml directory, and then the xml file. Only the names of the rootFolder and the xml directory are known to me.

The code below traverses through all the directories and is extremely slow. Any recommendations on how I can optimize the search especially if the directory structure is known?

string[] files = Directory.GetFiles(@"\\somenetworkpath\rootFolder", "*.xml", SearchOption.AllDirectories);

Accepted answer by Mitchel Sellers

Rather than calling GetFiles and doing a brute-force search, you could most likely use GetDirectories: first get a list of the first-level subfolders, loop through those directories, repeat the process for their subfolders, then look for the xml folder in each one, and finally search that folder for .xml files.

Now, as for performance, the speed of this will vary, but searching for directories first and THEN getting to the files should help a lot!
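
The steps above can be sketched roughly as follows. This is only a minimal illustration, assuming the layout from the question (root, two folder levels, then an "xml" folder); the class and method names are hypothetical, and Directory.EnumerateDirectories/EnumerateFiles (available from .NET 4) are used here in place of GetDirectories so that results stream back lazily.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class FixedDepthSearch // hypothetical name, not from the original answer
{
    // Walks root/<level1>/<level2>/xml and yields the .xml files found there.
    // Only two directory levels are listed; nothing else is traversed.
    public static IEnumerable<string> FindXmlFiles(string rootPath)
    {
        foreach (string level1 in Directory.EnumerateDirectories(rootPath))
        {
            foreach (string level2 in Directory.EnumerateDirectories(level1))
            {
                string xmlDir = Path.Combine(level2, "xml");
                if (!Directory.Exists(xmlDir))
                    continue; // this branch has no xml folder

                foreach (string file in Directory.EnumerateFiles(xmlDir, "*.xml"))
                    yield return file;
            }
        }
    }
}
```

Calling FixedDepthSearch.FindXmlFiles(rootPath) in a foreach loop lets the job start processing files before the whole tree has been listed, which may matter on a slow network share.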

Update

Ok, I did a quick bit of testing and you can actually optimize it much further than I thought.

The following code snippet will search a directory structure and find ALL "xml" folders inside the entire directory tree.

string startPath = @"C:\Testing\Testing\bin\Debug";
string[] oDirectories = Directory.GetDirectories(startPath, "xml", SearchOption.AllDirectories);
Console.WriteLine(oDirectories.Length.ToString());
foreach (string oCurrent in oDirectories)
    Console.WriteLine(oCurrent);
Console.ReadLine();

If you drop that into a test console app you will see it output the results.

Now, once you have this, just look in each of the found directories for your .xml files.
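
That last step might look something like the sketch below. It is not part of the original answer: the class and method names are made up for illustration, and it simply combines the GetDirectories call shown above with a non-recursive GetFiles call on each "xml" folder.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class XmlFolderSearch // hypothetical name
{
    // Finds every folder named "xml" under startPath, then lists the .xml
    // files directly inside each one (no further recursion per folder).
    public static List<string> CollectXmlFiles(string startPath)
    {
        var files = new List<string>();
        string[] xmlDirs = Directory.GetDirectories(startPath, "xml", SearchOption.AllDirectories);

        foreach (string xmlDir in xmlDirs)
        {
            files.AddRange(Directory.GetFiles(xmlDir, "*.xml", SearchOption.TopDirectoryOnly));
        }
        return files;
    }
}
```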

Answer by Adam Robinson

Are there additional directories at the same level as the xml folder? If so, you could probably speed up the search if you do it yourself and eliminate that level from searching.

        System.IO.DirectoryInfo root = new System.IO.DirectoryInfo(rootPath);
        List<System.IO.FileInfo> xmlFiles=new List<System.IO.FileInfo>();

        foreach (System.IO.DirectoryInfo subDir1 in root.GetDirectories())
        {
            foreach (System.IO.DirectoryInfo subDir2 in subDir1.GetDirectories())
            {
                System.IO.DirectoryInfo xmlDir = new System.IO.DirectoryInfo(System.IO.Path.Combine(subDir2.FullName, "xml"));

                if (xmlDir.Exists)
                {
                    xmlFiles.AddRange(xmlDir.GetFiles("*.xml"));
                }
            }
        }

Answer by Chris Doggett

I can't think of anything faster in C#, but do you have indexing turned on for that file system?

Answer by Michael

The only way I can see that would make much difference is to switch from a brute-force search to some third-party or OS indexing routine to speed up the lookup. That way the search is done offline, outside your app.

But I would also suggest you should look at better ways to structure that data if at all possible.

Answer by Richard

Use P/Invoke on FindFirstFile/FindNextFile/FindClose and avoid the overhead of creating lots of FileInfo instances.

But this will be hard work to get right (you will have to do all the handling of file vs. directory and recursion yourself). So try something simple (Directory.GetFiles(), Directory.GetDirectories()) to start with and get things working. If it is too slow, look at alternatives (but always measure; it is all too easy to make things slower).
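
To give an idea of what the P/Invoke route involves, here is a minimal, Windows-only sketch that lists the entries of a single directory through FindFirstFile/FindNextFile/FindClose. The class and method names are hypothetical and the error handling is deliberately thin; recursion, long paths and detailed error reporting would still be up to you.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;
using FILETIME = System.Runtime.InteropServices.ComTypes.FILETIME;

static class NativeDirectoryLister // hypothetical name
{
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    private struct WIN32_FIND_DATA
    {
        public FileAttributes dwFileAttributes;
        public FILETIME ftCreationTime;
        public FILETIME ftLastAccessTime;
        public FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool FindClose(IntPtr hFindFile);

    private static readonly IntPtr InvalidHandle = new IntPtr(-1);

    // Lists files (or directories, if wantDirectories is true) directly inside
    // 'directory' that match 'pattern', without allocating FileInfo objects.
    public static List<string> ListEntries(string directory, string pattern, bool wantDirectories)
    {
        var results = new List<string>();
        WIN32_FIND_DATA data;
        IntPtr handle = FindFirstFile(Path.Combine(directory, pattern), out data);
        if (handle == InvalidHandle)
            return results; // nothing matched, or the directory is not accessible

        try
        {
            do
            {
                if (data.cFileName == "." || data.cFileName == "..")
                    continue; // skip the self and parent entries

                bool isDirectory = (data.dwFileAttributes & FileAttributes.Directory) != 0;
                if (isDirectory == wantDirectories)
                    results.Add(Path.Combine(directory, data.cFileName));
            }
            while (FindNextFile(handle, out data));
        }
        finally
        {
            FindClose(handle); // always release the native search handle
        }
        return results;
    }
}
```

For the layout in the question, ListEntries(root, "*", true) would list the first-level folders, and ListEntries(xmlDir, "*.xml", false) the files in one xml folder. Whether this actually beats the managed calls on a given share is something to measure rather than assume.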

Answer by ceffoh

I created a recursive method, GetFolders, that uses Parallel.ForEach to find all the folders whose names match the variable yourKeyword.

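// yourKeyword is defined elsewhere and must be upper-case, because each path
// is upper-cased before the comparison.
List<string> GetFolders(string[] subFolders)
{
    List<string> returnFolders = new List<string>();
    object locker = new object();

    Parallel.ForEach(subFolders, subFolder =>
    {
        if (subFolder.ToUpper().EndsWith(yourKeyword))
        {
            lock (locker)
            {
                returnFolders.Add(subFolder);
            }
        }
        else
        {
            // Recurse outside the lock so sibling branches are searched in parallel;
            // only the merge into the shared list needs to be synchronized.
            List<string> found = GetFolders(Directory.GetDirectories(subFolder));
            lock (locker)
            {
                returnFolders.AddRange(found);
            }
        }
    });

    return returnFolders;
}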

Answer by Henrik Gering

Depending on your needs and configuration, you could utilize the Windows Search Index: https://msdn.microsoft.com/en-us/library/windows/desktop/bb266517(v=vs.85).aspx

Depending on your configuration this could increase performance greatly.

Answer by VladVS

For file and directory searches, I would suggest a multithreaded .NET library that offers a wide range of search options. All the information about the library is on GitHub at https://github.com/VladPVS/FastSearchLibrary and releases can be downloaded from https://github.com/VladPVS/FastSearchLibrary/releases (if you have any questions, please ask).

Works really fast. Check it yourself!

Here is one example of how you can use it:

class Searcher
{
    private static object locker = new object(); 

    private FileSearcher searcher;

    List<FileInfo> files;

    public Searcher()
    {
        files = new List<FileInfo>();
    }

    public void Startsearch()
    {
        CancellationTokenSource tokenSource = new CancellationTokenSource();

        searcher = new FileSearcher(@"C:\", (f) =>
        {
            return Regex.IsMatch(f.Name, @".*[Dd]ragon.*\.jpg$");
        }, tokenSource);  


        searcher.FilesFound += (sender, arg) => 
        {
            lock (locker) // a lock is required here
            {
                arg.Files.ForEach((f) =>
                {
                    files.Add(f);
                    Console.WriteLine($"File location: {f.FullName}, \nCreation.Time: {f.CreationTime}");
                });

                if (files.Count >= 10) 
                    searcher.StopSearch();
            }
        };

        searcher.SearchCompleted += (sender, arg) => 
        {
            if (arg.IsCanceled) 
                Console.WriteLine("Search stopped.");
            else
                Console.WriteLine("Search completed.");

            Console.WriteLine($"Quantity of files: {files.Count}"); 
        };

        searcher.StartSearchAsync();
    }
}

Here is part of another example:

***
List<string> folders = new List<string>
{
  @"C:\Users\Public",
  @"C:\Windows\System32",
  @"D:\Program Files",
  @"D:\Program Files (x86)"
}; // list of search directories

List<string> keywords = new List<string> { "word1", "word2", "word3" }; // list of search keywords

FileSearcherMultiple multipleSearcher = new FileSearcherMultiple(folders, (f) =>
{
  if (f.CreationTime >= new DateTime(2015, 3, 15) &&
     (f.Extension == ".cs" || f.Extension == ".sln"))
    foreach (var keyword in keywords)
      if (f.Name.Contains(keyword))
        return true;
  return false;
}, tokenSource, ExecuteHandlers.InCurrentTask, true);  
***

Moreover, you can use a simple static method:

List<FileInfo> files = FileSearcher.GetFilesFast(@"C:\Users", "*.xml");

Note that, unlike the standard .NET search methods, the methods of this library do NOT throw UnauthorizedAccessException.
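
For comparison, with the standard .NET methods a single protected folder makes Directory.GetFiles(path, pattern, SearchOption.AllDirectories) throw and lose the whole result. A common workaround is to walk the tree yourself and skip folders you cannot read. The sketch below shows that idea with plain System.IO calls; it is not how the library above is implemented, and the class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class TolerantSearch // hypothetical name
{
    // Iterative depth-first walk that skips unreadable directories
    // instead of aborting the entire search.
    public static List<string> FindFiles(string root, string pattern)
    {
        var results = new List<string>();
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string current = pending.Pop();
            try
            {
                results.AddRange(Directory.GetFiles(current, pattern));
                foreach (string sub in Directory.GetDirectories(current))
                    pending.Push(sub);
            }
            catch (UnauthorizedAccessException)
            {
                // No permission for this folder: skip it and keep going.
            }
            catch (DirectoryNotFoundException)
            {
                // The folder was removed while the search was running.
            }
        }
        return results;
    }
}
```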

Furthermore, if you use a multicore processor, the fast methods of this library run at least 2 times faster than a simple single-threaded recursive algorithm.

Answer by Wazzie

For those of you who want to search for a single file and know your root directory, I suggest you keep it as simple as possible. This approach worked for me.

    private void btnSearch_Click(object sender, EventArgs e)
    {
        string sourceFolder = @"C:\mytestDir\";
        string searchWord = txtInput.Text + ".pdf";
        string filePresentCK = sourceFolder + searchWord;

        // The full path is known, so a single File.Exists check replaces any directory search.
        if (File.Exists(filePresentCK))
        {
            pdfViewer1.LoadFromFile(filePresentCK);
        }
        else
        {
            MessageBox.Show("Unable to find file: " + searchWord);
        }

        txtInput.Clear();
    } // end of btnSearch_Click