Better search for a string in all files using C#

Question

Asked by LCJ

After consulting many blogs and articles, I arrived at the following code for searching for a string in all files inside a folder. It works fine in my tests.

QUESTIONS

Is there a faster approach for this (using C#)?
Is there any scenario that will fail with this code?

Note: I tested only with very small files, and only with a few of them.

CODE

using System;
using System.Collections.Generic;
using System.IO;

class Program
{
    static void Main()
    {
        string sourceFolder = @"C:\Test";
        string searchWord = ".class1";

        // Collect every file under the folder, then scan each one in full.
        List<string> allFiles = new List<string>();
        AddFileNamesToList(sourceFolder, allFiles);
        foreach (string fileName in allFiles)
        {
            string contents = File.ReadAllText(fileName);
            if (contents.Contains(searchWord))
            {
                Console.WriteLine(fileName);
            }
        }

        Console.WriteLine(" ");
        Console.ReadKey();
    }

    public static void AddFileNamesToList(string sourceDir, List<string> allFiles)
    {
        string[] fileEntries = Directory.GetFiles(sourceDir);
        foreach (string fileName in fileEntries)
        {
            allFiles.Add(fileName);
        }

        // Recursion
        string[] subdirectoryEntries = Directory.GetDirectories(sourceDir);
        foreach (string item in subdirectoryEntries)
        {
            // Avoid "reparse points"
            if ((File.GetAttributes(item) & FileAttributes.ReparsePoint) != FileAttributes.ReparsePoint)
            {
                AddFileNamesToList(item, allFiles);
            }
        }
    }
}

Answer 1

Accepted answer by VladL

Instead of File.ReadAllText(), it is better to use

File.ReadLines(@"C:\file.txt");

It returns a lazily evaluated IEnumerable<string> (lines are yielded one at a time), so you do not have to read the whole file if your string is found before the last line of the text file is reached.
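
As a minimal sketch of this suggestion, the lazy enumeration can be combined with LINQ's Any, which stops reading at the first matching line. The class and method names below are made up for illustration, and ordinal (case-sensitive) matching is an assumption carried over from the question's use of Contains. Note that a line-by-line search cannot match a search string that spans a line break.

```csharp
using System;
using System.IO;
using System.Linq;

static class LineSearch
{
    // Hypothetical helper: true if any single line of the file contains searchWord.
    // File.ReadLines is lazy, so Any() stops reading as soon as one line matches.
    public static bool FileContainsText(string path, string searchWord)
    {
        return File.ReadLines(path)
                   .Any(line => line.Contains(searchWord));
    }
}
```

In the question's loop, the ReadAllText/Contains pair could then be swapped for a call such as LineSearch.FileContainsText(fileName, searchWord).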

Answer 2

Answered by Brannon

I think your code will fail with an exception if you lack permission to open a file.
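
One way to guard against this, sketched under the assumption that unreadable files should simply be skipped, is to catch the exceptions that File.ReadAllText can throw for access and I/O problems. The names SafeSearch and TryContains are hypothetical.

```csharp
using System;
using System.IO;

static class SafeSearch
{
    // Hypothetical helper: true/false for a readable file,
    // null when the file could not be read at all.
    public static bool? TryContains(string path, string searchWord)
    {
        try
        {
            return File.ReadAllText(path).Contains(searchWord);
        }
        catch (UnauthorizedAccessException)
        {
            return null; // no permission to open the file
        }
        catch (IOException)
        {
            return null; // file locked, deleted in the meantime, path too long, ...
        }
    }
}
```

The same exceptions can also come from Directory.GetFiles and Directory.GetDirectories when a folder is not accessible, so the directory walk in the question would need similar handling.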

Compare it with the code here: http://bgrep.codeplex.com/releases/view/36186

That code supports

regular expression search and
filters for file extensions

-- things you should probably consider.
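
The two features mentioned above could look roughly like this. This is not the bgrep code, only a sketch using the standard Regex class; the names RegexSearch and FindFiles and the parameter layout are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexSearch
{
    // Hypothetical helper: files under 'folder' with the given extension (e.g. ".cs")
    // that have at least one line matching the regular expression 'pattern'.
    public static IEnumerable<string> FindFiles(string folder, string extension, string pattern)
    {
        var regex = new Regex(pattern, RegexOptions.Compiled);

        return Directory.EnumerateFiles(folder, "*", SearchOption.AllDirectories)
            // Compare extensions explicitly so ".cs" does not also pick up ".csx".
            .Where(path => string.Equals(Path.GetExtension(path), extension,
                                         StringComparison.OrdinalIgnoreCase))
            .Where(path => File.ReadLines(path).Any(line => regex.IsMatch(line)));
    }
}
```

For example, RegexSearch.FindFiles(@"C:\Test", ".css", @"\.class\d+") would list the .css files that mention .class1, .class2, and so on.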

Answer 3

Answered by Jason Meckley

The main problem here is that you are searching all the files in real time for every search. There is also the possibility of file access conflicts if two or more users are searching at the same time.

To dramatically improve performance, I would index the files ahead of time and re-index them as they are edited or saved. Store the index using something like Lucene.NET, then query the index (again using Lucene.NET) and return the file names to the user, so the user never queries the files directly.

If you follow the links in this SO post, you may get a head start on implementing the indexing. I didn't follow the links myself, but they are worth a look.

Just a heads up: this will be a major shift from your current approach and will require

a service to monitor/index the files
the UI project
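
To make the indexing idea concrete, here is a deliberately tiny in-memory inverted index. It is not Lucene.NET and does not use its API; the class TinyIndex and its methods are hypothetical, and the tokenization (runs of word characters, case-insensitive) is an assumption. It only illustrates why a query against an index avoids touching the files.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

sealed class TinyIndex
{
    // word -> set of files that contain it
    private readonly Dictionary<string, HashSet<string>> _filesByWord =
        new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

    // Index one file. A real index would also drop stale entries when a file
    // is re-indexed after an edit; this sketch only ever adds.
    public void AddFile(string path)
    {
        foreach (Match m in Regex.Matches(File.ReadAllText(path), @"\w+"))
        {
            HashSet<string> files;
            if (!_filesByWord.TryGetValue(m.Value, out files))
            {
                files = new HashSet<string>();
                _filesByWord[m.Value] = files;
            }
            files.Add(path);
        }
    }

    // Look up a whole word without opening any file.
    public IEnumerable<string> Search(string word)
    {
        HashSet<string> files;
        return _filesByWord.TryGetValue(word, out files)
            ? files
            : Enumerable.Empty<string>();
    }
}
```

Because this sketch indexes whole words, it answers lookups such as "class1" but not arbitrary substrings such as ".class1"; a full-text library handles tokenization and persistence far more thoroughly.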

Answer 4

Answered by Serj-Tm

Instead of Contains, it is better to use the Boyer-Moore search algorithm.
Failure scenario: the file does not have read permission.
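
For reference, here is a sketch of the Boyer-Moore-Horspool variant, a common simplification of Boyer-Moore that keeps only the bad-character shift table. It works on byte arrays, so text would first be converted with something like Encoding.UTF8.GetBytes. The class name Horspool is made up for illustration, and whether this beats Contains for a given workload is something to measure rather than assume.

```csharp
using System;

static class Horspool
{
    // Returns the index of the first occurrence of pattern in data, or -1 if absent.
    public static int IndexOf(byte[] data, byte[] pattern)
    {
        if (pattern.Length == 0) return 0;
        if (pattern.Length > data.Length) return -1;

        // Shift table: how far the window may move, keyed by the byte
        // currently aligned with the last position of the pattern.
        int[] shift = new int[256];
        for (int i = 0; i < shift.Length; i++) shift[i] = pattern.Length;
        for (int i = 0; i < pattern.Length - 1; i++)
            shift[pattern[i]] = pattern.Length - 1 - i;

        int pos = 0;
        while (pos <= data.Length - pattern.Length)
        {
            // Compare the window against the pattern from right to left.
            int j = pattern.Length - 1;
            while (j >= 0 && data[pos + j] == pattern[j]) j--;
            if (j < 0) return pos; // every byte matched

            pos += shift[data[pos + pattern.Length - 1]];
        }
        return -1;
    }
}
```

The gain comes from the shift table: when the byte under the end of the window does not occur in the pattern, the search skips a full pattern length at once instead of advancing by one byte.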

Answer 5

Answered by Scott Chamberlain

I wrote something very similar; here are a couple of changes I would recommend.

Use Directory.EnumerateDirectories instead of GetDirectories; it returns immediately with an IEnumerable, so you don't need to wait for it to finish reading all of the directories before processing.
Use ReadLines instead of ReadAllText; this loads only one line at a time into memory, which matters a great deal if you hit a large file.
If you are using a new enough version of .NET (4.0 or later), use Parallel.ForEach; this lets you search multiple files at once.
You may not be able to open a file. You need to check for read permissions, or add to the manifest that your program requires administrative privileges (you should still check, though).
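
Put together for a text search, the four recommendations above might look like the following sketch. The names ParallelSearch and FindFiles are hypothetical, and skipping unreadable files silently is an assumption; a real tool might log them instead.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

static class ParallelSearch
{
    // Hypothetical helper: paths of all files under 'folder' containing searchWord.
    public static List<string> FindFiles(string folder, string searchWord)
    {
        var matches = new ConcurrentBag<string>(); // safe to fill from several threads

        Parallel.ForEach(
            Directory.EnumerateFiles(folder, "*", SearchOption.AllDirectories),
            path =>
            {
                try
                {
                    // Lazy, line-by-line read; stops at the first matching line.
                    if (File.ReadLines(path).Any(line => line.Contains(searchWord)))
                        matches.Add(path);
                }
                catch (UnauthorizedAccessException) { /* no read permission: skip */ }
                catch (IOException) { /* locked or vanished file: skip */ }
            });

        // Parallel completion order is arbitrary, so sort for stable output.
        return matches.OrderBy(p => p, StringComparer.Ordinal).ToList();
    }
}
```

One limitation: the try/catch protects only the file reads. If the enumeration itself reaches a folder it cannot open, the exception surfaces from Parallel.ForEach, so a hand-written directory walk may still be needed. Whether parallel reads help also depends on the storage device.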

I was creating a binary search tool (one that searches files for a byte pattern); here are some snippets of what I wrote to give you a hand.

// Snippets from a larger class; they need System.IO, System.Linq,
// System.Threading.Tasks and System.ComponentModel.
private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
{
    Parallel.ForEach(Directory.EnumerateFiles(_folder, _filter, SearchOption.AllDirectories), Search);
}

//_array contains the binary pattern I am searching for.
private void Search(string filePath)
{
    if (Contains(filePath, _array))
    {
        //filePath points at a match.
    }
}

private static bool Contains(string path, byte[] search)
{
    //I am doing ReadAllBytes due to the fact that I am doing a binary search not a text search
    //  There are no "Lines" to separate out on.
    var file = File.ReadAllBytes(path);
    //The upper bound is exclusive, so add 1 to include a match at the very end of the file.
    var result = Parallel.For(0, file.Length - search.Length + 1, (i, loopState) =>
        {
            if (file[i] == search[0])
            {
                byte[] localCache = new byte[search.Length];
                Array.Copy(file, i, localCache, 0, search.Length);
                if (Enumerable.SequenceEqual(localCache, search))
                    loopState.Stop();
            }
        });
    //IsCompleted is false only when Stop() was called, i.e. when a match was found.
    return result.IsCompleted == false;
}

This uses two nested parallel loops. This design is terribly inefficient and could be greatly improved by using the Boyer-Moore search algorithm, but I could not find a binary implementation, and I did not have the time to implement it myself when I originally wrote it.
