vb.net: A faster way to read lines in text files quickly
Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/16105586/
Asked by Kevin Flynn
My application reads huge text files (upwards of half a million lines) from a proxy server log. The problem is that a normal StreamReader iteration over the logs can take an excessive amount of time, so I'm looking for something faster.
On the form, the user picks the file to parse and enters up to three site filters to check for. The application then opens the file and parses the date stamp and website URL from each line. The average speed is about two lines per second, so a file with 200,000 lines takes about 28 hours to process.
I've been reading up on the Task class, and I think it is probably the route to take, but Microsoft doesn't give a very good example, so how can I accomplish it?
Answered by cat_minhv0
I think you could use File.ReadLines() when reading large files. According to MSDN:
The ReadLines and ReadAllLines methods differ as follows: When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned; when you use ReadAllLines, you must wait for the whole array of strings be returned before you can access the array. Therefore, when you are working with very large files, ReadLines can be more efficient.
For more detail, see the MSDN documentation for File.ReadLines().
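As a minimal sketch of the approach this answer suggests, the function below streams a file with File.ReadLines and counts the lines that contain a filter string. The names ReadLinesSketch, CountMatches, filePath, and filter are hypothetical, and the case-insensitive substring test is only a stand-in for the asker's real date stamp and URL parsing, which the question does not show.

```vbnet
Imports System
Imports System.IO

Module ReadLinesSketch
    ' Counts the lines in the file that contain the filter text (case-insensitive).
    ' CountMatches, filePath and filter are illustrative names, not from the question.
    Function CountMatches(filePath As String, filter As String) As Integer
        Dim matches As Integer = 0
        ' File.ReadLines returns a lazy sequence, so each line is handled as it is
        ' read instead of waiting for the whole file to be loaded into an array.
        For Each line As String In File.ReadLines(filePath)
            If line.IndexOf(filter, StringComparison.OrdinalIgnoreCase) >= 0 Then
                matches += 1
            End If
        Next
        Return matches
    End Function
End Module
```

Because the lines are enumerated lazily, memory use stays roughly constant no matter how long the log is. Whether this is noticeably faster than a StreamReader loop depends on the file and should be measured.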
Answered by dbasnett
Instead of guessing why it is slow (is it reading the file, processing the lines, etc.?), start by measuring how long it takes to read the file line by line.
    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        Dim stpw As New Stopwatch
        Dim path As String = "path to your file here"
        Dim linect As Integer = 0

        ' Using closes the reader even if an exception is thrown.
        Using sr As New IO.StreamReader(path)
            ' Time only the raw read: no parsing and no UI updates.
            stpw.Restart()
            Do While Not sr.EndOfStream
                Dim s As String = sr.ReadLine()
                linect += 1
            Loop
            stpw.Stop()
        End Using

        ' Elapsed time first, then the number of lines read.
        Debug.WriteLine(stpw.Elapsed.ToString())
        Debug.WriteLine(linect)
    End Sub
I ran this against a 20MB test file that is close to 3,000,000 lines long (the lines are very short). It took about 0.3 seconds to run.
After you run this you will know whether the problem is the read or the processing, or both.
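To separate the two costs in a single pass, one option is to keep two stopwatches, one around the read and one around the per-line work. The sketch below assumes a hypothetical ProcessLine function standing in for the real parsing; MeasurePhases, filePath, readTime, and processTime are likewise illustrative names.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.IO

Module ReadVersusProcess
    ' Hypothetical stand-in for the real per-line work (date stamp and URL parsing).
    Function ProcessLine(line As String) As Boolean
        Return line.IndexOf("facebook", StringComparison.OrdinalIgnoreCase) >= 0
    End Function

    ' Reads the file once and accumulates read time and processing time separately.
    ' Returns the number of lines for which ProcessLine returned True.
    Function MeasurePhases(filePath As String, ByRef readTime As TimeSpan, ByRef processTime As TimeSpan) As Integer
        Dim readWatch As New Stopwatch()
        Dim processWatch As New Stopwatch()
        Dim matches As Integer = 0
        Using sr As New StreamReader(filePath)
            Do
                ' Stopwatch.Start resumes timing, so Elapsed accumulates across iterations.
                readWatch.Start()
                Dim line As String = sr.ReadLine()
                readWatch.Stop()
                If line Is Nothing Then Exit Do

                processWatch.Start()
                If ProcessLine(line) Then matches += 1
                processWatch.Stop()
            Loop
        End Using
        readTime = readWatch.Elapsed
        processTime = processWatch.Elapsed
        Return matches
    End Function
End Module
```

Starting and stopping a stopwatch on every line adds a small overhead of its own, so the two totals are best read as a rough comparison rather than exact timings.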
Answered by Kevin Flynn
Thanks, dbasnett... the results were: 00:00:00.6991336 172900
Believe it or not, I found the problem. I had the textbox inside a GroupBox and was using the GroupBox.Text property to report statistics back to the user, calling GroupBox.Refresh() to update the "line x of y" counter, matches found, and so on, so the user had some idea of what was being found.
By leaving that information out and putting in a progress bar, the speed of the scans went up dramatically. Using 3 filters, I was able to parse 172900 lines in 3 minutes 19 seconds:
Scan complete!
Process complete!
Scanned 172900 lines out of 172900 lines.
Percentage (icc): 0.0052% (900 matches)
Percentage (facebook): 0.0057% (988 matches)
Percentage (illinois): 0.0005% (95 matches)
Total Matches: 1983
Elapsed Time: 00:03:19.1088851
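The slowdown described in this answer came from refreshing a control on every line. A common way to keep some feedback without that cost is to report progress only every N lines. The sketch below is not the asker's code: ScanLines, reportEvery, and the report callback are hypothetical names, and the substring test again stands in for the real filter logic.

```vbnet
Imports System
Imports System.Collections.Generic

Module ThrottledProgress
    ' Counts lines containing the filter text and calls report only once every
    ' reportEvery lines, instead of touching the UI on each line.
    Function ScanLines(lines As IEnumerable(Of String), filter As String,
                       reportEvery As Integer, report As Action(Of Integer)) As Integer
        Dim matches As Integer = 0
        Dim lineNumber As Integer = 0
        For Each line As String In lines
            lineNumber += 1
            If line.IndexOf(filter, StringComparison.OrdinalIgnoreCase) >= 0 Then
                matches += 1
            End If
            ' Throttle: the callback fires on lines N, 2N, 3N, and so on.
            If lineNumber Mod reportEvery = 0 Then
                report(lineNumber)
            End If
        Next
        Return matches
    End Function
End Module
```

In a Windows Forms application the callback could, for example, set ProgressBar.Value, and the lines argument could come from File.ReadLines. With reportEvery set to 1000, a 172900-line file would trigger 172 updates rather than 172900.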

