Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1376756/

Time: 2020-09-08 09:58:33  Source: igfitidea

What is a superfast way to read large files line-by-line in VBA?

vba file-io

Asked by Justin

I believe I have come up with a very efficient way to read very, very large files line-by-line. Please tell me if you know of a better/faster way or see room for improvement. I am trying to get better at coding, so any sort of advice you have would be nice. Hopefully this is something that other people might find useful, too.

In my tests, it appears to be roughly 8 times faster than using Line Input.

'This function reads a file into a string.                        '
'I found this in the book Programming Excel with VBA and .NET.    '
Public Function QuickRead(FName As String) As String
    Dim I As Integer
    Dim res As String
    Dim l As Long

    I = FreeFile
    l = FileLen(FName)
    res = Space(l)
    Open FName For Binary Access Read As #I
    Get #I, , res
    Close I
    QuickRead = res
End Function

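'This procedure works like the Line Input statement: it returns the next
'line of strFileData and advances lngFilePosition past the line break.
Public Sub QRLineInput( _
    ByRef strFileData As String, _
    ByRef lngFilePosition As Long, _
    ByRef strOutputString As String, _
    ByRef blnEOF As Boolean _
    )
    'When no further vbNewLine is found, InStr returns 0, the length passed
    'to Mid$ becomes negative, and the resulting error signals end of data.
    On Error GoTo LastLine
    strOutputString = Mid$(strFileData, lngFilePosition, _
        InStr(lngFilePosition, strFileData, vbNewLine) - lngFilePosition)
    lngFilePosition = InStr(lngFilePosition, strFileData, vbNewLine) + Len(vbNewLine)
    Exit Sub
LastLine:
    blnEOF = True
End Sub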

Sub Test()
    Dim strFilePathName As String: strFilePathName = "C:\Fld\File.txt"
    Dim strFile As String
    Dim lngPos As Long
    Dim blnEOF As Boolean
    Dim strFileLine As String

    strFile = QuickRead(strFilePathName) & vbNewLine
    lngPos = 1

    Do Until blnEOF
        Call QRLineInput(strFile, lngPos, strFileLine, blnEOF)
    Loop
End Sub

Thanks for the advice!

Answered by Rodrigo

You can use Scripting.FileSystemObject to do that. From the reference:

The ReadLine method allows a script to read individual lines in a text file. To use this method, open the text file, and then set up a Do Loop that continues until the AtEndOfStream property is True. (This simply means that you have reached the end of the file.) Within the Do Loop, call the ReadLine method, store the contents of the first line in a variable, and then perform some action. When the script loops around, it will automatically drop down a line and read the second line of the file into the variable. This will continue until each line has been read (or until the script specifically exits the loop).

And a quick example:

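'Declarations added so the snippet also compiles under Option Explicit
Dim objFSO As Object, objFile As Object, strLine As String
Const ForReading As Long = 1
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\FSO\ServerList.txt", ForReading)
Do Until objFile.AtEndOfStream
    strLine = objFile.ReadLine
    MsgBox strLine
Loop
objFile.Close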

Answered by Argut

My two cents…

Not long ago I needed to read large files using VBA and noticed this question. I tested three approaches to reading data from a file, comparing their speed and reliability across a wide range of file sizes and line lengths. The approaches are:

  1. The Line Input VBA statement
  2. Using the File System Object (FSO)
  3. Using the Get VBA statement for the whole file and then parsing the string read, as described in posts here

Each test case consists of three steps:

  1. Test case setup that writes a text file containing a given number of lines of the same given length, filled with a known character pattern.
  2. Integrity test. Read each file line and verify its length and contents.
  3. File read speed test. Read each line of the file repeated 10 times.

As you can see, Step #3 measures the true file read speed (as asked in the question), while Step #2 verifies the file read integrity and therefore simulates real conditions in which string parsing is needed.
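The three steps above could be sketched roughly as follows for the Line Input method. This is not the author's actual test code: the file location, line count, line length, and the procedure name TimeLineInputRead are all assumptions chosen for illustration.

```vba
Option Explicit

' Sketch of the setup / integrity / speed steps for Line Input.
' All sizes and the scratch path below are assumed values.
Sub TimeLineInputRead()
    Const LINE_COUNT As Long = 100000
    Const LINE_LENGTH As Long = 64
    Const PASSES As Long = 10
    Dim strPath As String, strLine As String, strPattern As String
    Dim intFile As Integer, i As Long, p As Long, sngStart As Single

    strPath = Environ$("TEMP") & "\vba_read_test.txt"   ' assumed scratch file
    strPattern = String$(LINE_LENGTH, "x")

    ' Step 1: write LINE_COUNT identical lines (Print # ends each with a line break)
    intFile = FreeFile
    Open strPath For Output As #intFile
    For i = 1 To LINE_COUNT
        Print #intFile, strPattern
    Next i
    Close #intFile

    ' Step 2: integrity check of every line's contents and of the line count
    intFile = FreeFile
    Open strPath For Input As #intFile
    i = 0
    Do Until EOF(intFile)
        Line Input #intFile, strLine
        i = i + 1
        If strLine <> strPattern Then Err.Raise vbObjectError + 1, , "Line " & i & " differs"
    Loop
    Close #intFile
    If i <> LINE_COUNT Then Err.Raise vbObjectError + 2, , "Unexpected line count"

    ' Step 3: timed read, repeated PASSES times
    sngStart = Timer
    For p = 1 To PASSES
        intFile = FreeFile
        Open strPath For Input As #intFile
        Do Until EOF(intFile)
            Line Input #intFile, strLine
        Loop
        Close #intFile
    Next p
    Debug.Print "Line Input: " & Format$(Timer - sngStart, "0.00") & " s for " & PASSES & " passes"
End Sub
```

Timer has limited resolution and wraps at midnight, so a figure obtained this way is only a rough comparison between methods.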

The following chart shows the test results for the File read speed test. The file size is 64M bytes for all tests, and the tests differ in line length that varies from 2 bytes (not including CRLF) to 8M bytes.

No idea why it is not displayed any longer :(

CONCLUSION:

  1. All three methods are reliable for large files with normal and abnormal line lengths (please compare to Graeme Howard's answer).
  2. All three methods produce almost equivalent file reading speeds for normal line lengths.
  3. The "superfast way" (Method #3) works fine for extremely long lines, while the other two don't.
  4. All of this applies to different Office versions and different PCs, for both VBA and VB6.

Answered by Graeme Howard

Line Input works fine for small files. However, when file sizes reach around 90k, Line Input jumps all over the place and reads data in the wrong order from the source file. I tested it with different file sizes:

49k = ok
60k = ok
78k = ok
85k = ok
93k = error
101k = error
127k = error
156k = error

Lesson learned - use Scripting.FileSystemObject

Answered by Nick Dandoulakis

With that code you load the file in memory (as a big string) and then you read that string line by line.

By using Mid$() and InStr() you actually read the "file" twice but since it's in memory, there is no problem.
I don't know if VB's String has a length limit (probably not), but if the text files are hundreds of megabytes in size, you are likely to see a performance drop due to virtual memory usage.
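The in-memory scan with Mid$() and InStr() described above can also be written so that each line break is searched for only once and the end of the data is detected without an error handler. The following is a minimal sketch under the assumption that lines end in CRLF; NextLine and DemoNextLine are hypothetical names, not part of the original posts.

```vba
Option Explicit

' Hypothetical helper: returns the next line of strData starting at lngPos
' and advances lngPos past the line break. Returns False once lngPos has
' moved beyond the end of the string.
Public Function NextLine(ByRef strData As String, ByRef lngPos As Long, _
                         ByRef strLine As String) As Boolean
    Dim lngBreak As Long

    If lngPos < 1 Then lngPos = 1                    ' InStr needs a start of at least 1
    If lngPos > Len(strData) Then Exit Function      ' nothing left to read

    lngBreak = InStr(lngPos, strData, vbCrLf)        ' one search per line
    If lngBreak = 0 Then
        ' last line without a trailing line break
        strLine = Mid$(strData, lngPos)
        lngPos = Len(strData) + 1
    Else
        strLine = Mid$(strData, lngPos, lngBreak - lngPos)
        lngPos = lngBreak + Len(vbCrLf)
    End If
    NextLine = True
End Function

Sub DemoNextLine()
    Dim strData As String, strLine As String, lngPos As Long
    strData = "alpha" & vbCrLf & "beta" & vbCrLf & "gamma"
    lngPos = 1
    Do While NextLine(strData, lngPos, strLine)
        Debug.Print strLine
    Loop
End Sub
```

Returning a Boolean lets the caller drive the loop with Do While, so no separate end-of-file flag is needed.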

Answered by Jeremy

I would think that, in a large-file scenario, using a stream would be far more efficient, because memory consumption would be very small.

But your algorithm could alternate between using a stream and loading the entire thing in memory based on the file size. I wouldn't be surprised if one is only better than the other under certain criteria.
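Switching strategy by file size, as suggested above, might look something like this. It is only a sketch: the 50 MB threshold is an arbitrary assumption that would need measuring, CountLines is a hypothetical name, and counting lines stands in for whatever per-line work is actually needed.

```vba
Option Explicit

' Assumed cut-over point between "load everything" and "stream"
Private Const WHOLE_FILE_LIMIT As Long = 50& * 1024 * 1024

' Hypothetical example: counts the lines of a CRLF-delimited text file
Public Function CountLines(ByVal strPath As String) As Long
    Dim intFile As Integer, strBuf As String, strLine As String

    intFile = FreeFile
    If FileLen(strPath) <= WHOLE_FILE_LIMIT Then
        ' Small enough: one binary read, then split in memory
        Open strPath For Binary Access Read As #intFile
        strBuf = Space$(LOF(intFile))
        Get #intFile, , strBuf
        Close #intFile
        CountLines = UBound(Split(strBuf, vbCrLf)) + 1
        ' A trailing line break produces one empty extra element; ignore it
        If Right$(strBuf, 2) = vbCrLf Then CountLines = CountLines - 1
    Else
        ' Large file: stream it so only one line is held in memory
        Open strPath For Input As #intFile
        Do Until EOF(intFile)
            Line Input #intFile, strLine
            CountLines = CountLines + 1
        Loop
        Close #intFile
    End If
End Function
```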

Answered by prakash b bajaj

You can modify the code above to read the full file in one go and then display each line, as shown below:

Option Explicit

Public Function QuickRead(FName As String) As Variant
    Dim i As Integer
    Dim res As String
    Dim l As Long
    Dim v As Variant

    i = FreeFile
    l = FileLen(FName)
    res = Space(l)
    Open FName For Binary Access Read As #i
    Get #i, , res
    Close i
    'split the file with vbcrlf
    QuickRead = Split(res, vbCrLf)
End Function

Sub Test()
    ' Replace "C:\writename.txt" with any file name you like
    Dim strFilePathName As String: strFilePathName = "C:\writename.txt"
    Dim strFileLine As String
    Dim v As Variant
    Dim i As Long
    v = QuickRead(strFilePathName)
    For i = 0 To UBound(v)
        MsgBox v(i)
    Next
End Sub

Answered by RexBarker

My take on it...obviously, you've got to do something with the data you read in. If it involves writing it to the sheet, that'll be deadly slow with a normal For Loop. I came up with the following based upon a rehash of some of the items there, plus some help from the Chip Pearson website.

Reading in the text file (assuming you don't know the length of the range it will create, so only the startCell is given):

Public Sub ReadInPlainText(startCell As Range, Optional textfilename As Variant)

   If IsMissing(textfilename) Then textfilename = Application.GetOpenFilename("All Files (*.*), *.*", , "Select Text File to Read")
   'GetOpenFilename returns False when the dialog is cancelled
   If VarType(textfilename) = vbBoolean Then Exit Sub
   If textfilename = "" Then Exit Sub

   Dim filelength As Long
   Dim filenumber As Integer
   filenumber = FreeFile
   filelength = FileLen(textfilename)
   Dim text As String
   Dim textlines As Variant

   'read the whole file into a pre-sized string buffer
   Open textfilename For Binary Access Read As #filenumber
   text = Space(filelength)
   Get #filenumber, , text
   Close #filenumber

   'split the file with vbCrLf
   textlines = Split(text, vbCrLf)

   'output to range; Split returns a zero-based array, so it holds UBound + 1 items
   Dim outputRange As Range
   Set outputRange = startCell.Resize(UBound(textlines) + 1, 1)
   outputRange.Value = Application.Transpose(textlines)
End Sub

Conversely, if you need to write out a range to a text file, this does it quickly in one Print statement (note: the file is opened in text mode here, not binary, unlike the read routine above).

Public Sub WriteRangeAsPlainText(ExportRange As Range, Optional textfilename As Variant)
   If IsMissing(textfilename) Then textfilename = Application.GetSaveAsFilename(FileFilter:="Text Files (*.txt), *.txt")
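   'GetSaveAsFilename returns False when the dialog is cancelled
   If VarType(textfilename) = vbBoolean Then Exit Sub
   If textfilename = "" Then Exit Sub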

   Dim filenumber As Integer
   filenumber = FreeFile
   Open textfilename For Output As filenumber

   Dim textlines() As Variant, outputvar As Variant

   textlines = Application.Transpose(ExportRange.Value)
   outputvar = Join(textlines, vbCrLf)
   Print #filenumber, outputvar
   Close filenumber
End Sub

Answered by Dumitru Daniel

Be careful when using Application.Transpose with a huge number of values. If you transpose values to a column, Excel will assume you transposed them from a row.

Max Column Limit < Max Row Limit, so it will only display the first (Max Column Limit) values, and anything after that will be "N/A".
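One way to sidestep Application.Transpose altogether is to copy the lines into an n x 1 array in a plain loop and assign that array to the range. The helper below is a sketch of that idea; ToColumn is a hypothetical name and is not part of the original answers.

```vba
Option Explicit

' Hypothetical helper: copies a one-dimensional array (for example the result
' of Split) into a 1-based n x 1 Variant array that can be assigned to a
' single-column range without Application.Transpose.
Public Function ToColumn(ByRef varItems As Variant) As Variant
    Dim lngLow As Long, lngCount As Long, i As Long
    Dim varOut() As Variant

    lngLow = LBound(varItems)
    lngCount = UBound(varItems) - lngLow + 1
    If lngCount < 1 Then Exit Function      ' empty input: returns Empty

    ReDim varOut(1 To lngCount, 1 To 1)
    For i = 1 To lngCount
        varOut(i, 1) = varItems(lngLow + i - 1)
    Next i
    ToColumn = varOut
End Function
```

With this in place, the output step could be written as `col = ToColumn(textlines)` followed by `startCell.Resize(UBound(col, 1), 1).Value = col`, where `col` is a Variant.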

Answered by Profex

I just wanted to share some of my results...

I have text files, which apparently came from a Linux system, so I only have a vbLF/Chr(10) at the end of each line and not a vbCR/Chr(13).

Note 1:

  • This meant that the Line Input method would read in the entire file, instead of just one line at a time.
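For files like these, one option is to read the whole file in binary mode (as in the QuickRead function earlier on this page) and normalise the line endings before splitting. The following is a rough sketch; SplitLines is a hypothetical helper name, and the extra Replace passes copy the string, which costs memory on very large files.

```vba
Option Explicit

' Hypothetical helper: splits text into lines regardless of whether the
' lines end in CRLF (Windows), LF (Linux) or a lone CR.
Public Function SplitLines(ByVal strText As String) As Variant
    ' Convert CRLF first, then any remaining lone CR, to LF
    strText = Replace(strText, vbCrLf, vbLf)
    strText = Replace(strText, vbCr, vbLf)
    ' Drop one trailing line break so the result has no empty last element
    If Right$(strText, 1) = vbLf Then strText = Left$(strText, Len(strText) - 1)
    SplitLines = Split(strText, vbLf)
End Function
```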

From my research testing small (152KB) and large (2778KB) files, both on and off the network, I found the following:

Open FileName For Input: Line Input was the slowest (see Note 1 above)

Open FileName For Binary Access Read: Input was the fastest for reading the whole file

FSO.OpenTextFile: ReadLine was fast, but a bit slower than Binary Input

FSO.OpenTextFile: ReadLine,但后来有点慢Binary Input

Note 2:

  • If I just needed to check the file header (first 1-2 lines) to see whether I had the proper file/format, then FSO.OpenTextFile was the fastest, followed very closely by Binary Input.

  • The drawback of Binary Input is that you have to know how many characters you want to read.

  • On normal files, Line Input would also be a good option, but I couldn't test it because of Note 1.
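A header check with a binary read might be sketched as follows, assuming the caller can pick an upper bound on the header size. ReadHead is a hypothetical helper, not something from the original answer, and the byte count is capped at the file length so short files do not return padding.

```vba
Option Explicit

' Hypothetical helper: returns up to lngBytes bytes from the start of a file.
' The caller still has to locate the line breaks inside the returned text.
Public Function ReadHead(ByVal strPath As String, ByVal lngBytes As Long) As String
    Dim intFile As Integer
    Dim strBuf As String

    intFile = FreeFile
    Open strPath For Binary Access Read As #intFile
    If lngBytes > LOF(intFile) Then lngBytes = LOF(intFile)   ' cap at file size
    strBuf = Space$(lngBytes)
    If lngBytes > 0 Then Get #intFile, , strBuf
    Close #intFile
    ReadHead = strBuf
End Function
```

For example, `ReadHead(path, 256)` would fetch enough text to inspect the first line or two of a file whose lines are known to be short.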

Note 3:

  • Obviously, the files on the network showed the largest difference in read speed. They also showed the greatest benefit from reading the file a second time (although there are certainly memory buffers that come into play here).
