Original URL: http://stackoverflow.com/questions/1489533/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow.
Process Large Textfiles Quickly w/ VBA
Asked by Fink
I'm having a hard time speeding up the processing of a very large text file (~100 MB or so). I've been diligent about using the ReDim Preserve calls, and yet the function still takes 5 minutes or so to run. The text file is basically sub-reports which I'm trying to parse out. I only have access to the large file. What is a person to do? Is VBA just that slow? Here is the code; the "report" object is a class I created. Most of the reports are just a couple hundred lines, so that's why I chose 1000 for the UBound:
Public Function GetPages(originalFilePath As String) As Collection
    Dim myReport As report
    Dim reportPageCollection As Collection
    Dim startLine As Long
    Dim endLine As Long
    Dim fso As FileSystemObject
    Dim file As TextStream
    Dim lineStr As String
    Dim index As Long
    Dim lines() As String

    Set fso = New FileSystemObject
    Set reportPageCollection = New Collection 'initialize the collection
    Set file = fso.OpenTextFile(originalFilePath, ForReading)
    ReDim lines(0 To 1000)

    lineStr = file.ReadLine 'skip the first line so the loop doesn't add a blank report
    lines(0) = lineStr
    index = 1

    Do Until file.AtEndOfLine 'loop through from the start line to find the end line
        lineStr = file.ReadLine
        If lineStr Like "1JOBNAME:*" Then 'next report, so we want to return an array of the single line
            'load this page into our report page collection for further processing
            Set myReport = New report
            myReport.setDataLines = lines() 'fill in 'ReportPage' array
            reportPageCollection.Add myReport 'add our report to the collection
            'set up array for new report
            ReDim lines(0 To 1000)
            index = 0
            lines(index) = lineStr
            index = index + 1
        Else
            '============================ store into array
            If index = UBound(lines) Then
                ReDim Preserve lines(0 To UBound(lines) + 1000)
            End If
            lines(index) = lineStr
            index = index + 1
            '============================
        End If
    Loop

    file.Close
    Set fso = Nothing
    Set GetPages = reportPageCollection
End Function
Any help is appreciated. Thanks!
Answered by Mike Woodhouse
I just grabbed a 73-meg, 1.2m line text file from my C:\ drive. It took 6 seconds to read through the whole thing, line by line in Excel VBA (doing nothing but reading). So the speed problem isn't obviously file-IO related.
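That read-only pass can be reproduced with a minimal timing loop like the following (a sketch; Timer, FileSystemObject and ForReading are standard VBA/Scripting Runtime members, the path is an assumption):

```vba
' Minimal sketch of the read-only timing test described above.
Public Sub TimeReadThrough(path As String)
    Dim fso As New FileSystemObject
    Dim t As Single
    Dim lineStr As String
    t = Timer
    With fso.OpenTextFile(path, ForReading)
        Do Until .AtEndOfStream
            lineStr = .ReadLine ' do nothing but read
        Loop
        .Close
    End With
    Debug.Print "Elapsed: " & Timer - t & "s"
End Sub
```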
A few observations:
- I'm nervous about having a variable named "file" when File is a class within the Scripting Runtime;
- Do Until file.AtEndOfLine stops almost immediately: you're at the end of a line as soon as you've read one. I think you want Do Until file.AtEndOfStream;
- the rest of your code looks OK, although I'd move all the stuff about adding lines to arrays into a method on your report class;
- is the file physically local? Or are you reading from a network drive? That might account for the problem. If so, consider reading the whole thing into a string and splitting it. 100MB isn't really that big: it took 9 seconds to do that with my 73MB file;
- you don't need to create a collection variable: GetPages already wants to be that collection.
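The "read the whole thing into a string and split it" idea mentioned above might look like this minimal sketch (untested; the function name is hypothetical, and it assumes Windows vbCrLf line endings):

```vba
' Sketch: read the whole file in one pass, then split on line endings.
Public Function GetAllLines(originalFilePath As String) As String()
    Dim fso As New FileSystemObject
    Dim content As String
    With fso.OpenTextFile(originalFilePath, ForReading)
        content = .ReadAll ' one bulk read instead of thousands of ReadLine calls
        .Close
    End With
    GetAllLines = Split(content, vbCrLf) ' assumes Windows line endings
End Function
```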
So your code might shrink to something like this:
Public Function GetPages(originalFilePath As String) As Collection
    Dim myReport As report
    Dim lineStr As String

    Set GetPages = New Collection 'initialize the collection
    With New FileSystemObject ' no need to store an object
        With .OpenTextFile(originalFilePath, ForReading) ' ditto
            Set myReport = New report
            myReport.AddLine .ReadLine
            Do Until .AtEndOfStream
                lineStr = .ReadLine
                If lineStr Like "1JOBNAME:*" Then
                    GetPages.Add myReport
                    Set myReport = New report
                End If
                myReport.AddLine lineStr ' all the array business happens here - much tidier
            Loop
            GetPages.Add myReport ' don't drop the final report
        End With ' TextStream goes out of scope & closes
    End With ' FileSystemObject goes out of scope, disappears
End Function
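The AddLine method used above isn't shown in the answer; a sketch of what it might look like on the report class (the member names dataLines and lineCount are assumptions, standing in for whatever setDataLines populated in the question):

```vba
' Hypothetical report.AddLine: grows the internal array in 1000-line
' chunks, the same way the question's loop does.
Private dataLines() As String
Private lineCount As Long

Public Sub AddLine(ByVal lineStr As String)
    If lineCount = 0 Then ReDim dataLines(0 To 1000) ' first line: allocate the initial chunk
    If lineCount > UBound(dataLines) Then
        ReDim Preserve dataLines(0 To UBound(dataLines) + 1000) ' grow in chunks, not per line
    End If
    dataLines(lineCount) = lineStr
    lineCount = lineCount + 1
End Sub
```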
Is there anything there that helps?
Answered by Oorang
There are a few tweaks you could make; the FSO object is known to be slower than VB's native I/O. But I don't see anything really heinous in here. Before we go micro-optimizing, let me ask a more basic question... Would these files happen to be on a shared drive or an FTP site? If so, consider copying them down to a temp folder before processing them.
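The copy-to-temp step suggested here could be sketched like this (GetSpecialFolder, GetFileName and CopyFile are standard Scripting Runtime members; the helper name is an assumption):

```vba
' Sketch: copy a (possibly remote) file to the local temp folder
' before parsing it, then work on the local copy.
Public Function CopyToTemp(sourcePath As String) As String
    Dim fso As New FileSystemObject
    Dim tempPath As String
    ' 2 = TemporaryFolder for GetSpecialFolder
    tempPath = fso.GetSpecialFolder(2).Path & "\" & fso.GetFileName(sourcePath)
    fso.CopyFile sourcePath, tempPath, True ' True = overwrite an existing copy
    CopyToTemp = tempPath ' parse this local path instead of the network one
End Function
```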