Note: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/13094664/

Time: 2020-09-11 18:16:05  Source: igfitidea

Excel VBA: Macro for data processing getting slower and slower the longer it runs

Tags: excel, vba, optimization, large-data

Asked by Steve06

I'm analysing large amounts of historical financial data using the QuantlibXl library in Excel 2010, 32-bit. My typical worksheet contains long columns of empirical data of up to 1 million rows. My macros usually need to run through each row from top to bottom and do some Quantlib-typical financial analysis such as revaluing a security, which requires that Quantlib objects be created in every row. The analytical stuff is contained within the cells as formulas.

So in the beginning I tried to just select the cells with formulas in the top row and fill them down by dragging the lower right corner to the bottom of the sheet. Already here, the processing time grew exponentially with the number of rows involved.

So I figured I had to write a macro that processes smaller chunks of rows at one time. The macro would basically take care of filling down the top row only 100 rows at a time. That and a number of optimizations (explained below) certainly improved speed tremendously, but processing time still grew exponentially.

The problem is that as much as I try to optimize my macros, they keep getting slower and slower the longer they run. I keep track of processed rows in the status bar. For example, if 2000 rows are processed per minute when kick-starting the macro (the calculations are pretty involved), its speed decreases constantly throughout its runtime, down to only 100 rows per minute after 60,000 rows. At that pace, it will never see the end of the sheet. So in fact, at some point it becomes optimal to just abort it and start it off again from where it stopped. I also split the files and let them run on different computers simultaneously, which is a pain in the ass to manage.

I already implemented tons of optimizations:

- Screen updating and automatic calculation are turned off.
- I only perform calculation on the row being processed at a time.
- Garbage collecting: Quantlib objects are deleted immediately after they are no longer used. I thought it was them eating all free memory that caused the slow-downs.
- I went so far as to write the relevant results (cells) to a text file and delete the rows that were no longer needed. Again, the macro was very fast in the beginning and would have run until the end within a couple of hours if it weren't getting slower again after about 70,000 rows. In fact, I had hoped to see a speed increase during runtime as rows get deleted and the sheet shrinks, but it just doesn't happen.

So I just keep halting the process every 60,000 rows and kick-starting it again, but it's tiresome.
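For readers who want to see the shape of the chunked approach described above, here is a minimal sketch. The sheet name `Data`, the formula columns `D:H`, and the template row are hypothetical, and freezing each finished chunk to plain values is an added assumption rather than something the question does (it writes results to a text file instead). It only illustrates the loop structure and does not address the suspected leak.

```vba
Option Explicit

' Sketch only: sheet name, formula columns, and template row are assumptions.
Sub FillDownInChunks()
    Const CHUNK_SIZE As Long = 100          ' rows per batch, as in the question
    Const FORMULA_COLS As String = "D:H"    ' hypothetical columns holding the formulas
    Const TEMPLATE_ROW As Long = 2          ' hypothetical row with the formulas to copy

    Dim ws As Worksheet
    Dim template As Range, target As Range
    Dim lastRow As Long, startRow As Long, endRow As Long
    Dim oldCalc As XlCalculation

    Set ws = ThisWorkbook.Worksheets("Data")                 ' hypothetical sheet name
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row     ' last row with data in column A
    Set template = Intersect(ws.Rows(TEMPLATE_ROW), ws.Range(FORMULA_COLS))

    ' Remember the current calculation mode so it can be restored afterwards.
    oldCalc = Application.Calculation
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual
    On Error GoTo CleanUp

    startRow = TEMPLATE_ROW + 1
    Do While startRow <= lastRow
        endRow = Application.WorksheetFunction.Min(startRow + CHUNK_SIZE - 1, lastRow)
        Set target = Intersect(ws.Rows(startRow & ":" & endRow), ws.Range(FORMULA_COLS))

        template.Copy
        target.PasteSpecial xlPasteFormulas
        target.Calculate                 ' recalculate only the cells in this chunk
        target.Value = target.Value      ' assumption: replace formulas with their results

        Application.StatusBar = "Processed row " & endRow & " of " & lastRow
        startRow = endRow + 1
    Loop

CleanUp:
    ' Restore application state whether or not an error occurred.
    Application.CutCopyMode = False
    Application.StatusBar = False
    Application.Calculation = oldCalc
    Application.ScreenUpdating = True
    If Err.Number <> 0 Then MsgBox "Stopped: " & Err.Description
End Sub
```

Whether `Range.Calculate` is enough for formulas that create Quantlib objects depends on how those formulas reference each other, so treat the recalculation step as something to verify on a small sample first.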

I'd like to figure out what causes this behaviour of Excel not processing large amounts of data linearly and requiring restarts, and how to avoid it. If somebody ran into similar trouble and found a way around it, I'd be glad to hear about it.

EDIT: Every time I halt the process to speed it up again by starting over, I notice that I have to restart Excel; otherwise it resumes just as slowly as before. My current hypothesis is that at some point data isn't cleaned up correctly. If this is the case, your solution would bring me any further. The Quantlib library has a method called ohRepositoryObjectCount() that reports how many objects still reside in memory. I call the ohRepositoryDeleteAllObjects() function after every calculation, and according to ohRepositoryObjectCount() the objects are in fact being deleted, but maybe there is still some leakage that remains undetected.

EDIT2: I'm now convinced there is memory leakage: after a long batch, the task manager shows 3 or 4 Excel processes that together consume about 1.5 GB of memory. When quitting Excel, it crashes (with a message along the lines of "Excel is not working anymore"), and the processes persist, so I have to kill them manually.

Answered by Robert Co

I'm assuming that your rows are a listing of all your securities, that they are not related to one another, and that you don't calculate across them. If that is correct, do the following:

  1. On a separate sheet, lay out all your data columns (both input and output) to represent one row.
  2. Copy one row of data from your "source" sheet and paste it here as values.
  3. Remove all your calculations from your source sheet and put them here.
  4. Copy and paste the values back to your source sheet.

Put #2 to #4 into the macro and loop through your data.
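A rough sketch of that loop might look like the following. The sheet names `Source` and `Calc`, the column counts, and the cell positions are all hypothetical; the calc sheet is assumed to already hold the formulas moved there in step 3.

```vba
Option Explicit

' Sketch only: sheet names, column counts, and cell positions are assumptions.
Sub ProcessRowsOnCalcSheet()
    Const N_INPUT As Long = 5       ' hypothetical: input columns A:E on the source sheet
    Const N_OUTPUT As Long = 3      ' hypothetical: output columns F:H on the source sheet

    Dim src As Worksheet, calc As Worksheet
    Dim inCells As Range, outCells As Range
    Dim r As Long, lastRow As Long
    Dim oldCalc As XlCalculation

    Set src = ThisWorkbook.Worksheets("Source")   ' hypothetical sheet name
    Set calc = ThisWorkbook.Worksheets("Calc")    ' hypothetical sheet name

    ' One-row layout on the calc sheet: inputs first, then the formula cells.
    Set inCells = calc.Range("A2").Resize(1, N_INPUT)
    Set outCells = calc.Range("A2").Offset(0, N_INPUT).Resize(1, N_OUTPUT)

    lastRow = src.Cells(src.Rows.Count, "A").End(xlUp).Row

    oldCalc = Application.Calculation
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual

    For r = 2 To lastRow
        ' Step 2: bring one row of input values onto the calc sheet.
        inCells.Value = src.Cells(r, 1).Resize(1, N_INPUT).Value
        ' Recalculate the formulas that were moved here in step 3.
        calc.Calculate
        ' Step 4: write the results back to the source sheet as plain values.
        src.Cells(r, N_INPUT + 1).Resize(1, N_OUTPUT).Value = outCells.Value
    Next r

    Application.Calculation = oldCalc
    Application.ScreenUpdating = True
End Sub
```

Assigning `.Value` directly avoids the clipboard, which is a design choice of this sketch rather than part of the answer; an explicit Copy and PasteSpecial with values would follow the steps more literally.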

That's my answer; the following is just commentary. If I were doing it:

  1. My "source" data would be in a database. I'm sure there are relations among the securities that I would want to explore.
  2. I would transpose the row elements into a column on my calc sheet for easy reading.
  3. I would break out the calculations across multiple columns and sections for easy reading.