VBA Excel large data manipulation taking forever

Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/6134002/

Date: 2020-09-11 13:19:38  Source: igfitidea


Tags: arrays, vba, memory, excel-vba, excel

Asked by AjayR

I have two excel files.


The first Excel file contains a Person Name column and a Total Days Present column, e.g.:


PersonName       TotalDays
xyz               
abcd             

The second Excel file contains Person Name, Date, and Status (Present/Absent):


PersonName      Date      Status
xyz           1/1/2011    Present
xyz           1/1/2011    Present

I need to group rows with the same person and status, count them, and write the counts into the first Excel file.


I have around 100 rows in the first file and 20,000 rows in the second file that I need to check against. To make it faster, I loaded all the rows from the second file into an array and read from that array when computing each entry, which works correctly.


The issue is that it takes a lot of memory, so many applications in Windows start opening up automatically and the system almost hangs.


Is there any alternative that avoids the memory issue while still processing quickly? I came across Scripting.Dictionary but am not sure whether it will take less memory.

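For reference, Scripting.Dictionary is a reasonable fit for this kind of grouping, and for 20,000 rows its memory footprint is modest. A minimal sketch of counting "Present" rows per person — the sheet name "Data", the column layout, and the status text are assumptions, and it uses late binding so no reference to Microsoft Scripting Runtime is needed:

```vba
Sub CountPresentDays()
    ' Late-bound Scripting.Dictionary: keys are person names, items are counts.
    Dim dict As Object
    Set dict = CreateObject("Scripting.Dictionary")

    ' One bulk read into a 1-based Variant array: A = PersonName, C = Status.
    Dim varData As Variant
    varData = Worksheets("Data").Range("A2:C20001").Value

    Dim J As Long, key As String
    For J = 1 To UBound(varData, 1)
        If varData(J, 3) = "Present" Then
            key = CStr(varData(J, 1))
            If dict.Exists(key) Then
                dict(key) = dict(key) + 1
            Else
                dict.Add key, 1
            End If
        End If
    Next J

    ' dict("xyz") now holds the number of "Present" rows for person xyz.
End Sub
```

The dictionary only stores one entry per distinct person (about 100 here), so it is far smaller than the 20,000-row source data either way.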

EDIT: I tried using ReDim Preserve and a static array of size 20,000; in both cases the same problem occurs.


EDIT


lblStatus.Caption = "Loading to memory"
Dim ArrAuditData() As AData
Dim TotalLookUpCount As Integer

For J = 1 To 50000
    If lookUpRange.Cells(J, cmbChoice.ListIndex) = "Fail" Then
        ReDim Preserve ArrAuditData(J) As AData
        ArrAuditData(TotalLookUpCount).AuditType = lookUpRange.Cells(J, cmdAudit2.ListIndex)
        ArrAuditData(TotalLookUpCount).TransTime = lookUpRange.Cells(J, cmbChoice.ListIndex - 1)
        ArrAuditData(TotalLookUpCount).AuditValue = lookUpRange.Cells(J, cmbChoice.ListIndex)
        ArrAuditData(TotalLookUpCount).Slno = lookUpRange.Cells(J, 0)
        TotalLookUpCount = TotalLookUpCount + 1
    ElseIf lookUpRange.Cells(J, cmbChoice.ListIndex) = "" And J > 4 Then
        Exit For
    End If
    DoEvents
Next

Answered by Jean-François Corbett

An array of 20,000 elements containing 4 Variants each will take up less than 2 MB of RAM. I don't think memory has anything to do with your problem -- unless you happen to be using an old computer with 2 MB of RAM or something like that.


A more likely reason why your code is so heavy is that you are looping through cells. There is significant overhead to each communication between VBA and Excel sheet data, and this adds up when you refer to many cells one at a time. In your case, your loop does up to 200,000 separate cell references.


Instead, you should load all your data at once into a Variant array, and then loop through that array, as shown below. This is significantly faster (even though it uses more memory, not less; but again, I don't think memory is your issue).


lblStatus.Caption = "Loading to memory"
Dim ArrAuditData() As AData
Dim varTemp As Variant
Dim TotalLookUpCount As Integer

' Load everything into a Variant array. 
varTemp = lookUpRange

ReDim ArrAuditData(1 To UBound(varTemp, 1)) As AData

For J = 1 To UBound(varTemp, 1)

    If varTemp(J, cmbChoice.ListIndex) = "Fail" Then

        ArrAuditData(TotalLookUpCount).AuditType = varTemp(J, cmdAudit2.ListIndex)
        ArrAuditData(TotalLookUpCount).TransTime = varTemp(J, cmbChoice.ListIndex - 1)
        ArrAuditData(TotalLookUpCount).AuditValue = varTemp(J, cmbChoice.ListIndex)
        ArrAuditData(TotalLookUpCount).Slno = varTemp(J, 0)
        TotalLookUpCount = TotalLookUpCount + 1

    ElseIf varTemp(J, cmbChoice.ListIndex) = "" And J > 4 Then
        Exit For

    End If

    DoEvents
Next

ReDim Preserve ArrAuditData(TotalLookUpCount) As AData

For further reading, have a look at this old but still relevant article: http://www.avdf.com/apr98/art_ot003.html


If you still think RAM is the issue, then please show us the AData type declaration.


EDIT: Also, never ReDim Preserve inside a loop like that! ReDim Preserve is a very expensive operation and rarely needs to be done more than once on any given array. Doing it 20,000 times will slow down your code. Here I take it out of the loop, and just use it once at the end to trim off the unused elements. (Notice how I initially ReDim'ed the array to fit the largest conceivable number of elements.)


Answered by chris neilsen

I would suggest a different approach.


If I interpret the question correctly:


  • you want to get a count of days each person is "Present" or "Absent"
  • the first file (call it file1) contains one row per person (about 100 people)
  • the second file (call it file2) contains one row per person per day (100 people and 200 days = 20000 rows)
  • the desired output is two extra columns in file1: a count of "Present" and a count of "Absent"

The approach I would use is the COUNTIF function (or, if you have Excel 2007 or later, COUNTIFS).


Assume


  • file1 contains a Table on Sheet1 called StatusReport, columns A = Name, B = Present, C = Absent
  • one row for each unique name
  • file2 contains a Table on Sheet1 called StatusData, columns A = Name, B = Date, C = Status
  • one row for each name for each date

Solution for Excel 2007 or 2010


  • file1 cell B2
    =COUNTIFS(file2.xlsx!StatusData[Name],[Name],file2.xlsx!StatusData[Status],StatusReport[[#Headers],[Present]])
  • file1 cell C2
    =COUNTIFS(file2.xlsx!StatusData[Name],[Name],file2.xlsx!StatusData[Status],StatusReport[[#Headers],[Absent]])

Solution for Excel 2003


  • Add an extra column D to file2 StatusData table (call it Code)
    =Sheet1!$A2&"_"&Sheet1!$C2

  • file1 cell B2
    =COUNTIF([file2.xls]Sheet1!$D:$D,Sheet2!$A2&"_"&Sheet2!$B$1)

  • file1 cell C2
    =COUNTIF([file2.xls]Sheet1!$D:$D,Sheet2!$A2&"_"&Sheet2!$C$1)

Note: while these formulas give the same result, the COUNTIFS + Table references version in 2007/2010 is so much faster it's not funny (my test on about 300,000 rows updates in a few seconds).
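Since the asker is already working in VBA, the same COUNTIFS logic can also be driven from code via Application.WorksheetFunction instead of leaving formulas in the workbook. A sketch under assumed workbook, sheet, and range names (both files open; the layout matches the tables described above):

```vba
Sub FillCounts()
    ' file1: report sheet with names in column A, counts go in B ("Present") and C ("Absent").
    Dim wsReport As Worksheet
    Set wsReport = Workbooks("file1.xlsx").Worksheets("Sheet1")

    ' file2: one row per person per day; Name in column A, Status in column C.
    Dim rngNames As Range, rngStatus As Range
    With Workbooks("file2.xlsx").Worksheets("Sheet1")
        Set rngNames = .Range("A2:A20001")
        Set rngStatus = .Range("C2:C20001")
    End With

    Dim r As Long
    For r = 2 To wsReport.Cells(wsReport.Rows.Count, "A").End(xlUp).Row
        wsReport.Cells(r, "B").Value = Application.WorksheetFunction.CountIfs( _
            rngNames, wsReport.Cells(r, "A").Value, rngStatus, "Present")
        wsReport.Cells(r, "C").Value = Application.WorksheetFunction.CountIfs( _
            rngNames, wsReport.Cells(r, "A").Value, rngStatus, "Absent")
    Next r
End Sub
```

With only about 100 report rows, this loop makes a few hundred worksheet-function calls in total, which is cheap compared with looping over the 20,000 data rows cell by cell.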
