vba 如何优化 vlookup 以获得高搜索次数?(VLOOKUP 的替代品)

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/18656808/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-08 16:34:29  来源:igfitidea点击:

How to optimize vlookup for high search count ? (alternatives to VLOOKUP)

performanceexcel-vbadictionaryvlookupvba

提问by d-stroyer

I am looking for alternatives to vlookup, with improved performance within the context of interest.

我正在寻找 vlookup 的替代方案,在感兴趣的上下文中提高性能。

The context is the following:

上下文如下:

  • I have a data set of {key;data} which is big (~ 100'000 records)
  • I want to perform a lot of VLOOKUP operations on the dataset (typical use is to reorder the whole dataset)
  • My data set has no duplicate keys
  • I am looking only for exact matches (last argument to VLOOKUPis FALSE)
  • 我有一个很大的 {key;data} 数据集(约 100'000 条记录)
  • 我想对数据集进行大量的 VLOOKUP 操作(典型用途是对整个数据集进行重新排序)
  • 我的数据集没有重复的键
  • 我只寻找完全匹配(最后一个参数VLOOKUPFALSE

A schema to explain :

解释的模式:

Reference sheet : ("sheet1")

参考资料:( "sheet1")

        A           B
     1
     2  key1        data1
     3  key2        data2
     4  key3        data3
   ...  ...         ...
 99999  key99998    data99998
100000  key99999    data99999
100001  key100000   data100000
100002

Lookup sheet:

查找表:

        A           B
     1
     2  key51359    =VLOOKUP(A2;sheet1!$A:$B0001;2;FALSE)
     3  key41232    =VLOOKUP(A3;sheet1!$A:$B0001;2;FALSE)
     4  key10102    =VLOOKUP(A3;sheet1!$A:$B0001;2;FALSE)
   ...  ...         ...
 99999  key4153     =VLOOKUP(A99999;sheet1!$A:$B0001;2;FALSE)
100000  key12818    =VLOOKUP(A100000;sheet1!$A:$B0001;2;FALSE)
100001  key35032    =VLOOKUP(A100001;sheet1!$A:$B0001;2;FALSE)
100002

On my Core i7 M 620 @2.67 GHz, this computes in ~10 minutes

在我的 Core i7 M 620 @2.67 GHz 上,计算时间约为 10 分钟

Are there alternatives to VLOOKUP with better performance in this context ?

在这种情况下,是否有性能更好的 VLOOKUP 替代方案?

采纳答案by d-stroyer

I considered the following alternatives:

我考虑了以下替代方案:

  • VLOOKUP array-formula
  • MATCH / INDEX
  • VBA (using a dictionary)
  • VLOOKUP 数组公式
  • 匹配/索引
  • VBA(使用字典)

The compared performance is:

比较性能为:

  • VLOOKUP simple formula : ~10 minutes
  • VLOOKUP array-formula : ~10 minutes (1:1 performance index)
  • MATCH / INDEX : ~2 minutes (5:1 performance index)
  • VBA (using a dictionary) : ~6 seconds (100:1 performance index)
  • VLOOKUP 简单公式:~10 分钟
  • VLOOKUP 数组公式:~10 分钟(1:1 性能指数)
  • 比赛/指数:~2 分钟(5:1 表现指数)
  • VBA(使用字典):~6 秒(100:1 性能指数)

Using the same reference sheet

使用相同的参考表

1) Lookup sheet: (vlookup array formula version)

1)查找表:(vlookup数组公式版)

         A          B
     1
     2   key51359    {=VLOOKUP(A2:A10001;sheet1!$A:$B0001;2;FALSE)}
     3   key41232    formula in B2
     4   key10102    ... extends to
   ...   ...         ... 
 99999   key4153     ... cell B100001
100000   key12818    ... (select whole range, and press
100001   key35032    ... CTRL+SHIFT+ENTER to make it an array formula)
100002

2) Lookup sheet: (match+index version)

2)查找表:(匹配+索引版本)

         A           B                                       C
      1
      2  key51359    =MATCH(A2;sheet1!$A:$A0001;)       =INDEX(sheet1!$B:$B0001;B2)
      3  key41232    =MATCH(A3;sheet1!$A:$A0001;)       =INDEX(sheet1!$B:$B0001;B3)
      4  key10102    =MATCH(A4;sheet1!$A:$A0001;)       =INDEX(sheet1!$B:$B0001;B4)
    ...  ...         ...                                     ...
  99999  key4153     =MATCH(A99999;sheet1!$A:$A0001;)   =INDEX(sheet1!$B:$B0001;B99999)
 100000  key12818    =MATCH(A100000;sheet1!$A:$A0001;)  =INDEX(sheet1!$B:$B0001;B100000)
 100001  key35032    =MATCH(A100001;sheet1!$A:$A0001;)  =INDEX(sheet1!$B:$B0001;B100001)
 100002

3) Lookup sheet: (vbalookup version)

3)查找表:(vbalookup版本)

       A          B
     1
     2  key51359    {=vbalookup(A2:A50001;sheet1!$A:$B0001;2)}
     3  key41232    formula in B2
     4  key10102    ... extends to
   ...  ...         ...
 50000  key91021    ... 
 50001  key42       ... cell B50001
 50002  key21873    {=vbalookup(A50002:A100001;sheet1!$A:$B0001;2)}
 50003  key31415    formula in B50001 extends to
   ...  ...         ...
 99999  key4153     ... cell B100001
100000  key12818    ... (select whole range, and press
100001  key35032    ... CTRL+SHIFT+ENTER to make it an array formula)
100002

NB: For some (external internal) reason, the vbalookup fails to return more than 65536 data at a time. So I had to split the array formula in two.

注意:由于某些(外部内部)原因,vbalookup 无法一次返回超过 65536 个数据。所以我不得不将数组公​​式一分为二。

and the associated VBA code :

和相关的 VBA 代码:

Function vbalookup(lookupRange As Range, refRange As Range, dataCol As Long) As Variant
  Dim dict As New Scripting.Dictionary
  Dim myRow As Range
  Dim I As Long, J As Long
  Dim vResults() As Variant

  ' 1. Build a dictionnary
  For Each myRow In refRange.Columns(1).Cells
    ' Append A : B to dictionnary
    dict.Add myRow.Value, myRow.Offset(0, dataCol - 1).Value
  Next myRow

  ' 2. Use it over all lookup data
  ReDim vResults(1 To lookupRange.Rows.Count, 1 To lookupRange.Columns.Count) As Variant
  For I = 1 To lookupRange.Rows.Count
    For J = 1 To lookupRange.Columns.Count
      If dict.Exists(lookupRange.Cells(I, J).Value) Then
        vResults(I, J) = dict(lookupRange.Cells(I, J).Value)
      End If
    Next J
  Next I

  vbalookup = vResults
End Function

NB: Scripting.Dictionaryrequires a referenc to Microsoft Scripting Runtimewhich must be added manually (Tools->References menu in the Excel VBA window)

注意:Scripting.Dictionary需要一个Microsoft Scripting Runtime必须手动添加的引用(Excel VBA 窗口中的工具->引用菜单)

Conclusion :

结论 :

In this context, VBA using a dictionary is 100x faster than using VLOOKUP and 20x faster than MATCH/INDEX

在这种情况下,使用字典的 VBA 比使用 VLOOKUP 快 100 倍,比 MATCH/INDEX 快 20 倍

回答by kevin9999

You also may want to consider using the “double Vlookup” method (not my idea - seen elsewhere). I tested it on 100,000 lookup values on sheet 2 (randomly sorted) with an identical data set as the one you've described on sheet 1, and timed it at just under 4 seconds. The code is also a bit simpler.

您可能还想考虑使用“双 Vlookup”方法(不是我的想法 - 在别处看到)。我在工作表 2(随机排序)上使用与您在工作表 1 中描述的数据集相同的数据集对它进行了 100,000 个查找值的测试,并将其计时在不到 4 秒的时间内。代码也简单一些。

Sub FastestVlookup()

    With Sheet2.Range("B1:B100000")
        .FormulaR1C1 = _
        "=IF(VLOOKUP(RC1,Sheet1!R1C1:R100000C1,1)=RC1,VLOOKUP(RC1,Sheet1!R1C1:R100000C2,2),""N/A"")"
        .Value = .Value
    End With

End Sub

回答by Alan

Switch to Excel 2013 and use Data Model. There you can define a column with unique ID keys in both tables and bind those two tables with relationship in Pivot Table. Than if absolutely necessary you can use Getpivotdata() to fill the first table. I had a ~250K rows table doing vlookup in the similar ~250K rows table. Stopped Excel calculating it after an hour. With Data Model it took less than 10sec.

切换到 Excel 2013 并使用数据模型。在那里,您可以在两个表中定义一个具有唯一 ID 键的列,并在数据透视表中将这两个表与关系绑定在一起。如果绝对必要,您可以使用 Getpivotdata() 填充第一个表。我有一个 ~250K 行表在类似的 ~250K 行表中执行 vlookup。一个小时后停止 Excel 计算。使用数据模型只需不到 10 秒。

回答by HuBe

Value fix: check for a blank cell when building the dictionary. If the cell is blank, exit for.

值修复:在构建字典时检查空白单元格。如果单元格为空白,则退出。