vba 如何优化 vlookup 以获得高搜索次数?(VLOOKUP 的替代品)
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/18656808/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to optimize vlookup for high search count ? (alternatives to VLOOKUP)
提问by d-stroyer
I am looking for alternatives to vlookup, with improved performance within the context of interest.
我正在寻找 vlookup 的替代方案,在感兴趣的上下文中提高性能。
The context is the following:
上下文如下:
- I have a data set of {key;data} which is big (~ 100'000 records)
- I want to perform a lot of VLOOKUP operations on the dataset (typical use is to reorder the whole dataset)
- My data set has no duplicate keys
- I am looking only for exact matches (last argument to
VLOOKUP
isFALSE
)
- 我有一个很大的 {key;data} 数据集(约 100'000 条记录)
- 我想对数据集进行大量的 VLOOKUP 操作(典型用途是对整个数据集进行重新排序)
- 我的数据集没有重复的键
- 我只寻找完全匹配(最后一个参数
VLOOKUP
是FALSE
)
A schema to explain :
解释的模式:
Reference sheet : ("sheet1"
)
参考资料:( "sheet1"
)
A B
1
2 key1 data1
3 key2 data2
4 key3 data3
... ... ...
99999 key99998 data99998
100000 key99999 data99999
100001 key100000 data100000
100002
Lookup sheet:
查找表:
A B
1
2 key51359 =VLOOKUP(A2;sheet1!$A:$B0001;2;FALSE)
3 key41232 =VLOOKUP(A3;sheet1!$A:$B0001;2;FALSE)
4 key10102 =VLOOKUP(A3;sheet1!$A:$B0001;2;FALSE)
... ... ...
99999 key4153 =VLOOKUP(A99999;sheet1!$A:$B0001;2;FALSE)
100000 key12818 =VLOOKUP(A100000;sheet1!$A:$B0001;2;FALSE)
100001 key35032 =VLOOKUP(A100001;sheet1!$A:$B0001;2;FALSE)
100002
On my Core i7 M 620 @2.67 GHz, this computes in ~10 minutes
在我的 Core i7 M 620 @2.67 GHz 上,计算时间约为 10 分钟
Are there alternatives to VLOOKUP with better performance in this context ?
在这种情况下,是否有性能更好的 VLOOKUP 替代方案?
采纳答案by d-stroyer
I considered the following alternatives:
我考虑了以下替代方案:
- VLOOKUP array-formula
- MATCH / INDEX
- VBA (using a dictionary)
- VLOOKUP 数组公式
- 匹配/索引
- VBA(使用字典)
The compared performance is:
比较性能为:
- VLOOKUP simple formula : ~10 minutes
- VLOOKUP array-formula : ~10 minutes (1:1 performance index)
- MATCH / INDEX : ~2 minutes (5:1 performance index)
- VBA (using a dictionary) : ~6 seconds (100:1 performance index)
- VLOOKUP 简单公式:~10 分钟
- VLOOKUP 数组公式:~10 分钟(1:1 性能指数)
- 比赛/指数:~2 分钟(5:1 表现指数)
- VBA(使用字典):~6 秒(100:1 性能指数)
Using the same reference sheet
使用相同的参考表
1) Lookup sheet: (vlookup array formula version)
1)查找表:(vlookup数组公式版)
A B
1
2 key51359 {=VLOOKUP(A2:A10001;sheet1!$A:$B0001;2;FALSE)}
3 key41232 formula in B2
4 key10102 ... extends to
... ... ...
99999 key4153 ... cell B100001
100000 key12818 ... (select whole range, and press
100001 key35032 ... CTRL+SHIFT+ENTER to make it an array formula)
100002
2) Lookup sheet: (match+index version)
2)查找表:(匹配+索引版本)
A B C
1
2 key51359 =MATCH(A2;sheet1!$A:$A0001;) =INDEX(sheet1!$B:$B0001;B2)
3 key41232 =MATCH(A3;sheet1!$A:$A0001;) =INDEX(sheet1!$B:$B0001;B3)
4 key10102 =MATCH(A4;sheet1!$A:$A0001;) =INDEX(sheet1!$B:$B0001;B4)
... ... ... ...
99999 key4153 =MATCH(A99999;sheet1!$A:$A0001;) =INDEX(sheet1!$B:$B0001;B99999)
100000 key12818 =MATCH(A100000;sheet1!$A:$A0001;) =INDEX(sheet1!$B:$B0001;B100000)
100001 key35032 =MATCH(A100001;sheet1!$A:$A0001;) =INDEX(sheet1!$B:$B0001;B100001)
100002
3) Lookup sheet: (vbalookup version)
3)查找表:(vbalookup版本)
A B
1
2 key51359 {=vbalookup(A2:A50001;sheet1!$A:$B0001;2)}
3 key41232 formula in B2
4 key10102 ... extends to
... ... ...
50000 key91021 ...
50001 key42 ... cell B50001
50002 key21873 {=vbalookup(A50002:A100001;sheet1!$A:$B0001;2)}
50003 key31415 formula in B50001 extends to
... ... ...
99999 key4153 ... cell B100001
100000 key12818 ... (select whole range, and press
100001 key35032 ... CTRL+SHIFT+ENTER to make it an array formula)
100002
NB: For some (external internal) reason, the vbalookup fails to return more than 65536 data at a time. So I had to split the array formula in two.
注意:由于某些(外部内部)原因,vbalookup 无法一次返回超过 65536 个数据。所以我不得不将数组公式一分为二。
and the associated VBA code :
和相关的 VBA 代码:
Function vbalookup(lookupRange As Range, refRange As Range, dataCol As Long) As Variant
Dim dict As New Scripting.Dictionary
Dim myRow As Range
Dim I As Long, J As Long
Dim vResults() As Variant
' 1. Build a dictionnary
For Each myRow In refRange.Columns(1).Cells
' Append A : B to dictionnary
dict.Add myRow.Value, myRow.Offset(0, dataCol - 1).Value
Next myRow
' 2. Use it over all lookup data
ReDim vResults(1 To lookupRange.Rows.Count, 1 To lookupRange.Columns.Count) As Variant
For I = 1 To lookupRange.Rows.Count
For J = 1 To lookupRange.Columns.Count
If dict.Exists(lookupRange.Cells(I, J).Value) Then
vResults(I, J) = dict(lookupRange.Cells(I, J).Value)
End If
Next J
Next I
vbalookup = vResults
End Function
NB: Scripting.Dictionary
requires a referenc to Microsoft Scripting Runtime
which must be
added manually (Tools->References menu in the Excel VBA window)
注意:Scripting.Dictionary
需要一个Microsoft Scripting Runtime
必须手动添加的引用(Excel VBA 窗口中的工具->引用菜单)
Conclusion :
结论 :
In this context, VBA using a dictionary is 100x faster than using VLOOKUP and 20x faster than MATCH/INDEX
在这种情况下,使用字典的 VBA 比使用 VLOOKUP 快 100 倍,比 MATCH/INDEX 快 20 倍
回答by kevin9999
You also may want to consider using the “double Vlookup” method (not my idea - seen elsewhere). I tested it on 100,000 lookup values on sheet 2 (randomly sorted) with an identical data set as the one you've described on sheet 1, and timed it at just under 4 seconds. The code is also a bit simpler.
您可能还想考虑使用“双 Vlookup”方法(不是我的想法 - 在别处看到)。我在工作表 2(随机排序)上使用与您在工作表 1 中描述的数据集相同的数据集对它进行了 100,000 个查找值的测试,并将其计时在不到 4 秒的时间内。代码也简单一些。
Sub FastestVlookup()
With Sheet2.Range("B1:B100000")
.FormulaR1C1 = _
"=IF(VLOOKUP(RC1,Sheet1!R1C1:R100000C1,1)=RC1,VLOOKUP(RC1,Sheet1!R1C1:R100000C2,2),""N/A"")"
.Value = .Value
End With
End Sub
回答by Alan
Switch to Excel 2013 and use Data Model. There you can define a column with unique ID keys in both tables and bind those two tables with relationship in Pivot Table. Than if absolutely necessary you can use Getpivotdata() to fill the first table. I had a ~250K rows table doing vlookup in the similar ~250K rows table. Stopped Excel calculating it after an hour. With Data Model it took less than 10sec.
切换到 Excel 2013 并使用数据模型。在那里,您可以在两个表中定义一个具有唯一 ID 键的列,并在数据透视表中将这两个表与关系绑定在一起。如果绝对必要,您可以使用 Getpivotdata() 填充第一个表。我有一个 ~250K 行表在类似的 ~250K 行表中执行 vlookup。一个小时后停止 Excel 计算。使用数据模型只需不到 10 秒。
回答by HuBe
Value fix: check for a blank cell when building the dictionary. If the cell is blank, exit for.
值修复:在构建字典时检查空白单元格。如果单元格为空白,则退出。