A faster way to get all unique values of a column in VBA?

Question

Asked by AJJ

Is there a faster way to do this?

Set data = ws.UsedRange

Set unique = CreateObject("Scripting.Dictionary")

On Error Resume Next
For x = 1 To data.Rows.Count
    unique.Add data(x, some_column_number).Value, 1
Next x
On Error GoTo 0

At this point unique.Keys gives me what I need, but the loop itself is very slow for files with tens of thousands of records (whereas this wouldn't be a problem at all in a language like Python or C++).

Answer 1

Accepted answer by Florent B.

Loading the values into an array first is much faster, because the whole column is read from the sheet in a single call instead of one cell at a time:

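Dim data(), dict As Object, r As Long
Set dict = CreateObject("Scripting.Dictionary")

' read only the column of interest into a 2D array (rows x 1 column, 1-based)
' note: the column index is relative to UsedRange, not to the sheet
data = ActiveSheet.UsedRange.Columns(some_column_number).Value

For r = 1 To UBound(data)
    dict(data(r, 1)) = Empty   ' a new key is added; a repeated key just overwrites its item
Next

' turn the 1D key array into a vertical (n x 1) array, e.g. for writing back to a column
data = WorksheetFunction.Transpose(dict.keys())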
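The same idea can be packaged as a reusable function. This is a minimal sketch, assuming the sheet data has already been read into a 2D array (for example via Range.Value); UniqueInColumn and DemoUniqueInColumn are hypothetical names, and skipping blank cells is a choice made by this sketch rather than part of the answer above.

```vba
' Hypothetical helper: unique values of one column of a 2D array
' (e.g. the array returned by Range.Value). Late-bound, so no reference is needed.
Public Function UniqueInColumn(data As Variant, ByVal col As Long) As Variant
    Dim dict As Object, r As Long
    Set dict = CreateObject("Scripting.Dictionary")

    For r = LBound(data, 1) To UBound(data, 1)
        If Not IsEmpty(data(r, col)) Then   ' assumption: blank cells are ignored
            dict(data(r, col)) = Empty      ' assigning to a missing key adds it
        End If
    Next r

    UniqueInColumn = dict.Keys              ' 0-based 1D array, in first-seen order
End Function

' Example caller (assumes the values are in the 3rd column of the active sheet's UsedRange)
Public Sub DemoUniqueInColumn()
    Dim data As Variant, uniques As Variant, i As Long
    data = ActiveSheet.UsedRange.Value
    If Not IsArray(data) Then Exit Sub      ' a one-cell UsedRange yields a scalar, not an array
    uniques = UniqueInColumn(data, 3)
    For i = LBound(uniques) To UBound(uniques)
        Debug.Print uniques(i)
    Next i
End Sub
```

Reading the sheet once and doing all the work on the in-memory array is what keeps this fast; the dictionary only ever sees plain Variant values.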

You should also consider early binding for the Scripting.Dictionary:

Dim dict As New Scripting.Dictionary  ' requires a reference to Microsoft Scripting Runtime
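A minimal early-bound sketch, assuming the Microsoft Scripting Runtime reference is set (Tools > References). It instantiates the dictionary with Set ... = New instead of As New, because a variable declared As New is auto-instancing and VBA checks it on every use. CountUniqueEarlyBound is a hypothetical name; unlike the helper that skips blanks, it counts an empty cell as one distinct value.

```vba
' Requires Tools > References > "Microsoft Scripting Runtime".
' Hypothetical helper: number of distinct values in one column of a 2D array.
Public Function CountUniqueEarlyBound(data As Variant, ByVal col As Long) As Long
    Dim dict As Scripting.Dictionary
    Dim r As Long
    Set dict = New Scripting.Dictionary      ' explicit instance, no auto-instancing checks

    For r = LBound(data, 1) To UBound(data, 1)
        If Not dict.Exists(data(r, col)) Then dict.Add data(r, col), Empty
    Next r

    CountUniqueEarlyBound = dict.Count
End Function
```

With early binding the editor also offers IntelliSense for the dictionary's members, and a misspelled member is caught at compile time rather than at run time.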

Note that on large data sets a dictionary is much faster than Range.AdvancedFilter.

As a bonus, here's a procedure similar to Range.RemoveDuplicates that removes duplicate rows from a 2D array:

Public Sub RemoveDuplicates(data, ParamArray columns())
    Dim ret(), indexes(), ids(), r As Long, c As Long, rowId As String
    Dim dict As New Scripting.Dictionary  ' requires a reference to Microsoft Scripting Runtime

    If (VarType(data) And vbArray) = 0 Then Err.Raise 5, , "Argument data is not an array"

    ReDim ids(LBound(columns) To UBound(columns))

    For r = LBound(data) To UBound(data)               ' each row
        For c = LBound(columns) To UBound(columns)     ' each key column
            ids(c) = data(r, columns(c))               ' build the id for the row
        Next
        rowId = Join$(ids, ChrW(-1))                   ' U+FFFF separator, unlikely to occur in data
        If Not dict.Exists(rowId) Then dict.Add rowId, r  ' keep the first row for each id
    Next

    indexes = dict.Items()                             ' 0-based array of kept row indexes
    ReDim ret(LBound(data) To LBound(data) + dict.Count - 1, LBound(data, 2) To UBound(data, 2))

    For c = LBound(ret, 2) To UBound(ret, 2)           ' each column
        For r = LBound(ret) To UBound(ret)             ' each unique row
            ret(r, c) = data(indexes(r - LBound(ret)), c)  ' copy the kept row
        Next
    Next

    data = ret
End Sub
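A short usage sketch, assuming the RemoveDuplicates procedure is in the same project with the Scripting Runtime reference set. The data argument must be a Variant holding an array (as returned by Range.Value or built with ReDim), because the procedure assigns the result back to it; a fixed-size array cannot be reassigned. DemoRemoveDuplicates and the sample values are hypothetical.

```vba
' Hypothetical demo: dedupe a small 1-based array on columns 1 and 3.
Public Sub DemoRemoveDuplicates()
    Dim data As Variant, r As Long
    ReDim data(1 To 4, 1 To 3)                               ' 1-based, like Range.Value
    data(1, 1) = "A": data(1, 2) = 1: data(1, 3) = "x"
    data(2, 1) = "A": data(2, 2) = 2: data(2, 3) = "x"       ' same id (cols 1 and 3) as row 1
    data(3, 1) = "B": data(3, 2) = 3: data(3, 3) = "x"
    data(4, 1) = "A": data(4, 2) = 4: data(4, 3) = "y"

    RemoveDuplicates data, 1, 3                              ' uniqueness on columns 1 and 3

    For r = LBound(data) To UBound(data)                     ' three rows remain
        Debug.Print data(r, 1), data(r, 2), data(r, 3)
    Next r
End Sub
```

The resulting array can be written straight back to a sheet with something like Range("E1").Resize(UBound(data, 1), UBound(data, 2)).Value = data.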

Answer 2

Answer by Jeremy Thompson

Use Excel's Range.AdvancedFilter method to do this.

Excel's built-in (native C++) functionality is the fastest option for smaller datasets, while the dictionary approach is faster for larger ones. For example:

Copy the values in column A and write the unique values to column B:

Range("A1:A6").AdvancedFilter Action:=xlFilterCopy, CopyToRange:=Range("B1"), Unique:=True

It works with multiple columns too:

Range("A1:B4").AdvancedFilter Action:=xlFilterCopy, CopyToRange:=Range("D1:E1"), Unique:=True

Be careful with multiple columns, as it doesn't always work as expected. In those cases I fall back on Range.RemoveDuplicates, which lets you choose the set of columns that defines uniqueness. Ref: MSDN - Find and remove duplicates

Here I remove duplicate rows based on the third column:

Range("A1:C4").RemoveDuplicates Columns:=3, Header:=xlNo

Here I remove duplicate rows based on the second and third columns:

Range("A1:C4").RemoveDuplicates Columns:=Array(2, 3), Header:=xlNo
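Range.RemoveDuplicates edits the sheet in place. If the source data must stay intact, one option is to run it on a scratch copy and read the result back into an array. A minimal sketch, assuming a single-column source range in a workbook whose structure is not protected; UniqueViaRemoveDuplicates is a hypothetical name. Because it adds and deletes a sheet, it has to be called from a macro, not used as a worksheet formula.

```vba
' Hypothetical helper: unique values of a single-column range, computed with
' Range.RemoveDuplicates on a scratch sheet so the source cells are not modified.
Public Function UniqueViaRemoveDuplicates(ByVal src As Range) As Variant
    Dim tmp As Worksheet, lastRow As Long, prevAlerts As Boolean

    Set tmp = src.Worksheet.Parent.Worksheets.Add       ' scratch sheet in the same workbook
    tmp.Range("A1").Resize(src.Rows.Count, 1).Value = src.Columns(1).Value
    tmp.Range("A1").Resize(src.Rows.Count, 1).RemoveDuplicates Columns:=1, Header:=xlNo

    lastRow = tmp.Cells(tmp.Rows.Count, 1).End(xlUp).Row
    UniqueViaRemoveDuplicates = tmp.Range("A1").Resize(lastRow, 1).Value   ' 2D, 1-based

    prevAlerts = Application.DisplayAlerts
    Application.DisplayAlerts = False                   ' suppress the delete-sheet prompt
    tmp.Delete
    Application.DisplayAlerts = prevAlerts
End Function
```

If only one value survives, .Value returns a scalar rather than an array, so callers may want to check IsArray before indexing the result.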

Answer 3

Answer by brettdj

PowerShell is a very powerful and efficient tool. This is cheating a little, but shelling out to PowerShell from VBA opens up lots of options.

The bulk of the code below simply saves the current sheet as a CSV file. The output is another CSV file containing only the unique values.

Sub AnotherWay()
    Dim strPath As String
    Dim strPath2 As String
    Dim taskId As Double

    Application.DisplayAlerts = False
    strPath = "C:\Temp\test.csv"
    strPath2 = "C:\Temp\testout.csv"
    ' saves the active sheet as CSV; the open workbook now points at test.csv
    ActiveWorkbook.SaveAs strPath, xlCSV
    ' Shell starts PowerShell hidden (window style 0) and returns immediately with its task ID
    taskId = Shell("powershell.exe import-csv -Path """ & strPath & """ -Header A | Select-Object -Unique A | Export-Csv """ & strPath2 & """ -NoTypeInformation", 0)
    Application.DisplayAlerts = True
End Sub
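Because Shell runs the command asynchronously, code that reads testout.csv right afterwards may find it missing or incomplete. One alternative is the Run method of WScript.Shell with its wait flag set. A sketch using the same example paths; UniqueViaPowerShellAndWait is a hypothetical name, and the -NoProfile and -Command switches are additions of this sketch, not part of the answer above.

```vba
' Sketch: same PowerShell pipeline, but run through WScript.Shell so VBA waits
' for PowerShell to exit before continuing. Paths are the example paths above.
Public Sub UniqueViaPowerShellAndWait()
    Const IN_PATH As String = "C:\Temp\test.csv"
    Const OUT_PATH As String = "C:\Temp\testout.csv"
    Dim cmd As String, exitCode As Long

    Application.DisplayAlerts = False
    ActiveWorkbook.SaveAs IN_PATH, xlCSV      ' the open workbook is now the CSV file
    Application.DisplayAlerts = True

    cmd = "powershell.exe -NoProfile -Command ""Import-Csv -Path '" & IN_PATH & _
          "' -Header A | Select-Object -Unique A | Export-Csv -Path '" & OUT_PATH & _
          "' -NoTypeInformation"""

    ' Run(command, windowStyle, waitOnReturn): 0 = hidden window, True = block until exit
    exitCode = CreateObject("WScript.Shell").Run(cmd, 0, True)
    If exitCode <> 0 Then Err.Raise vbObjectError + 1, , "PowerShell exited with code " & exitCode
End Sub
```

With waitOnReturn set to True, Run returns PowerShell's exit code, so a failed pipeline can be detected instead of silently leaving a stale output file.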

Answer 4

Answer by user3598756

Try this:

Option Explicit

Sub UniqueValues()
    Dim ws As Worksheet
    Dim uniqueRng As Range
    Dim myCol As Long

    myCol = 5 '<== adjust as needed
    Set ws = ThisWorkbook.Worksheets("unique") '<== adjust as needed

    Set uniqueRng = GetUniqueValues(ws, myCol)
End Sub


Function GetUniqueValues(ws As Worksheet, col As Long) As Range
    Dim firstRow As Long

    With ws
        ' removes duplicates in place: the worksheet column itself is modified
        .Columns(col).RemoveDuplicates Columns:=Array(1), Header:=xlNo

        firstRow = 1
        If IsEmpty(.Cells(1, col)) Then firstRow = .Cells(1, col).End(xlDown).Row

        ' qualify Range with the sheet so this also works when ws is not the active sheet
        Set GetUniqueValues = .Range(.Cells(firstRow, col), .Cells(.Rows.Count, col).End(xlUp))
    End With
End Function

It should be quite fast, and it avoids the drawback NeepNeepNeep mentioned.
