Vba 将休 csv 文件的子部分导入 excel 2010

Question

提问by Fantrf

I have a csv file that has approx 600 fields and approx 100k of rows, i would like to import only select fields and only certian rows where a select set of fields match a certain set of criteria into an existing excel worksheet tab

我有一个 csv 文件，它有大约 600 个字段和大约 100k 行，我想只导入选择字段和仅某些行，其中一组选择的字段与一组特定的条件匹配到现有的 excel 工作表选项卡

I attempted to use ms query within excel but it stops at 255 columns, i can import the whole file in excel 2010 (250m) but it is a memory hog and by the time i remove the unneeded fields and rows it locks up my computer.

我试图在 excel 中使用 ms 查询，但它在 255 列处停止，我可以在 excel 2010 (250m) 中导入整个文件，但它是一个内存猪，当我删除不需要的字段和行时，它锁定了我的计算机。

I would like to kick the import process off with an excel vba macro. I have all the front end code of file selection, etc.... But need some assistance in the text read query convert to excel area of vba

我想用 excel vba 宏启动导入过程。我有文件选择的所有前端代码，等等......但在文本读取查询转换为vba的excel区域需要一些帮助

Any assitance would be greatly appreciated

任何帮助将不胜感激

Thanks

谢谢

Tom

汤姆

Answer 1

回答by Brandon R. Gates

For that many records you would be better off importing the .csv into Microsoft Access, indexing some fields, writing a query that contains only what you want, and then exporting to Excel from the query.

对于这么多记录，最好将 .csv 导入 Microsoft Access，为某些字段编制索引，编写仅包含所需内容的查询，然后从查询导出到 Excel。

If you really need an Excel-only solution, do the following:

如果您确实需要仅限 Excel 的解决方案，请执行以下操作：

Open up the VBA editor. Navigate to Tools -> References. Select the most recent ActiveX Data Objects Library. (ADO for short). On my XP machine running Excel 2003, it's version 2.8.
Create a module if you don't have one already. Or create one anyway to contain the code at the bottom of this post.
In any blank worksheet paste the following values starting at cell A1:

打开 VBA 编辑器。导航到工具 -> 参考。选择最新的 ActiveX 数据对象库。（简称ADO）。在我运行 Excel 2003 的 XP 机器上，它是 2.8 版。
如果您还没有模块，请创建一个模块。或者无论如何创建一个来包含这篇文章底部的代码。
在任何空白工作表中，从单元格 A1 开始粘贴以下值：

SELECT Field1, Field2 FROM C:\Path\To\file.csv WHERE Field1 = 'foo' ORDER BY Field2

(Formatting issues here. selectfrom, etc should each be in their own row in col A for reference. The other stuff are the important bits and should go in column B.)

（这里的格式问题。 selectfrom等都应该在列 A 中各自的行中以供参考。其他东西是重要的部分，应该放在 B 列中。）

Amend the input fields as appropriate for your filename and query requirements, then run thegetCsv()subroutine. It will put the results in a QueryTable object starting at cell C6.

根据文件名和查询要求修改输入字段，然后运行getCsv()子例程。它将结果放在从单元格 C6 开始的 QueryTable 对象中。

I personally hate QueryTables but the .CopyFromRecordsetmethod I prefer to use with ADO doesn't give you field names. I left the code for that method in, commented out, so you can investigate that way. If you use it, you can get rid of the call to deleteQueryTables()because it's a really ugly hack, it deletes whole columns which you may not like, etc.

我个人讨厌 QueryTables，但.CopyFromRecordset我更喜欢与 ADO 一起使用的方法不会为您提供字段名称。我留下了那个方法的代码，注释掉了，这样你就可以调查了。如果你使用它，你可以摆脱对的调用，deleteQueryTables()因为它是一个非常丑陋的黑客，它会删除你可能不喜欢的整个列，等等。

Happy coding.

快乐编码。

Option Explicit
Function ExtractFileName(filespec) As String
'   Returns a filename from a filespec
    Dim x As Variant
    x = Split(filespec, Application.PathSeparator)
    ExtractFileName = x(UBound(x))
End Function
Function ExtractPathName(filespec) As String
'   Returns the path from a filespec
    Dim x As Variant
    x = Split(filespec, Application.PathSeparator)
    ReDim Preserve x(0 To UBound(x) - 1)
    ExtractPathName = Join(x, Application.PathSeparator) & Application.PathSeparator
End Function
Sub getCsv()
    Dim cnCsv As New ADODB.Connection
    Dim rsCsv As New ADODB.Recordset
    Dim strFileName As String
    Dim strSelect As String
    Dim strWhere As String
    Dim strOrderBy As String
    Dim strSql As String
    Dim qtData As QueryTable

    strSelect = ActiveSheet.Range("B1").Value
    strFileName = ActiveSheet.Range("B2").Value
    strWhere = ActiveSheet.Range("B3").Value
    strOrderBy = ActiveSheet.Range("B4").Value

    strSql = "SELECT " & strSelect
    strSql = strSql & vbCrLf & "FROM " & ExtractFileName(strFileName)
    If strWhere <> "" Then strSql = strSql & vbCrLf & "WHERE " & strWhere
    If strOrderBy <> "" Then strSql = strSql & vbCrLf & "ORDER BY " & strOrderBy

    With cnCsv
        .Provider = "Microsoft.Jet.OLEDB.4.0"
        .ConnectionString = "Data Source=" & ExtractPathName(strFileName) & ";" & _
            "Extended Properties=""text;HDR=yes;FMT=Delimited(,)"";Persist Security Info=False"
        .Open
    End With

    rsCsv.Open strSql, cnCsv, adOpenForwardOnly, adLockReadOnly, adCmdText
    'ActiveSheet.Range("C6").CopyFromRecordset rsCsv

    Call deleteQueryTables
    Set qtData = ActiveSheet.QueryTables.Add(rsCsv, ActiveSheet.Range("C6"))
    qtData.Refresh

    rsCsv.Close
    Set rsCsv = Nothing
    cnCsv.Close
    Set cnCsv = Nothing
End Sub
Sub deleteQueryTables()
    On Error Resume Next
    With Application
        .ScreenUpdating = False
        .Calculation = xlCalculationManual
    End With

    Dim qt As QueryTable
    Dim qtName As String
    Dim nName As Name
    For Each qt In ActiveSheet.QueryTables
        qtName = qt.Name
        qt.Delete
        For Each nName In Names
            If InStr(1, nName.Name, qtName) > 0 Then
                Range(nName.Name).EntireColumn.Delete
                nName.Delete
            End If
        Next nName
    Next qt

    With Application
        .ScreenUpdating = True
        .Calculation = xlCalculationAutomatic
    End With
End Sub

Answer 2

回答by Graham Anderson

You can parse your input file extracting the lines that conform to your criteria. The following code uses the split function on each line of the CSV file to separate the fields and then checks to see if it matches the required criteria. If all the criteria match then selected fields are saved in a new CSV file then you can just open the smaller file. You will need to set the microsoft scripting runtime reference in the VBA editor for this to work.

您可以解析您的输入文件，提取符合您标准的行。下面的代码在 CSV 文件的每一行上使用 split 函数来分隔字段，然后检查它是否符合所需的条件。如果所有条件都匹配，则所选字段将保存在新的 CSV 文件中，然后您只需打开较小的文件即可。您需要在 VBA 编辑器中设置 microsoft 脚本运行时引用才能使其工作。

This method should use little memory as it processes 1 line at a time, I tested it on data of 600 fields and 100000 lines and it took about 45 seconds to process the file with no noticable increase in RAM usage in windows task manager. It is CPU intensive and the time taken would increase as the complexity data, conditions and the number of fields copied increases.

这种方法应该使用很少的内存，因为它一次处理 1 行，我在 600 个字段和 100000 行的数据上对其进行了测试，处理文件大约需要 45 秒，而 Windows 任务管理器中的 RAM 使用率没有明显增加。它是 CPU 密集型的，并且随着复杂性数据、条件和复制的字段数量的增加，所花费的时间也会增加。

If you prefer to write directly to an existing sheet this can be easily acheived, but you would have to rememove any old data there first.

如果您更喜欢直接写入现有工作表，这很容易实现，但您必须先删除那里的所有旧数据。

Sub Extract()

Dim fileHandleInput As Scripting.TextStream
Dim fileHandleExtract As Scripting.TextStream
Dim fsoObject As Scripting.FileSystemObject
Dim sPath As String
Dim sFilenameExtract As String
Dim sFilenameInput As String
Dim myVariant As Variant
Dim bParse As Boolean   'To check if the line should be written

sFilenameExtract = "Exctract1.CSV"
sFilenameInput = "Input.CSV"

Set fsoObject = New FileSystemObject

sPath = ThisWorkbook.Path & "\"

'Check if this works ie overwrites existing file
If fsoObject.FileExists(sPath & sFilenameExtract) Then
    Set fileHandleExtract = fsoObject.OpenTextFile(sPath & sFilenameExtract, ForWriting)
Else
    Set fileHandleExtract = fsoObject.CreateTextFile((sPath & sFilenameExtract), True)
End If

Set fileHandleInput = fsoObject.OpenTextFile(sPath & sFilenameInput, ForReading)


'extracting headers for selected fields in this case the 1st, 2nd and 124th fields
myVariant = Split(fileHandleInput.ReadLine, ",")
fileHandleExtract.WriteLine (myVariant(0) & "," & _
                             myVariant(1) & "," & _
                             myVariant(123))

'Parse each line (row) of the inputfile
Do While Not fileHandleInput.AtEndOfStream
    myVariant = Split(fileHandleInput.ReadLine, ",")

    'Set bParse initially to true
    bParse = True

    'Check if the first element is greater than 123
    If Not myVariant(0) > 123 Then bParse = False

    'Check if second element is one of allowed values
    'Trim used to remove pesky leading or lagging values when checking
    Select Case Trim(myVariant(1))

    Case "Red", "Yellow", "Green", "Blue", "Black"
    'Do nothing as value found

    Case Else
        bParse = False  'As wasn't a value in the condition
    End Select

    'If the conditions were met by the line then write specific fields to extract file
    If bParse Then
        fileHandleExtract.WriteLine (myVariant(0) & "," & _
                                     myVariant(1) & "," & _
                                     myVariant(123))
    End If

Loop

'close files and cleanup
fileHandleExtract.Close
fileHandleInput.Close

Set fileHandleExtract = Nothing
Set fileHandleInput = Nothing
Set fsoObject = Nothing

End Sub

Vba 将休 csv 文件的子部分导入 excel 2010

提问by Fantrf

回答by Brandon R. Gates

回答by Graham Anderson

相关推荐

最近更新

标签

Vba 将休 csv 文件的子部分导入 excel 2010

提问by Fantrf

回答by Brandon R. Gates

回答by Graham Anderson

相关推荐

VBA 中的关键字“New”有什么作用？

vba 在一张工作表中运行多个“私有子工作表_更改（ByVal Target As Range）”

vba excel：检查基于 3 列的重复行并保留一行

vba 对象'_Global'的方法'Cells'失败VBA

相关推荐

最近更新

标签