Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/427488/

Time: 2020-09-08 09:33:51  Source: igfitidea

Want VBA in excel to read very large CSV and create output file of a small subset of the CSV

Tags: excel, vba, csv, excel-vba

Question

I have a csv file of 1.2 million records of text. The alphanumeric fields are wrapped in quotation marks, the date/time or numeric fields are not.

For example: "Fred","Smith",01/07/1967,2,"7, The High Street","Anytown","Anycounty","LS1 7AA"

What I want to do is write some VBA in Excel (more or less the only available tool that I am reasonably proficient with) that reads the CSV record by record, performs a check (as it happens, on the last field, the post code) and then outputs a small subset of the 1.2m records to a new output file.

I understand how to open the two files, read the record, do what I need to do with the data and write it out (I will just output the input record with a prefix denoting an exception type).

What I don't know is how to parse the CSV in VBA properly. I can't do a simple text scan and search for commas, as the text sometimes contains commas (which is why the text fields are delimited by quotation marks).

Is there a fantastic command that would let me quickly get the data from the nth field in my record?

What I want is s_work = field(s_input_record,5) where 5 is the field number in my CSV....

Many thanks, C

Answer by e.James

The following code should do the trick. I don't have Excel in front of me, so I haven't tested it, but the concept is sound.

If this ends up being too slow, we can look at ways to improve the efficiency.

Sub SelectSomeRecords(inputFileName As String, outputFileName As String)
    Dim testLine As String

    Open inputFileName For Input As #1
    Open outputFileName For Output As #2

    While Not EOF(1)
        Line Input #1, testLine
        If RecordIsInteresting(testLine) Then
            Print #2, testLine
        End If
    Wend

    Close #1
    Close #2
End Sub

Function RecordIsInteresting(recordLine As String) As Boolean
    Dim lineItems(1 to 8) As String

    GetRecordItems lineItems, recordLine

    ' do your custom checking here:
    RecordIsInteresting = lineItems(8) = "LS1 7AA"
End Function

Sub GetRecordItems(items() As String, recordLine as String)
    Dim finishString as Boolean
    Dim itemString as String
    Dim itemIndex as Integer
    Dim charIndex as Long
    Dim inQuote as Boolean
    Dim testChar as String

    inQuote = False
    charIndex = 1
    itemIndex = 1
    itemString = ""
    finishString = False

    While charIndex <= Len(recordLine)
        testChar = Mid$(recordLine, charIndex, 1)

        finishString = False

        If inQuote Then
            If testChar = Chr$(34) Then
                inQuote = False
                finishString = True
                charIndex = charIndex + 1 ' skip the comma that follows the closing quote
            Else
                itemString = itemString + testChar
            End If
        Else
            If testChar = Chr$(34) Then
                inQuote = True
            ElseIf testChar = "," Then
                finishString = True
            Else
                itemString = itemString + testChar
            End If
        End If

        If finishString Then
            items(itemIndex) = itemString
            itemString = ""
            itemIndex = itemIndex + 1
        End If

        charIndex = charIndex + 1
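    Wend

    ' An unquoted final field has no trailing comma, so store it here.
    If itemIndex <= UBound(items) And Len(itemString) > 0 Then
        items(itemIndex) = itemString
    End If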
End Sub
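
To get something close to the field(s_input_record, 5) call the question asks for, the same character-by-character scan can be wrapped in a function that returns a single field. The following is a minimal sketch rather than e.James's code: CsvField is a hypothetical name, and it assumes one record per line, comma delimiters, and a doubled quote ("") standing for a literal quote inside a quoted field.

```vba
Option Explicit

' Hypothetical helper (not from the original answer): returns the nth
' (1-based) field of one CSV record, or "" if the record has fewer fields.
' Assumes comma delimiters and "" as an escaped quote inside quoted fields.
Public Function CsvField(ByVal recordLine As String, ByVal fieldNumber As Long) As String
    Dim pos As Long
    Dim currentField As Long
    Dim inQuote As Boolean
    Dim ch As String
    Dim buffer As String

    currentField = 1
    pos = 1
    Do While pos <= Len(recordLine)
        ch = Mid$(recordLine, pos, 1)
        If inQuote Then
            If ch = """" Then
                If Mid$(recordLine, pos + 1, 1) = """" Then
                    buffer = buffer & """"      ' doubled quote: keep one literal quote
                    pos = pos + 1
                Else
                    inQuote = False             ' closing quote
                End If
            Else
                buffer = buffer & ch            ' commas inside quotes are plain text
            End If
        ElseIf ch = """" Then
            inQuote = True                      ' opening quote
        ElseIf ch = "," Then
            If currentField = fieldNumber Then Exit Do
            currentField = currentField + 1     ' move on to the next field
            buffer = ""
        Else
            buffer = buffer & ch
        End If
        pos = pos + 1
    Loop

    ' Return the buffer only if the requested field was actually reached.
    If currentField = fieldNumber Then CsvField = buffer
End Function
```

With this in a standard module, the call from the question would read s_work = CsvField(s_input_record, 5). Because it stops scanning as soon as the requested field is complete, early fields are cheaper to fetch than late ones.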

Answer by Fionnuala

How about VBScript, though this would also work in Excel:

Set cn = CreateObject("ADODB.Connection")

' Note HDR=Yes, that is, the first row contains field names,
' and FMT=Delimited, i.e. CSV

strCon = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\Docs\;" _
    & "Extended Properties=""text;HDR=Yes;FMT=Delimited"";"

cn.Open strCon

' You would not need the quote delimiters ('') if the last field were numeric:
strSQL = "SELECT FieldName1, FieldName2 INTO New.csv FROM Old.csv " _
    & " WHERE LastFieldName='SomeTextValue'"

' Creates the new csv file
cn.Execute strSQL
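
Two details of that connection string may need adjusting. The Jet 4.0 OLE DB provider exists only as a 32-bit component, so a 64-bit Office installation would need the ACE provider instead, and the sample record in the question has no header row, in which case HDR=No applies and the text driver names the columns F1, F2 and so on. The sketch below only builds the string; TextConnectionString and its parameters are hypothetical names, and whether the ACE provider is installed on a given machine is an assumption to check.

```vba
Option Explicit

' Hypothetical helper: builds a text-driver connection string for a folder
' that contains the CSV files. It does not open any connection itself.
Public Function TextConnectionString(ByVal folderPath As String, _
                                     Optional ByVal useAce As Boolean = False, _
                                     Optional ByVal hasHeader As Boolean = True) As String
    Dim provider As String
    Dim hdr As String

    ' Jet 4.0 is 32-bit only; ACE 12.0 is the usual choice for 64-bit Office.
    If useAce Then
        provider = "Microsoft.ACE.OLEDB.12.0"
    Else
        provider = "Microsoft.Jet.OLEDB.4.0"
    End If

    ' HDR=No makes the driver expose the columns as F1, F2, ...
    If hasHeader Then hdr = "Yes" Else hdr = "No"

    ' Data Source is the folder, not the file; make sure it ends with a backslash.
    If Right$(folderPath, 1) <> "\" Then folderPath = folderPath & "\"

    TextConnectionString = "Provider=" & provider & ";Data Source=" & folderPath & ";" & _
        "Extended Properties=""text;HDR=" & hdr & ";FMT=Delimited"";"
End Function
```

For example, cn.Open TextConnectionString("c:\Docs") produces the same string as the code above.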

Answer by Hank Gay

This doesn't directly answer your question, but grep (or one of the Windows equivalents) would really shine for this, e.g.,

grep -e <regex_filter> foo.csv > bar.csv

Answer by dkretz

I used the following derivative of the code given above to successfully open an arbitrary csv file from VBA in Excel.

Option Explicit
Public cn As Connection
Public Sub DoIt()
Dim strcon As String
Dim strsql As String
Dim rs As Recordset

Set cn = CreateObject("ADODB.Connection")

strcon = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\bin\HomePlanet\;" _
& "Extended Properties=""text;HDR=Yes;FMT=Delimited"";"

cn.Open strcon

strsql = "SELECT * FROM astuname.csv "
Set rs = New ADODB.Recordset
rs.Open strsql, cn
DoEvents ' pause here to inspect objects and properties
rs.Close
End Sub

The rs (recordset) has a collection of fields, with a Count property. Each field has a Type property.

You can reference the fields by sequence number ...

Debug.Print rs.Fields(rs.Fields.Count - 1).Type

Is this sufficient?

If not, post the first several rows of the input file and I'll take it the rest of the way.

Answer by barrowc

Look at the Input # statement in the Excel help.

Sample usage would be:

Input #fnInput, s_Forename, s_Surname, dt_DOB, i_Something, s_Street, s_Town, s_County, s_Postcode

and then use the Write # statement to write matching records out again.

The only issue might be that the date in the output will end up formatted as #1967-07-01#. That format is unambiguous, though, unlike 01/07/1967, which would represent 1st July in the UK and 7th January in the US. If you need to preserve the formatting of the date, write it out as a string:

s_DOB = Format(dt_DOB, "dd/mm/yyyy")
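
Put together, the Input # and Write # statements give a complete filter loop along the following lines. This is a sketch under assumptions: FilterByPostcode and its parameters are hypothetical names, every record is assumed to have exactly the eight fields from the question, and the date is read straight into a String (rather than a Date) so that no date conversion happens at all. A side effect of that choice is that Write # puts the date in quotation marks on output.

```vba
Option Explicit

' Sketch only: copies the records whose last field equals wantedPostcode.
' Assumes eight fields per record, in the order shown in the question.
Public Sub FilterByPostcode(ByVal inputPath As String, ByVal outputPath As String, _
                            ByVal wantedPostcode As String)
    Dim fnInput As Integer, fnOutput As Integer
    Dim s_Forename As String, s_Surname As String, s_DOB As String
    Dim i_Something As Long
    Dim s_Street As String, s_Town As String, s_County As String, s_Postcode As String

    ' FreeFile avoids clashing with file numbers that are already in use.
    fnInput = FreeFile
    Open inputPath For Input As #fnInput
    fnOutput = FreeFile
    Open outputPath For Output As #fnOutput

    Do While Not EOF(fnInput)
        ' Input # understands quoted strings, so commas inside quotes are kept.
        Input #fnInput, s_Forename, s_Surname, s_DOB, i_Something, _
              s_Street, s_Town, s_County, s_Postcode

        If s_Postcode = wantedPostcode Then
            ' Write # quotes the strings and separates the values with commas.
            Write #fnOutput, s_Forename, s_Surname, s_DOB, i_Something, _
                  s_Street, s_Town, s_County, s_Postcode
        End If
    Loop

    Close #fnOutput
    Close #fnInput
End Sub
```

Input # does not understand doubled quotes inside a quoted field, so data containing them would need one of the hand-written parsers instead.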

Answer by dkretz

Anything you can do a row at a time with VBA in Excel, you can do in Access with VBA, plus a lot more, because it's a database rather than a spreadsheet. Is Access unavailable to you?

It's a lot easier to deal with logical tables, records, and fields than logical worksheets, rows, and columns.

For input, why does the "/Data/Import External Data/Text/csv" not work? Is the input not truly portable csv?

Answer by Mike Woodhouse

I'd suggest taking a look at the regular expression library (you should see it in "Tools...References" as "Microsoft VBScript Regular Expressions 5.5" or something very similar).

There are samples of both the RegExp approach and a fairly comprehensive character-by-character parser at this location: http://www.xbeat.net/vbspeed/c_ParseCSV.php. Note that the RegExp version is way shorter!
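
As a rough illustration of the regular expression approach (this is not the code from the linked page), the sketch below splits one record into a Collection of fields. SplitCsvWithRegExp is a hypothetical name, the object is created with late binding so no reference has to be ticked, and the pattern assumes comma delimiters with "" as an escaped quote inside quoted fields.

```vba
Option Explicit

' Hypothetical helper: splits one CSV record into a Collection of strings.
Public Function SplitCsvWithRegExp(ByVal recordLine As String) As Collection
    Dim re As Object
    Dim m As Object
    Dim token As String
    Dim result As New Collection

    Set re = CreateObject("VBScript.RegExp")    ' late binding
    re.Global = True
    ' Group 1: a quoted field (with "" as an escaped quote) or an unquoted field.
    ' Group 2: what ends the field, either a comma or nothing at end of line.
    re.Pattern = "(""(?:[^""]|"""")*""|[^,]*)(,|$)"

    For Each m In re.Execute(recordLine)
        token = m.SubMatches(0)
        ' Strip the surrounding quotes and collapse doubled quotes.
        If Len(token) >= 2 And Left$(token, 1) = """" And Right$(token, 1) = """" Then
            token = Replace(Mid$(token, 2, Len(token) - 2), """""", """")
        End If
        result.Add token
        ' An empty delimiter means this field ran to the end of the line.
        If Len(m.SubMatches(1)) = 0 Then Exit For
    Next m

    Set SplitCsvWithRegExp = result
End Function
```

The post code from the question's sample record would then be SplitCsvWithRegExp(s_input_record)(8). Creating the RegExp object once and reusing it across all 1.2 million records would likely be faster than creating it per call as this sketch does.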

Have fun...
