This page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors on Stack Overflow.
Original question: http://stackoverflow.com/questions/427488/
Want VBA in excel to read very large CSV and create output file of a small subset of the CSV
Question
I have a csv file of 1.2 million records of text. The alphanumeric fields are wrapped in quotation marks, the date/time or numeric fields are not.
For example "Fred","Smith",01/07/1967,2,"7, The High Street","Anytown","Anycounty","LS1 7AA"
What I want to do is write some VBA in Excel (more or less the only tool available to me that I am reasonably proficient in) that reads the CSV record by record, performs a check (as it happens, on the last field, the post code) and then outputs a small subset of the 1.2m records to a new output file.
I understand how to open the two files, read the record, do what I need to do with the data and write it out (I will just output the input record with a prefix denoting an exception type).
What I don't know is how to parse the CSV in VBA properly. I can't do a simple text scan and search for commas, as the text sometimes has commas in it (which is why the text fields are wrapped in quotation marks).
Is there a fantastic command that would let me quickly get the data from the nth field in my record?
What I want is s_work = field(s_input_record,5) where 5 is the field number in my CSV....
Many thanks, C
Answer by e.James
The following code should do the trick. I don't have Excel in front of me, so I haven't tested it, but the concept is sound.
If this ends up being too slow, we can look at ways to improve the efficiency.
' inputFileName and outputFileName are full paths supplied by the caller
Sub SelectSomeRecords(inputFileName As String, outputFileName As String)
    Dim testLine As String

    Open inputFileName For Input As #1
    Open outputFileName For Output As #2

    While Not EOF(1)
        Line Input #1, testLine
        If RecordIsInteresting(testLine) Then
            Print #2, testLine
        End If
    Wend

    Close #1
    Close #2
End Sub

Function RecordIsInteresting(recordLine As String) As Boolean
    Dim lineItems(1 To 8) As String

    ' a Sub is called without parentheses around its argument list
    GetRecordItems lineItems, recordLine

    ' do your custom checking here:
    RecordIsInteresting = (lineItems(8) = "LS1 7AA")
End Function

Sub GetRecordItems(items() As String, recordLine As String)
    Dim finishString As Boolean
    Dim itemString As String
    Dim itemIndex As Integer
    Dim charIndex As Long
    Dim inQuote As Boolean
    Dim testChar As String

    inQuote = False
    charIndex = 1
    itemIndex = LBound(items)
    itemString = ""

    While charIndex <= Len(recordLine)
        testChar = Mid$(recordLine, charIndex, 1)
        finishString = False

        If inQuote Then
            If testChar = Chr$(34) Then
                inQuote = False
                finishString = True
                charIndex = charIndex + 1 ' skip the comma after the closing quote
            Else
                itemString = itemString & testChar
            End If
        Else
            If testChar = Chr$(34) Then
                inQuote = True
            ElseIf testChar = "," Then
                finishString = True
            Else
                itemString = itemString & testChar
            End If
        End If

        ' the bounds check avoids an error on records with extra fields
        If finishString And itemIndex <= UBound(items) Then
            items(itemIndex) = itemString
            itemString = ""
            itemIndex = itemIndex + 1
        End If

        charIndex = charIndex + 1
    Wend

    ' an unquoted final field has no trailing comma, so store it here
    If itemString <> "" And itemIndex <= UBound(items) Then
        items(itemIndex) = itemString
    End If
End Sub
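The question asked for a call of the form field(s_input_record, 5). The same character-by-character idea can be wrapped up that way. What follows is only a minimal sketch: the name CsvField is hypothetical, and it assumes the simple format of the sample record, where quoted text never contains a double quote itself.

```vba
' Hypothetical helper: returns the n-th (1-based) field of a CSV record,
' or "" if the record has fewer than n fields.
' Assumes quoted fields never contain embedded double quotes.
Function CsvField(recordLine As String, fieldNumber As Long) As String
    Dim charIndex As Long
    Dim currentField As Long
    Dim inQuote As Boolean
    Dim testChar As String
    Dim result As String

    currentField = 1
    For charIndex = 1 To Len(recordLine)
        testChar = Mid$(recordLine, charIndex, 1)
        If testChar = Chr$(34) Then
            inQuote = Not inQuote              ' toggle on every quote character
        ElseIf testChar = "," And Not inQuote Then
            ' a comma outside quotes ends the current field
            If currentField = fieldNumber Then Exit For
            currentField = currentField + 1
        ElseIf currentField = fieldNumber Then
            result = result & testChar         ' keep only the requested field
        End If
    Next charIndex

    If currentField = fieldNumber Then CsvField = result
End Function
```

With this in place, s_work = CsvField(s_input_record, 5) would pick out the street address, including the comma inside it. Because the loop stops as soon as the requested field ends, early fields are cheaper to fetch than late ones.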
Answer by Fionnuala
How about VBScript, though this would also work in Excel:
Set cn = CreateObject("ADODB.Connection")

' Note HDR=Yes, that is, the first row contains field names,
' and FMT=Delimited, i.e. CSV
strCon = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\Docs\;" _
    & "Extended Properties=""text;HDR=Yes;FMT=Delimited"";"

cn.Open strCon

' You would not need delimiters ('') if the last field is numeric:
strSQL = "SELECT FieldName1, FieldName2 INTO New.csv FROM Old.csv " _
    & " WHERE LastFieldName='SomeTextValue'"

' Creates the new csv file
cn.Execute strSQL
Answer by Hank Gay
This doesn't directly answer your question, but grep (or one of the Windows equivalents) would really shine for this, e.g.,
grep -e <regex_filter> foo.csv > bar.csv
Answer by dkretz
I used the following derivative of the code given above to successfully open an arbitrary csv file from VBA in Excel.
Option Explicit

' Requires a reference to the Microsoft ActiveX Data Objects library
Public cn As ADODB.Connection

Public Sub DoIt()
    Dim strcon As String
    Dim strsql As String
    Dim rs As ADODB.Recordset

    Set cn = CreateObject("ADODB.Connection")
    strcon = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\bin\HomePlanet\;" _
        & "Extended Properties=""text;HDR=Yes;FMT=Delimited"";"
    cn.Open strcon

    strsql = "SELECT * FROM astuname.csv "
    Set rs = New ADODB.Recordset
    rs.Open strsql, cn
    DoEvents ' pause here to inspect objects and properties
    rs.Close
End Sub
The rs (recordset) has a collection of fields, with a Count property. Each field has a Type property.
You can reference the fields by sequence number ...
Debug.Print rs.Fields(rs.Fields.Count - 1).Type
Is this sufficient?
If not, post the first several rows of the input file and I'll take it the rest of the way.
Answer by barrowc
Look at the Input # statement in the Excel help.
Sample usage would be:
Input #fnInput, s_Forename, s_Surname, dt_DOB, i_Something, s_Street, s_Town, s_County, s_Postcode
and then use the Write # statement to write matching records out again.
The only issue might be that the date format in the output will end up as #1967-07-01#, but this format is unambiguous, unlike 01/07/1967, which would represent 1st July in the UK and 7th January in the US. If you need to preserve the formatting of the date, then write it out as a string:
s_DOB = Format(dt_DOB, "dd/mm/yyyy")
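Putting the Input # and Write # statements together, the whole filter might look like the sketch below. The procedure name, the path parameters and the postcode test are placeholders, not part of the original answer. As a deliberate variation, the date is read into a String rather than a Date, so its text passes through exactly as it appeared in the input; the side effect is that Write # wraps it in quotation marks on output.

```vba
' A sketch only: the paths are supplied by the caller and the
' postcode comparison stands in for whatever check is really needed.
Sub FilterWithInputAndWrite(inputPath As String, outputPath As String)
    Dim fnInput As Integer, fnOutput As Integer
    Dim s_Forename As String, s_Surname As String, s_DOB As String
    Dim i_Something As Long
    Dim s_Street As String, s_Town As String
    Dim s_County As String, s_Postcode As String

    fnInput = FreeFile
    Open inputPath For Input As #fnInput
    fnOutput = FreeFile                 ' called after the first Open, so it differs
    Open outputPath For Output As #fnOutput

    Do While Not EOF(fnInput)
        ' Input # treats a quoted string as one value, commas included
        Input #fnInput, s_Forename, s_Surname, s_DOB, i_Something, _
                        s_Street, s_Town, s_County, s_Postcode
        If s_Postcode = "LS1 7AA" Then
            ' Write # quotes strings and leaves numbers bare
            Write #fnOutput, s_Forename, s_Surname, s_DOB, i_Something, _
                             s_Street, s_Town, s_County, s_Postcode
        End If
    Loop

    Close #fnInput
    Close #fnOutput
End Sub
```

Note that Input # expects every record to have the same eight fields in the same order; a short or malformed line would shift values into the wrong variables.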
Answer by dkretz
Anything you can do a row at a time with VBA in Excel, you can do in Access with VBA, plus a lot more because it's a database rather than a spreadsheet. Is Access unavailable to you?
It's a lot easier to deal with logical tables, records, and fields than logical worksheets, rows, and columns.
For input, why does the "/Data/Import External Data/Text/csv" not work? Is the input not truly portable csv?
Answer by Mike Woodhouse
I'd suggest taking a look at the Regular Expression library (you should see it in "Tools...References" as "Microsoft VBScript Regular Expressions 5.5" or something very similar).
There are samples of both the RegExp approach and a fairly comprehensive character-by-character parser at this location: http://www.xbeat.net/vbspeed/c_ParseCSV.php. Note that the RegExp version is waaaay shorter!
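For a rough idea of what the regular expression route can look like, here is a small sketch. It is not the code from the linked page; the function name SplitCsvRegex and the pattern are my own assumptions. It uses late binding through CreateObject, so it should run even without ticking the reference.

```vba
' Hypothetical helper: splits one CSV record into a zero-based array of fields.
Function SplitCsvRegex(recordLine As String) As Variant
    Dim re As Object, matches As Object, m As Object
    Dim fields() As String
    Dim fieldText As String
    Dim n As Long

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    ' group 1: a quoted field (doubled quotes allowed inside) or an unquoted one
    ' group 2: what ends the field (a comma, or nothing at the end of the line)
    re.Pattern = "(""(?:[^""]|"""")*""|[^,]*)(,|$)"

    Set matches = re.Execute(recordLine)
    ReDim fields(0 To matches.Count)            ' trimmed to size below
    For Each m In matches
        fieldText = m.SubMatches(0)
        If Len(fieldText) >= 2 And Left$(fieldText, 1) = Chr$(34) _
                And Right$(fieldText, 1) = Chr$(34) Then
            ' strip the surrounding quotes and collapse doubled quotes
            fieldText = Mid$(fieldText, 2, Len(fieldText) - 2)
            fieldText = Replace(fieldText, Chr$(34) & Chr$(34), Chr$(34))
        End If
        fields(n) = fieldText
        n = n + 1
        If m.SubMatches(1) = "" Then Exit For   ' no comma followed: last field
    Next m

    ReDim Preserve fields(0 To n - 1)
    SplitCsvRegex = fields
End Function
```

The postcode in the sample record would then be SplitCsvRegex(s_input_record)(7). For 1.2 million records it would probably pay to create the RegExp object once and reuse it, rather than once per line as this sketch does.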
Have fun...

