Original URL: http://stackoverflow.com/questions/6780765/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Parse CSV, ignoring commas inside string literals in VBA?
Question by Tom
I have a VBA application that runs every day. It checks a folder where CSVs are downloaded automatically, and adds their contents to a database. When parsing them, I realized that certain values had commas as a part of their name. These values were contained in string literals.
So I'm trying to figure out how to parse this CSV and ignore commas that are contained in string literals. For example...
1,2,3,"This should,be one part",5,6,7
should return:
1
2
3
"This should,be one part"
5
6
7
I have been using VBA's Split() function because I don't want to reinvent the wheel, but if I have to, I'll do something else.
Any suggestions would be appreciated.
Accepted answer by MRAB
A simple regex for parsing a CSV line, assuming no quotes inside quoted fields, is:
"[^"]*"|[^,]*
Each match will return a field.
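A minimal sketch of applying MRAB's pattern in VBA, assuming late binding of VBScript.RegExp. It appends a terminator group (,|$) to the pattern (an assumption added here, not part of MRAB's answer) so that every match ends at a comma or at the end of the line. That keeps empty fields such as the middle one in a,,b, which a simple Match.Length > 0 filter would drop. ParseCsvLineRegex is a hypothetical name, and quoted fields keep their surrounding quotes.

```vba
' Sketch: MRAB's field pattern plus a (,|$) terminator group (hypothetical helper).
' Late-bound, so no reference to the VBScript regex library is needed.
Function ParseCsvLineRegex(ByVal lineText As String) As String()
    Dim re As Object, m As Object
    Dim fields() As String
    Dim n As Long

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    ' SubMatches(0) = the field: "quoted text" or a run of non-commas
    ' SubMatches(1) = what ended it: "," or "" (end of line)
    re.Pattern = "(""[^""]*""|[^,]*)(,|$)"

    For Each m In re.Execute(lineText)
        ReDim Preserve fields(n)
        fields(n) = m.SubMatches(0)
        n = n + 1
        ' A field ended by end-of-line is the last one; stopping here also
        ' skips the extra empty match the engine reports at the very end.
        If m.SubMatches(1) = "" Then Exit For
    Next m

    ParseCsvLineRegex = fields
End Function
```

For the line in the question this should return seven elements, the fourth being "This should,be one part" including its quotes.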
Answer by kb_sou
The first way to solve this problem is to look at the structure of the line from the CSV file (int, int, "string literal, will have at most one comma", etc.). A naive solution, assuming that the line doesn't contain any semicolons, would be:
Function splitLine1(line As String) As String()
    Dim temp() As String
    'Splits the line in three. The string delimited by " will be at temp(1)
    temp = Split(line, Chr(34)) 'Chr(34) = "
    'Only exactly one quoted field is supported; otherwise temp(2) may not exist.
    'With no quotes at all, a plain Split is already correct.
    If UBound(temp) <> 2 Then
        splitLine1 = Split(line, ",")
        Exit Function
    End If
    'Replaces the commas in the numeric fields by semicolons
    temp(0) = Replace(temp(0), ",", ";")
    temp(2) = Replace(temp(2), ",", ";")
    'Joins the temp array with quotes and then splits the result using the semicolons
    splitLine1 = Split(Join(temp, Chr(34)), ";")
End Function
This function only solves this particular problem. Another way to do the job is using the regular expression object from VBScript.
Function splitLine2(line As String) As String()
    Dim regex As Object
    Set regex = CreateObject("vbscript.regexp")
    regex.IgnoreCase = True
    regex.Global = True
    'This pattern matches only commas outside quotes, i.e. commas followed
    'by an even number of quotes up to the end of the line
    'Pattern = ",(?=([^"]*"[^"]*")*(?![^"]*"))"
    regex.Pattern = ",(?=([^" & Chr(34) & "]*" & Chr(34) & "[^" & Chr(34) & "]*" & Chr(34) & ")*(?![^" & Chr(34) & "]*" & Chr(34) & "))"
    'regex.Replace replaces the commas outside quotes with semicolons, and then
    'the Split function splits the result on the semicolons
    splitLine2 = Split(regex.Replace(line, ";"), ";")
End Function
This method seems much more cryptic, but it does not depend on the structure of the line.
You can read more about regular expression patterns in VBScript here.
Answer by transistor1
@Gimp said...
The current answers do not contain enough detail.
I'm running into the same problem. Looking for more detail in this answer.
To elaborate on @MRAB's answer:
Function ParseCSV(FileName)
    Dim Regex 'As VBScript_RegExp_55.RegExp
    Dim Match 'As VBScript_RegExp_55.Match
    Dim FS 'As Scripting.FileSystemObject
    Dim Txt 'As Scripting.TextStream
    Dim CSVLine
    Dim ToInsert()
    Dim n As Long
    Set FS = CreateObject("Scripting.FileSystemObject")
    'Open for reading (1), don't create the file (False), system default format (-2)
    Set Txt = FS.OpenTextFile(FileName, 1, False, -2)
    Set Regex = CreateObject("VBScript.RegExp")
    Regex.Pattern = """[^""]*""|[^,]*" '<- MRAB's answer
    Regex.Global = True
    Do While Not Txt.AtEndOfStream
        Erase ToInsert
        n = 0
        CSVLine = Txt.ReadLine
        For Each Match In Regex.Execute(CSVLine)
            'Skip the empty matches the engine returns between fields
            '(note: this also skips genuinely empty fields)
            If Match.Length > 0 Then
                ReDim Preserve ToInsert(n)
                ToInsert(n) = Match.Value
                n = n + 1
            End If
        Next
        'The array now holds exactly n fields, with no trailing empty element
        If n > 0 Then InsertArrayIntoDatabase ToInsert
    Loop
    Txt.Close
End Function
You need to customize the InsertArrayIntoDatabase Sub for your own table. Mine has several text fields named f00, f01, etc...
Sub InsertArrayIntoDatabase(a())
    Dim rs As DAO.Recordset
    Dim i As Long, n As String
    Set rs = CurrentDb().TableDefs("tbl").OpenRecordset()
    rs.AddNew
    For i = LBound(a) To UBound(a)
        n = "f" & Format(i, "00") 'fields in table are f00, f01, f02, etc.
        rs.Fields(n) = a(i)
    Next
    rs.Update
    rs.Close
End Sub
Note that instead of using CurrentDb() in InsertArrayIntoDatabase(), you should really use a global variable that gets set to the value of CurrentDb() before ParseCSV() runs, because calling CurrentDb() in a loop is very slow, especially on a very large file.
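A minimal sketch of that advice, assuming Access with a reference to the DAO library, placed at the top of a standard module. mDb and Db are hypothetical names: CurrentDb() is evaluated once, and every later call returns the same Database object.

```vba
' Module-level cache (hypothetical names); declarations must precede procedures.
Private mDb As DAO.Database

' Returns the cached Database, calling CurrentDb() only on first use.
Public Function Db() As DAO.Database
    If mDb Is Nothing Then Set mDb = CurrentDb()
    Set Db = mDb
End Function
```

Inside InsertArrayIntoDatabase, CurrentDb() would then be replaced with Db().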
Answer by Fionnuala
If you are working with MS Access tables, there are advantages in simply importing text from disk. For example:
''If you have a reference to the Windows Script Host Object Model
Dim fs As New FileSystemObject
Dim ts As TextStream
Dim sData As String, sSQL As String
''For late binding
''Dim fs As Object
''Dim ts As Object
''Set fs=CreateObject("Scripting.FileSystemObject")
Set ts = fs.CreateTextFile("z:\docs\import.csv", True)
sData = "1,2,3,""This should,be one part"",5,6,7"
ts.Write sData
ts.Close
''Just for testing, your table will already exist
''sSQL = "Create table Imports (f1 int, f2 int, f3 int, f4 text, " _
'' & "f5 int, f6 int, f7 int)"
''CurrentDb.Execute sSQL
''The fields will be called F1,F2 ... Fn in the text file
sSQL = "INSERT INTO Imports SELECT * FROM " _
    & "[text;fmt=delimited;hdr=no;database=z:\docs\].[import.csv]"
CurrentDb.Execute sSQL
Answer by lilguy
I know this is an old post, but I thought this might help others. This was plagiarized/revised from http://n3wt0n.com/blog/comma-separated-values-and-quoted-commas-in-vbscript/, but it works really well and is written as a function that you can pass your input line to.
Function SplitCSVLineToArray(Line, RemoveQuotes) 'Pass it a line and whether or not to remove the quotes
    Dim ReplacementString, LineLength, InQuotes, NewLine, x, CurrentCharacter, LineArray
    ReplacementString = "#!#!#" 'Random string that we should never see in our file
    LineLength = Len(Line)
    InQuotes = False
    NewLine = ""
    For x = 1 To LineLength
        CurrentCharacter = Mid(Line, x, 1)
        If CurrentCharacter = Chr(34) Then
            'Every double quote toggles the in-quotes state
            InQuotes = Not InQuotes
        End If
        If InQuotes Then
            'Hide commas inside quotes behind the placeholder
            CurrentCharacter = Replace(CurrentCharacter, ",", ReplacementString)
        End If
        NewLine = NewLine & CurrentCharacter
    Next
    LineArray = Split(NewLine, ",")
    For x = 0 To UBound(LineArray)
        'Restore the hidden commas
        LineArray(x) = Replace(LineArray(x), ReplacementString, ",")
        If RemoveQuotes = True Then
            LineArray(x) = Replace(LineArray(x), Chr(34), "")
        End If
    Next
    SplitCSVLineToArray = LineArray
End Function
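The toggle above flips state on every double quote, so an escaped quote inside a field (written as two double quotes, the usual CSV convention described in RFC 4180) is never turned back into a single quote. A sketch of the same character-by-character idea that builds the fields directly instead of going through a placeholder string; it assumes a comma delimiter and that escaping convention, always drops the surrounding quotes, and ScanCsvLine is a hypothetical name.

```vba
' Sketch: single-pass CSV field scanner (hypothetical helper).
' Assumes "," as delimiter and "" inside a quoted field meaning one literal quote.
Function ScanCsvLine(ByVal lineText As String) As String()
    Dim fields() As String
    Dim n As Long, i As Long
    Dim ch As String, cur As String
    Dim inQuotes As Boolean

    i = 1
    Do While i <= Len(lineText)
        ch = Mid$(lineText, i, 1)
        If inQuotes Then
            If ch = """" Then
                If Mid$(lineText, i + 1, 1) = """" Then
                    cur = cur & """"       ' escaped quote: keep one, skip its twin
                    i = i + 1
                Else
                    inQuotes = False       ' closing quote
                End If
            Else
                cur = cur & ch             ' commas are ordinary text inside quotes
            End If
        ElseIf ch = """" Then
            inQuotes = True                ' opening quote
        ElseIf ch = "," Then
            ReDim Preserve fields(n)       ' delimiter outside quotes ends a field
            fields(n) = cur
            n = n + 1
            cur = ""
        Else
            cur = cur & ch
        End If
        i = i + 1
    Loop

    ReDim Preserve fields(n)               ' the last field has no trailing comma
    fields(n) = cur
    ScanCsvLine = fields
End Function
```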
Answer by Maryan Hutsul
I made another variant of the solution for parsing CSV files with "quoted" text strings that may contain delimiters, such as commas inside the double quotes. This method doesn't require regular expressions or any other add-ons. This code also handles multiple commas between the quotes. Here is a subroutine for testing:
Sub SubstituteBetweenQuotesSub()
    'In-string character replacement function by Maryan Hutsul 1/29/2019
    Dim quote As Long, quoteTwo As Long
    Dim oddEven As Long
    Dim i As Long, counter As Long
    Dim byteArray() As Byte
    Dim LineItems As String
    'LineItems are lines of text read from CSV file, or any other text string
    LineItems = ",,,2019NoApocalypse.ditamap,[email protected],Approver,""JC, ,Son"",Reviewer,[email protected],""God, All-Mighty,"",2019-01-29T08:47:29.290-05:00"
    quote = 1
    oddEven = 0
    Do Until quote = 0
        quote = InStr(quote, LineItems, Chr(34))
        quoteTwo = InStr(quote + 1, LineItems, Chr(34))
        oddEven = oddEven + 1
        If oddEven Mod 2 = 1 And quote <> 0 Then
            'Opening quote: count the commas up to the closing quote
            counter = 0
            byteArray = StrConv(LineItems, vbFromUnicode)
            For i = quote To quoteTwo
                If byteArray(i - 1) = 44 Then '44 represents comma, can also do Chr(44)
                    counter = counter + 1
                End If
            Next i
            'Replace exactly that many commas with semicolons, starting at the opening quote
            LineItems = Left(LineItems, quote - 1) & Replace(LineItems, ",", ";", quote, counter)
            quote = quote + 1
        ElseIf quote <> 0 Then
            quote = quote + 1
        End If
    Loop
    Debug.Print LineItems
End Sub
Here is the function to which you can pass lines from .csv, .txt or any other text files:
Function SubstituteBetweenQuotes(ByVal LineItems As String) As String
    'In-string character replacement function by Maryan Hutsul 1/29/2019
    'LineItems are lines of text read from CSV file, or any other text string
    'Commas between double quotes are replaced with semicolons
    Dim quote As Long, quoteTwo As Long
    Dim oddEven As Long
    Dim i As Long, counter As Long
    Dim byteArray() As Byte
    quote = 1
    oddEven = 0
    Do Until quote = 0
        quote = InStr(quote, LineItems, Chr(34))
        quoteTwo = InStr(quote + 1, LineItems, Chr(34))
        oddEven = oddEven + 1
        If oddEven Mod 2 = 1 And quote <> 0 Then
            'Opening quote: count the commas up to the closing quote
            counter = 0
            byteArray = StrConv(LineItems, vbFromUnicode)
            For i = quote To quoteTwo
                If byteArray(i - 1) = 44 Then '44 represents "," comma, can also do Chr(44)
                    counter = counter + 1
                End If
            Next i
            'Replace exactly that many commas with semicolons, starting at the opening quote
            LineItems = Left(LineItems, quote - 1) & Replace(LineItems, ",", ";", quote, counter)
            quote = quote + 1
        ElseIf quote <> 0 Then
            quote = quote + 1
        End If
    Loop
    SubstituteBetweenQuotes = LineItems
End Function
And below is the code for reading a CSV file using the function:
Dim fullFilePath As String
Dim fileNum As Integer
Dim LineFromFile As String
Dim LineItems() As String
Dim row_number As Long
Dim i As Long
'fullFilePath - full path to your input CSV file
fileNum = FreeFile
Open fullFilePath For Input As #fileNum
row_number = 0
'EOF - End Of File
Do Until EOF(fileNum)
    Line Input #fileNum, LineFromFile
    LineItems = Split(SubstituteBetweenQuotes(LineFromFile), ",")
    For i = LBound(LineItems) To UBound(LineItems)
        'Turn the protected semicolons back into commas (assumes the data has no real semicolons)
        ActiveCell.Offset(row_number, i).Value = Replace(LineItems(i), ";", ",")
    Next i
    row_number = row_number + 1
Loop
Close #fileNum
All delimiters and the replacement character may be modified for your needs. I hope this is useful, as it took me quite a journey to solve some problems with CSV imports.
Answer by Brenton
We had a similar CSV parsing challenge in Excel recently, and implemented a solution adapted from JavaScript code to parse CSV data:
' Requires a reference to "Microsoft VBScript Regular Expressions 5.5" (Tools > References)
Function SplitCSV(csvText As String, delimiter As String) As String()
    ' Create a regular expression to parse the CSV values
    Dim RegEx As New RegExp
    ' Create pattern which will match each column in the CSV, with submatches for each of the groups in the regex
    ' Match Groups: Delimiter, Quoted fields, Standard fields
    ' (delimiter is inserted verbatim, so regex metacharacters such as | must be escaped by the caller)
    RegEx.Pattern = "(" & delimiter & "|^)(?:\""([^\""]*(?:\""\""[^\""]*)*)\""|([^\""\""" & delimiter & """]*))"
    RegEx.Global = True
    RegEx.IgnoreCase = True
    ' Collect all pattern matches (i.e. columns)
    Dim Matches As MatchCollection
    Set Matches = RegEx.Execute(csvText)
    ' Create an array to hold output data
    Dim Output() As String
    ' Create int to track array location when iterating
    Dim i As Integer
    i = 0
    ' Manually add blank if first column is blank, since VBA regex misses this
    ' (compare against the delimiter parameter, not a hard-coded comma)
    If Left$(csvText, Len(delimiter)) = delimiter Then
        ReDim Preserve Output(i)
        Output(i) = ""
        i = i + 1
    End If
    ' Iterate over all pattern matches and get values into output array
    Dim Match As Match
    Dim MatchedValue As String
    For Each Match In Matches
        ' Check to see which kind of value we captured (quoted or unquoted)
        If Len(Match.SubMatches(1)) > 0 Then
            ' We found a quoted value. When we capture this value, unescape any double quotes
            MatchedValue = Replace(Match.SubMatches(1), """""", """")
        Else
            ' We found a non-quoted value
            MatchedValue = Match.SubMatches(2)
        End If
        ' Now that we have our value string, let's add it to the data array
        ReDim Preserve Output(i)
        Output(i) = MatchedValue
        i = i + 1
    Next Match
    ' Return the parsed data
    SplitCSV = Output
End Function
Answer by Tom Scott
I realize this is an old post, but I just bumped into it looking for a solution to the same problem the OP had, so the thread is still relevant.
To import data from a CSV, I add a query to a worksheet
wksTarget.QueryTables.Add(Connection:=strConn, Destination:=wksTarget.Range("A1"))
then set the appropriate QueryTable parameters (e.g. Name, FieldNames, RefreshOnFileOpen, etc.).
QueryTables can handle various delimiters via TextFileCommaDelimiter, TextFileSemicolonDelimiter and others. And there are a number of other parameters (TextFilePlatform, TextFileTrailingMinusNumbers, TextFileColumnDataTypes, TextFileDecimalSeparator, TextFileStartRow, TextFileThousandsSeparator) that handle source file idiosyncrasies.
Relevant to the OP, QueryTables also has a parameter designed to handle commas that are within double quotes: TextFileTextQualifier = xlTextQualifierDoubleQuote.
I find QueryTables much simpler than writing code to import the file, split/parse strings or use regular expressions.
All together, a sample code snippet would look something like this:
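Sub ImportCsvWithQueryTable()
    Dim wksTarget As Worksheet
    Dim strConn As String
    Dim varDataTypes As Variant
    Set wksTarget = ActiveSheet 'assumption: import into the active sheet
    strConn = "TEXT;" & "C:\Desktop\SourceFile.CSV"
    'One entry per column: 5 = xlYMDFormat (date), 1 = xlGeneralFormat
    varDataTypes = Array(5, 1, 1, 1, 1, 1, 5, 5)
    With wksTarget.QueryTables.Add(Connection:=strConn, _
            Destination:=wksTarget.Range("A1"))
        .Name = "ImportCSV"
        .FieldNames = True
        .RefreshOnFileOpen = False
        .SaveData = True
        .TextFilePlatform = xlMSDOS
        .TextFileStartRow = 1
        .TextFileParseType = xlDelimited
        .TextFileCommaDelimiter = True
        'Commas inside double quotes stay part of the field
        .TextFileTextQualifier = xlTextQualifierDoubleQuote
        .TextFileColumnDataTypes = varDataTypes
        .Refresh BackgroundQuery:=False
    End With
End Sub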
I prefer to delete the QueryTable once the data is imported (wksTarget.QueryTables("ImportCSV").Delete), but I suppose it could be created just once and then simply refreshed if the source and destination of the data don't change.
Answer by Eddy
Taking your comments into account, you could take the easy way out here:
- split on " --> gives you 3 or more entries (could be more due to double quotes inside the string literal)
- split the first part on ,
- keep parts 2 to n-1 together (that is your string literal)
- split the last part on ,
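A sketch of those four steps in VBA, assuming at most one quoted literal per line and that the literal is a whole field (a comma or the line boundary on each side). SplitAroundLiteral and AppendField are hypothetical names. The literal keeps its quotes, and rejoining the middle parts with a quote character restores any doubled quotes inside it.

```vba
' Sketch of the steps above (hypothetical helpers).
' Assumes at most one quoted literal per line, forming a whole field.
Function SplitAroundLiteral(ByVal lineText As String) As String()
    Const Q As String = """"
    Dim parts() As String, head() As String, tail() As String
    Dim fields() As String
    Dim i As Long, n As Long, last As Long
    Dim literal As String

    parts = Split(lineText, Q)             ' step 1: split on the quote character
    last = UBound(parts)
    If last < 2 Then                       ' no complete literal: plain Split is enough
        SplitAroundLiteral = Split(lineText, ",")
        Exit Function
    End If

    head = Split(parts(0), ",")            ' step 2: fields before the literal
    For i = 0 To UBound(head) - 1          ' skip the "" left before the opening quote
        AppendField fields, n, head(i)
    Next i

    For i = 1 To last - 1                  ' step 3: rejoin parts 2..n-1 as the literal
        If i > 1 Then literal = literal & Q
        literal = literal & parts(i)
    Next i
    AppendField fields, n, Q & literal & Q

    tail = Split(parts(last), ",")         ' step 4: fields after the literal
    For i = 1 To UBound(tail)              ' skip the "" left after the closing quote
        AppendField fields, n, tail(i)
    Next i

    SplitAroundLiteral = fields
End Function

' Grows arr by one element and stores value; n counts the fields so far.
Private Sub AppendField(ByRef arr() As String, ByRef n As Long, ByVal value As String)
    ReDim Preserve arr(n)
    arr(n) = value
    n = n + 1
End Sub
```

Lines with fewer than two quote characters fall back to a plain Split, since there is no literal to protect.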
Answer by ntselama
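Try this! Make sure to have "Microsoft VBScript Regular Expressions 5.5" ticked under Tools > References. (Because the code below creates the object with CreateObject, the reference is only strictly required if you switch to early binding.)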
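Function Splitter(line As String, n As Integer) As String
    'Returns the n-th (1-based) field of a CSV line
    Dim s() As String
    Dim regex As Object
    Set regex = CreateObject("vbscript.regexp")
    regex.IgnoreCase = True
    regex.Global = True
    'Matches commas followed by an even number of quotes, i.e. commas outside quotes
    regex.Pattern = ",(?=([^\""]*\""[^\""]*\"")*[^\""]*$)"
    'Replace those commas with an unlikely marker, then split on the marker
    s = Split(regex.Replace(line, "|/||\|"), "|/||\|")
    Splitter = s(n - 1)
End Function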