Original source: http://stackoverflow.com/questions/29907829/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
C# CSV parsing escaping double quotes
Asked by nastassiar
I am trying to parse a number of CSV files that have double quotes and commas within the fields. I have no control over the format of the CSVs, and instead of using "" to escape the quotes they use \". The files are also extremely large, so reading them in and using a regex isn't the best option for me.
I would prefer to use an existing library and not write an entirely new parser. Currently I am using CsvHelper.
This is an example of the CSV data:
"id","name","notes"
"40","Continue","If the message \"Continue\" does not appear restart, and notify your instructor."
"41","Restart","If the message \"Restart\" does not appear after 10 seconds, restart manually."
The problem is that the double quotes aren't being unescaped properly, so the comma inside the notes field is read as a delimiter and the notes field is split into two separate fields.
This is my current code, which doesn't work.
DataTable csvData = new DataTable();
string csvFilePath = @"C:\Users\" + csvFileName + ".csv";
try
{
    FileInfo file = new FileInfo(csvFilePath);
    using (TextReader reader = file.OpenText())
    using (CsvReader csv = new CsvReader(reader))
    {
        csv.Configuration.Delimiter = ",";
        csv.Configuration.HasHeaderRecord = true;
        csv.Configuration.IgnoreQuotes = false;
        csv.Configuration.TrimFields = true;
        csv.Configuration.WillThrowOnMissingField = false;
        string[] colFields = null;
        while (csv.Read())
        {
            // Build the table columns from the header on the first record.
            if (colFields == null)
            {
                colFields = csv.FieldHeaders;
                foreach (string column in colFields)
                {
                    DataColumn datacolumn = new DataColumn(column);
                    datacolumn.AllowDBNull = true;
                    csvData.Columns.Add(datacolumn);
                }
            }
            // Store empty fields as null.
            string[] fieldData = csv.CurrentRecord;
            for (int i = 0; i < fieldData.Length; i++)
            {
                if (fieldData[i] == "")
                {
                    fieldData[i] = null;
                }
            }
            csvData.Rows.Add(fieldData);
        }
    }
}
catch (Exception ex)
{
    // The handler is not shown in the original post; this one is a placeholder.
    Console.Error.WriteLine(ex.Message);
}
Is there an existing library that lets you specify how to escape quotes or should I just write my own parser?
Accepted answer by Alex
You can get quite far using a very simple LINQ statement to Split and Trim the fields, and finally Replace to unescape the quotes in the content:
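// Requires: using System; using System.Data; using System.IO; using System.Linq;
DataTable csvData = new DataTable();
string csvFilePath = @"C:\Users\" + csvFileName + ".csv";
try
{
    string[] seps = { "\",", ",\"" };
    char[] quotes = { '\"', ' ' };
    string[] colFields = null;
    foreach (var line in File.ReadLines(csvFilePath))
    {
        // Split on quote-comma pairs, trim the outer quotes, then turn \" into ".
        var fields = line
            .Split(seps, StringSplitOptions.None)
            .Select(s => s.Trim(quotes).Replace("\\\"", "\""))
            .ToArray();
        if (colFields == null)
        {
            // The first line is the header: create one column per field.
            colFields = fields;
            foreach (string column in colFields)
            {
                DataColumn datacolumn = new DataColumn(column);
                datacolumn.AllowDBNull = true;
                csvData.Columns.Add(datacolumn);
            }
        }
        else
        {
            // Store empty fields as null.
            for (int i = 0; i < fields.Length; i++)
            {
                if (fields[i] == "")
                {
                    fields[i] = null;
                }
            }
            csvData.Rows.Add(fields);
        }
    }
}
catch (Exception ex)
{
    // The handler is not shown in the original post; this one is a placeholder.
    Console.Error.WriteLine(ex.Message);
}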
When used in a very simple console app, with the OP's original input in the "test.txt" file:
// Requires: using System; using System.IO; using System.Linq;
public static void CsvUnescapeSplit()
{
    string[] seps = { "\",", ",\"" };
    char[] quotes = { '\"', ' ' };
    foreach (var line in File.ReadLines(@"c:\temp\test.txt"))
    {
        // Split on quote-comma pairs, trim the outer quotes, then turn \" into ".
        var fields = line
            .Split(seps, StringSplitOptions.None)
            .Select(s => s.Trim(quotes).Replace("\\\"", "\""))
            .ToArray();
        foreach (var field in fields)
            Console.Write("{0} | ", field);
        Console.WriteLine();
    }
}
This produces the following (correct) output:
id | name | notes |
40 | Continue | If the message "Continue" does not appear restart, and notify your instructor. |
41 | Restart | If the message "Restart" does not appear after 10 seconds, restart manually. |
Caveat: If your field separators have spaces around them, like these:
"40" , "Continue" , "If the message \"Continue\" does not appear restart, and notify your instructor."
Or if your content strings contain a comma directly after an escaped quote, as here (after \"Restart\"):
"41","Help","If the message \"Restart\", does not appear after 10 seconds, manually restart."
It will fail.
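If those two cases matter for your files, a small character-by-character scanner is one way around them. The following is only a minimal sketch, not part of the original answer: the names BackslashCsv, ParseLine and ReadFile are hypothetical, and it assumes that every record sits on a single line, that whitespace outside of quotes can be ignored, and that a backslash is only special when it comes directly before a double quote.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, not from the original post.
public static class BackslashCsv
{
    // Splits one CSV line into fields. Commas inside quoted fields are kept,
    // and \" inside a quoted field is read as a literal double quote.
    public static List<string> ParseLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '\\' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote: keep the quote, drop the backslash
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false;    // unescaped quote closes the field
                }
                else
                {
                    current.Append(c);   // anything else, commas included, is content
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString()); // delimiter outside quotes ends the field
                current.Clear();
            }
            else if (!char.IsWhiteSpace(c))
            {
                current.Append(c);       // unquoted content; whitespace outside quotes is dropped (assumption)
            }
        }
        fields.Add(current.ToString());  // last field on the line
        return fields;
    }

    // Streams a file line by line, so large files are not loaded into memory at once.
    public static IEnumerable<List<string>> ReadFile(string path)
    {
        foreach (string line in File.ReadLines(path))
            yield return ParseLine(line);
    }
}
```

With this sketch, both caveat lines above split into three fields, because a comma only counts as a delimiter while the scanner is outside a quoted field. It still breaks on fields that span several lines or that end in a literal backslash, so treat it as a starting point rather than a full parser.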

