How to properly split a CSV using C# split() function?
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/17207269/
Asked by swdev
Suppose I have this CSV file :
NAME,ADDRESS,DATE
"Eko S. Wibowo", "Tamanan, Banguntapan, Bantul, DIY", "6/27/1979"
I would like to store each token enclosed in double quotes in an array. Is there a safe way to do this instead of using the String Split() function? Currently I load the file into a RichTextBox, then loop over each element of its Lines[] property and do this:
string[] line = s.Split(',');
s is the current line taken from RichTextBox.Lines[]. As you can see, a comma inside a token easily messes up the Split() function, so instead of ending up with three tokens as I want, I end up with six.
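To see exactly where the six tokens come from, here is a minimal reproduction (the class and method names are hypothetical). String.Split(',') has no notion of quoting, so every comma, including the three inside the ADDRESS value, acts as a separator.

```csharp
public static class NaiveSplitDemo
{
    // Reproduces the question: Split(',') ignores quotes entirely,
    // so commas inside a quoted value are treated as separators too.
    // For the question's data line it returns these 6 fragments (brackets mark the edges):
    // ["Eko S. Wibowo"] [ "Tamanan] [ Banguntapan] [ Bantul] [ DIY"] [ "6/27/1979"]
    public static string[] NaiveSplit(string line) => line.Split(',');
}
```

Note that every fragment after the first also keeps the space that followed the comma.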
Any help will be appreciated!
Accepted answer by unlimit
You could use regex too:
string input = "\"Eko S. Wibowo\", \"Tamanan, Banguntapan, Bantul, DIY\", \"6/27/1979\"";
string pattern = @"""\s*,\s*""";
// input.Substring(1, input.Length - 2) removes the first and last " from the string
string[] tokens = System.Text.RegularExpressions.Regex.Split(
input.Substring(1, input.Length - 2), pattern);
This will give you:
Eko S. Wibowo
Tamanan, Banguntapan, Bantul, DIY
6/27/1979
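A variation on the same regex idea, offered as a sketch rather than as part of the accepted answer: instead of splitting on the separator, match each quoted token directly with Regex.Matches. This tolerates both "," and ", " between fields and needs no Substring call, but it still assumes every field is quoted and that no field contains an escaped "" quote. The class name QuotedFieldExtractor is hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedFieldExtractor
{
    // Returns the text between each pair of double quotes, ignoring
    // whatever separators (",", ", ") sit between the quoted groups.
    public static string[] Extract(string line)
    {
        return Regex.Matches(line, "\"([^\"]*)\"")
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value) // group 1 excludes the quotes
                    .ToArray();
    }
}
```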
Answer by joe
I've done this with my own method. It simply counts the number of " and ' characters to decide whether a comma is inside a quoted token.
Adapt it to your needs.
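// Splits a line on commas, ignoring commas that follow an odd number
// of quote characters (" or '), i.e. commas inside a quoted token.
// Note: the quote characters themselves are kept in the returned tokens.
public List<string> SplitCsvLine(string s) {
    int i;
    int a = 0;      // start index of the current token
    int count = 0;  // number of quote characters seen so far
    List<string> str = new List<string>();
    for (i = 0; i < s.Length; i++) {
        switch (s[i]) {
            case ',':
                // An even count means we are outside quotes, so this comma ends a token
                if ((count & 1) == 0) {
                    str.Add(s.Substring(a, i - a));
                    a = i + 1;
                }
                break;
            case '"':
            case '\'':
                count++;
                break;
        }
    }
    str.Add(s.Substring(a)); // the last token has no trailing comma
    return str;
}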
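Counting " and ' together means an apostrophe in an unquoted value (say, O'Brien) flips the parity and swallows all the following commas. Below is a minimal sketch of a character-by-character splitter that treats only double quotes as quoting, reads "" inside a quoted field as a literal quote (the RFC 4180 convention), strips the surrounding quotes, and discards whitespace just before an opening quote so the question's ", " separators work. It is not joe's code, and because it works on one line at a time it does not handle newlines inside quoted fields. The class name CsvLineSplitter is hypothetical.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvLineSplitter
{
    // Splits one CSV line into fields; commas inside "..." stay in the field.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        field.Append('"'); // "" inside quotes is an escaped quote
                        i++;
                    }
                    else
                    {
                        inQuotes = false;  // closing quote
                    }
                }
                else
                {
                    field.Append(c);
                }
            }
            else if (c == '"')
            {
                // Opening quote: drop whitespace before it (e.g. the space in ", ")
                if (string.IsNullOrWhiteSpace(field.ToString()))
                    field.Clear();
                inQuotes = true;
            }
            else if (c == ',')
            {
                fields.Add(field.ToString()); // separator outside quotes ends the field
                field.Clear();
            }
            else
            {
                field.Append(c); // apostrophes are ordinary characters here
            }
        }
        fields.Add(field.ToString()); // the last field has no trailing comma
        return fields;
    }
}
```

For the question's data line this yields the three values without their quotes, and a line such as O'Brien,x,y still splits into three fields.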
Answer by 0lukasz0
It's not an exact answer to your question, but why not use an existing library to handle the CSV file? A good example is LinqToCsv. CSV files can be delimited by various punctuation characters, and there are other gotchas that library authors have already addressed, such as handling the header row, handling different date formats, and mapping rows to C# objects.
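One library that needs no extra download is Microsoft.VisualBasic.FileIO.TextFieldParser, which ships with .NET Framework (in the Microsoft.VisualBasic assembly, which a C# project may need to reference explicitly) and with .NET Core 3.0 and later. It is swapped in here purely as an illustration of the library approach; it is not LinqToCsv and does not map rows to objects. A minimal sketch, assuming the CSV text is already in memory (the wrapper name BuiltInCsvReader is hypothetical):

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class BuiltInCsvReader
{
    // Reads every record from CSV text; each record becomes a string[] of fields.
    public static List<string[]> ReadAll(string csvText)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(new StringReader(csvText)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // commas inside "..." stay in the field
            parser.TrimWhiteSpace = true;            // ignore spaces around each field
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

For a file on disk, TextFieldParser also has a constructor that takes a path, so the StringReader is only needed for in-memory text.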
Answer by TaSwavo
If your CSV line is tightly packed (no spaces between fields), it's easiest to strip the leading and trailing quote as mentioned earlier and then do a simple split on the quote-comma-quote sequence that joins the fields:
string[] tokens = input.Substring(1, input.Length - 2).Split(new[] { "\",\"" }, StringSplitOptions.None);
This only works if ALL fields are double-quoted, even those that don't (officially) need to be. It will be faster than Regex, but only under these conditions.
Really useful if your data looks like "Name","1","12/03/2018","Add1,Add2,Add3","other stuff"
Answer by Abdul Hadi
You can replace "," with ; and then split on ;
var values= s.Replace("\",\"",";").Split(';');
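Replacing "," (quote, comma, quote) only touches the separators between fields, so the first value still starts with a quote and the last still ends with one. The line must also be tightly packed (no space after the comma), and no field may contain a semicolon. A minimal sketch that adds the missing trim, under those same assumptions (the class name ReplaceThenSplit is hypothetical):

```csharp
public static class ReplaceThenSplit
{
    // Assumes every field is quoted, fields are joined by "," with no spaces,
    // and no field contains ';' (which would be mistaken for a separator).
    public static string[] Split(string s)
    {
        // Trim('"') removes the outer quotes that Replace leaves on the first and last values
        return s.Trim('"').Replace("\",\"", ";").Split(';');
    }
}
```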
Answer by Etherman
This question is five years old, but there is always somebody new who wants to split a CSV.
If your data is simple and predictable (i.e. never has any special characters like commas, quotes and newlines) then you can do it with split() or regex.
But to support all the nuances of the CSV format properly without code soup you should really use a library where all the magic has already been figured out. Don't re-invent the wheel (unless you are doing it for fun of course).
CsvHelper is simple enough to use:
https://joshclose.github.io/CsvHelper/2.x/
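// CsvHelper 2.x API (requires: using CsvHelper;). textReader is any TextReader, e.g. a StreamReader over the file.
using (var parser = new CsvParser(textReader))
{
    while (true)
    {
        // In CsvHelper 2.x, Read() returns the next record's fields, or null at the end of the input
        string[] line = parser.Read();
        if (line == null)
        {
            break;
        }
        // do something with line
    }
}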
More discussion / same question: Dealing with commas in a CSV file

