C# delimited string parsing?

Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/10205/

Time: 2020-07-31 17:35:01  Source: igfitidea

Delimited string parsing?

Asked by

I'm looking at parsing a delimited string, something on the order of

a,b,c

But this is a very simple example, and parsing delimited data can get complex; for instance

1,"Your simple algorithm, it fails",True

would blow your naive string.Split implementation to bits. Is there anything I can freely use/steal/copy and paste that offers a relatively bulletproof solution to parsing delimited text? .NET, plox.
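
To make the failure concrete, here is a minimal sketch of the naive approach. The class and method names (NaiveSplitDemo, NaiveSplit) are made up for illustration and are not part of the original question.

```csharp
using System;

public static class NaiveSplitDemo
{
    // Hypothetical helper: splits on every comma, including any comma
    // that sits inside a quoted field.
    public static string[] NaiveSplit(string line)
    {
        return line.Split(',');
    }
}
```

Called on the quoted example line above, this returns four pieces instead of three, because the comma inside the quotes is treated as a separator and the quote characters are left in place.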

Update: I decided to go with the TextFieldParser, which is part of VB.NET's pile of goodies hidden away in Microsoft.VisualBasic.DLL.

Accepted answer by Jedi Master Spooky

I use this to read from a file

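// Requires a reference to Microsoft.VisualBasic (Microsoft.VisualBasic.FileIO namespace).
// Path of the file to parse, here taken from a text box on the form.
string filename = textBox1.Text;
string[] fields;
string[] delimiter = new string[] {"|"};
using (Microsoft.VisualBasic.FileIO.TextFieldParser parser =
       new Microsoft.VisualBasic.FileIO.TextFieldParser(filename)) {
    parser.Delimiters = delimiter;
    // Set this to true if fields may be wrapped in double quotes,
    // as in the question's example.
    parser.HasFieldsEnclosedInQuotes = false;

    while (!parser.EndOfData) {
        // One string per field of the current line.
        fields = parser.ReadFields();
        //Do what you need
    }
}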

I am sure someone here can transform this to parse a string that is in memory.
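
One possible way to do that, as a sketch rather than a definitive version: TextFieldParser also has a constructor that takes a TextReader, so the string can be wrapped in a StringReader. The class and method names (InMemoryParsing, ParseDelimited) are invented for this example, and quote handling is switched on here on the assumption that the input looks like the question's data.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class InMemoryParsing
{
    // Hypothetical helper: parses delimited text held in a string and
    // returns one string[] of fields per line.
    public static List<string[]> ParseDelimited(string text, string delimiter)
    {
        var rows = new List<string[]>();

        // StringReader lets the parser read from memory instead of a file.
        using (var parser = new TextFieldParser(new StringReader(text)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(delimiter);
            // Assumption: fields may be wrapped in double quotes.
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }

        return rows;
    }
}
```

With "," as the delimiter, the question's quoted example line should come back as a single row of three fields, with the comma kept inside the second field.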

Answer by Vaibhav

I am thinking that a generic framework would need to specify two things: 1. What the delimiting characters are. 2. Under what conditions those characters do not count (such as when they are between quotes).

I think you may just be better off writing custom logic each time you need to do something like this.

Answer by Michael Stum

I am not aware of any framework, but a simple state machine works:

  • State 1: Read every char until you hit a " or a ,
    • In case of a ": Move to State 2
    • In case of a ,: Move to State 3
    • In case of the end of file: Move to state 4
  • State 2: Read every char until you hit a "
    • In case of a ": Move to State 1
    • In case of the end of the file: Either Move to State 4 or signal an error because of an unterminated string
  • State 3: Add the current buffer to the output array, move the cursor past the , and go back to State 1.
  • State 4: this is the final state, does nothing except returning the output array.
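
The states above can be sketched roughly as follows. This is only an illustration under some assumptions: the names (StateMachineParser, ParseLine, State) are invented, State 3 is folded into the comma branch rather than kept as a separate state, an unterminated quote is reported as an error, and doubled quotes ("") inside a quoted field are not treated as an escaped quote.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class StateMachineParser
{
    private enum State { Unquoted, Quoted }

    // Splits one line on commas, treating commas inside double quotes as data.
    public static List<string> ParseLine(string line)
    {
        var output = new List<string>();
        var buffer = new StringBuilder();
        var state = State.Unquoted;

        foreach (char c in line)
        {
            if (state == State.Unquoted)
            {
                if (c == '"')
                {
                    state = State.Quoted;       // State 1 -> State 2
                }
                else if (c == ',')
                {
                    // State 3: emit the buffered field, then continue in State 1.
                    output.Add(buffer.ToString());
                    buffer.Clear();
                }
                else
                {
                    buffer.Append(c);
                }
            }
            else
            {
                if (c == '"')
                {
                    state = State.Unquoted;     // State 2 -> State 1
                }
                else
                {
                    buffer.Append(c);
                }
            }
        }

        // State 4: end of input. Still being inside quotes means the string was never closed.
        if (state == State.Quoted)
        {
            throw new FormatException("Unterminated quoted field.");
        }

        output.Add(buffer.ToString());
        return output;
    }
}
```

Note that an empty input line comes back as a single empty field in this sketch, since the last buffer is always emitted.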

Answer by Patrick McElhaney

There are some good answers here: Split a string ignoring quoted sections

You might want to rephrase your question to something more precise (e.g. What code snippet or library can I use to parse CSV data in .NET?).

Answer by Stu

Such as

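// Assumes internalLine is a string holding one line of delimited text,
// and that System.Collections.Generic and System.Text are imported.
var elements = new List<string>();
var current = new StringBuilder();
var p = 0;

while (p < internalLine.Length) {
    if (internalLine[p] == '"') {
        // Quoted field: skip the opening quote.
        p++;

        // Copy everything up to the closing quote. The bounds check keeps
        // an unterminated quote from running past the end of the line.
        while ((p < internalLine.Length) && (internalLine[p] != '"')) {
            current.Append(internalLine[p]);
            p++;
        }

        // Skip past the closing " and the , that follows it
        p += 2;
    }
    else {
        // Unquoted field: copy everything up to the next comma.
        while ((p < internalLine.Length) && (internalLine[p] != ',')) {
            current.Append(internalLine[p]);
            p++;
        }

        // Skip past ,
        p++;
    }

    elements.Add(current.ToString());
    current.Length = 0;
}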

Answer by Keith

The simplest way is just to convert the string into a char array and scan it for your string delimiters (the quote characters) and your split char.

It should be relatively easy to unit test.

You can wrap it in an extension method similar to the basic .Split method.
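
A rough sketch of what such an extension method might look like. The name SplitQuoted and its parameters are invented for this example, and it simply toggles an "inside quotes" flag while walking the char array, so it does not handle escaped quote characters.

```csharp
using System.Collections.Generic;
using System.Text;

public static class StringSplitExtensions
{
    // Hypothetical extension method: splits on `separator`, except where the
    // separator appears between a pair of `qualifier` characters.
    public static IEnumerable<string> SplitQuoted(
        this string input, char separator = ',', char qualifier = '"')
    {
        var field = new StringBuilder();
        bool insideQualifier = false;

        foreach (char c in input.ToCharArray())
        {
            if (c == qualifier)
            {
                // Entering or leaving a quoted section; the qualifier itself is dropped.
                insideQualifier = !insideQualifier;
            }
            else if (c == separator && !insideQualifier)
            {
                yield return field.ToString();
                field.Clear();
            }
            else
            {
                field.Append(c);
            }
        }

        // Emit whatever is left after the last separator.
        yield return field.ToString();
    }
}
```

It can then be called like the built-in method, for example line.SplitQuoted() for commas and double quotes, or line.SplitQuoted('|', '\'') for other characters. Because it is a pure function of its input, it is also straightforward to unit test.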

Answer by Dillie-O

To do a shameless plug, I've been working for a while on a library called fotelo (Formatted Text Loader) that I use to quickly parse large amounts of text based on delimiter, position, or regex. For a quick string it is overkill, but if you're working with logs or large amounts of text, it may be just what you need. It works off a control file model similar to SQL*Loader (kind of the inspiration behind it).

Answer by rohancragg

A very comprehensive library can be found here: FileHelpers

Answer by gjvdkamp

Better late than never (add to the completeness of SO):

http://www.codeproject.com/KB/database/CsvReader.aspx

This one ff-ing rules.
