How do I parse a text file in C#

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/300671/

Time: 2020-08-03 22:16:42  Source: igfitidea

Tags: c#, parsing, text-files

Asked by

How do I parse a text file in C#?

Answer by BCS

If you have more than a trivial language, use a parser generator. It drove me nuts, but I've heard good things about ANTLR. (Note: get the manual and read it before you start. If you have used a different parser generator before, you will probably not approach it correctly right off the bat; at least I didn't.)

Other tools also exist.

Answer by Jim Burger

Without really knowing what sort of text file you're on about, it's hard to answer. However, the FileHelpers library has a broad set of tools to help with fixed-length, multirecord, delimited and similar file formats.

Answer by tsimon

What do you mean by parse? Parsing usually means splitting the input into tokens, which you might do if you're implementing a programming language. If you just want to read the contents of a text file, look at System.IO.FileInfo.

Answer by Alan

The algorithm might look like this:

  1. Open the text file
  2. For every line in the file:
  3. Parse the line
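
The three steps above can be sketched roughly as follows. This is only an illustration: the class name LineParsingSketch and the whitespace-separated line format handled by ParseLine are assumptions, not something the answer specifies.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical names throughout; the per-line format (whitespace-separated) is assumed.
public static class LineParsingSketch
{
    // Step 1: open the text file, then run the loop below over it.
    public static List<string[]> ParseFile(string path)
    {
        using (StreamReader reader = new StreamReader(path))
        {
            return ParseLines(reader);
        }
    }

    // Step 2: visit every line. Taking a TextReader rather than a path
    // lets the same loop run over a file or an in-memory string.
    public static List<string[]> ParseLines(TextReader reader)
    {
        List<string[]> rows = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            rows.Add(ParseLine(line));
        }
        return rows;
    }

    // Step 3: parse one line. Here that just means splitting on spaces and tabs.
    public static string[] ParseLine(string line)
    {
        return line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Only ParseLine needs to change for a different file format; the opening and looping code stays the same.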

There are several approaches to parsing a line.

The easiest from a beginner's standpoint is to use the String methods.

System.String at MSDN
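
As a rough illustration of the String-method approach, the sketch below parses lines of the form "key = value" using only Trim, IndexOf and Substring. The line format, the '#' comment convention, and the names StringMethodSketch and TryParseKeyValue are all assumptions made for the example.

```csharp
using System;

// Hypothetical helper; the "key = value" format and '#' comments are assumed.
public static class StringMethodSketch
{
    // Returns false for blank lines, lines starting with '#',
    // lines without '=', and lines with an empty key.
    public static bool TryParseKeyValue(string line, out string key, out string value)
    {
        key = null;
        value = null;

        string trimmed = line.Trim();
        if (trimmed.Length == 0 || trimmed[0] == '#')
            return false;

        int separator = trimmed.IndexOf('=');
        if (separator < 0)
            return false;

        // Everything before the first '=' is the key, everything after it is the value.
        key = trimmed.Substring(0, separator).Trim();
        value = trimmed.Substring(separator + 1).Trim();
        return key.Length > 0;
    }
}
```

For simple delimited lines, a single call to String.Split is often enough; IndexOf and Substring help when only the first separator should count.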

If you are up for more of a challenge, you can use the System.Text.RegularExpressions library to parse your text.

Regex at MSDN
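
A minimal sketch of the regular-expression approach, assuming each line looks like "2020-08-03 ERROR Disk full" (a date, an upper-case level, then a free-form message). The format, the pattern, and the names RegexSketch and TryParse are illustrative assumptions only.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the "<yyyy-MM-dd> <LEVEL> <message>" format is assumed.
public static class RegexSketch
{
    // Named groups make the captured pieces easy to pull out by name.
    private static readonly Regex LinePattern = new Regex(
        @"^(?<date>\d{4}-\d{2}-\d{2})\s+(?<level>[A-Z]+)\s+(?<message>.*)$");

    // Returns true and fills the out parameters when the line matches;
    // otherwise returns false and leaves them null.
    public static bool TryParse(string line, out string date, out string level, out string message)
    {
        Match m = LinePattern.Match(line);
        date = m.Success ? m.Groups["date"].Value : null;
        level = m.Success ? m.Groups["level"].Value : null;
        message = m.Success ? m.Groups["message"].Value : null;
        return m.Success;
    }
}
```

Compared with the String methods, the pattern validates the line and extracts the fields in one step, at the cost of being harder to read.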

Answer by CMS

Check out this interesting approach, LINQ to Text Files. It's very nice: you only need an IEnumerable<string> method that yields every file.ReadLine(), and then you run the query.

Here is another article that better explains the same technique.
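
The idea can be sketched as below. This is not the code from the linked articles; the names LinqTextSketch, ReadLines and FirstFields, and the comma-separated format with '#' comment lines, are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical names; the comma-separated format and '#' comments are assumed.
public static class LinqTextSketch
{
    // Lazily yields one line at a time, so the whole text is never held in memory at once.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }

    // Example query: the first comma-separated field of every line
    // that is neither blank nor a '#' comment.
    public static List<string> FirstFields(TextReader reader)
    {
        return ReadLines(reader)
            .Where(line => line.Trim().Length > 0 && !line.StartsWith("#"))
            .Select(line => line.Split(',')[0].Trim())
            .ToList();
    }
}
```

On .NET 4.0 and later, File.ReadLines(path) already returns a lazy IEnumerable<string>, so the hand-written iterator is only needed on older frameworks or for non-file readers.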

Answer by Perica Zivkovic

using (TextReader rdr = new StreamReader(fullFilePath))
{
  string line;

  while ((line = rdr.ReadLine()) != null)
  {
    // use line here
  }
}

Set the variable fullFilePath to the full path, e.g. C:\temp\myTextFile.txt.

Answer by Coderer

A small improvement on Pero's answer:

FileInfo txtFile = new FileInfo(@"c:\myfile.txt");
if (!txtFile.Exists)
{
    // error handling
}

using (TextReader rdr = txtFile.OpenText())
{
     // use the text file as Pero suggested
}

The FileInfo class gives you the opportunity to "do stuff" with the file before you actually start reading from it. You can also pass it between functions as a better abstraction of the file's location than the full path string. FileInfo canonicalizes the path so it's well-formed (e.g. turning / into \ where appropriate) and lets you extract extra data about the file: parent directory, extension, name only, permissions, etc.
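
To make the "extra data" point concrete, here is a small sketch that inspects a file through FileInfo without opening it. The class and method names (FileInfoSketch, Describe) and the output format are made up for illustration.

```csharp
using System;
using System.IO;

// Hypothetical helper showing a few FileInfo properties.
public static class FileInfoSketch
{
    // Builds a short description of a file from its FileInfo, without opening the file.
    public static string Describe(FileInfo file)
    {
        if (!file.Exists)
            return file.FullName + " does not exist";

        return string.Format(
            "{0} (extension {1}) in {2}, {3} bytes",
            file.Name, file.Extension, file.DirectoryName, file.Length);
    }
}
```

Note that FileInfo caches values such as Exists and Length; call Refresh() or create a new instance if the file may have changed since the object was created.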

Answer by Coderer

To begin with, make sure that you have the following namespaces:

using System.Collections;
using System.Data;
using System.IO;
using System.Text.RegularExpressions;

Next, we build a function that parses any CSV input string into a DataTable:

public DataTable ParseCSV(string inputString) {

  DataTable dt=new DataTable();

  // declare the Regular Expression that will match versus the input string
  Regex re=new Regex("((?<field>[^\",\r\n]+)|\"(?<field>([^\"]|\"\")+)\")(,|(?<rowbreak>\r\n|\n|$))");

  ArrayList colArray=new ArrayList();
  ArrayList rowArray=new ArrayList();

  int colCount=0;
  int maxColCount=0;
  string rowbreak="";
  string field="";

  MatchCollection mc=re.Matches(inputString);

  foreach(Match m in mc) {

    // retrieve the field and replace two double-quotes with a single double-quote
    field=m.Result("${field}").Replace("\"\"","\"");

    rowbreak=m.Result("${rowbreak}");

    if (field.Length > 0) {
      colArray.Add(field);                  
      colCount++;
    }

    if (rowbreak.Length > 0) {

      // add the column array to the row Array List
      rowArray.Add(colArray.ToArray());

      // create a new Array List to hold the field values
      colArray=new ArrayList(); 

      if (colCount > maxColCount)
        maxColCount=colCount;

      colCount=0;
    }
  }

  if (rowbreak.Length == 0) {
    // this is executed when the last line doesn't
    // end with a line break
    rowArray.Add(colArray.ToArray());
    if (colCount > maxColCount)
      maxColCount=colCount;
  }

  // create the columns for the table
  for(int i=0; i < maxColCount; i++)
    dt.Columns.Add(String.Format("col{0:000}",i));

  // convert the row Array List into an Array object for easier access
  Array ra=rowArray.ToArray();
  for(int i=0; i < ra.Length; i++) {                

    // create a new DataRow
    DataRow dr=dt.NewRow();

    // convert the column Array List into an Array object for easier access
    Array ca=(Array)(ra.GetValue(i));               

    // add each field into the new DataRow
    for(int j=0; j < ca.Length; j++)
      dr[j]=ca.GetValue(j);

    // add the new DataRow to the DataTable
    dt.Rows.Add(dr);
  }

  // in case no data was parsed, create a single column
  if (dt.Columns.Count == 0)
    dt.Columns.Add("NoData");

  return dt;
}

Now that we have a parser for converting a string into a DataTable, all we need is a function that reads the content of a CSV file and passes it to our ParseCSV function:

public DataTable ParseCSVFile(string path) {

  string inputString="";

  // check that the file exists before opening it
  if (File.Exists(path)) {

    // the using block closes the reader even if ReadToEnd throws
    using (StreamReader sr = new StreamReader(path)) {
      inputString = sr.ReadToEnd();
    }

  }

  return ParseCSV(inputString);
}

And now you can easily fill a DataGrid with data coming off the CSV file:

protected System.Web.UI.WebControls.DataGrid DataGrid1;

private void Page_Load(object sender, System.EventArgs e) {

  // call the parser
  DataTable dt=ParseCSVFile(Server.MapPath("./demo.csv"));          

  // bind the resulting DataTable to a DataGrid Web Control
  DataGrid1.DataSource=dt;
  DataGrid1.DataBind();
}

Congratulations! You are now able to parse CSV into a DataTable. Good luck with your programming.

Answer by Jonathan Wood

You might want to use a helper class such as the one described at http://www.blackbeltcoder.com/Articles/strings/a-text-parsing-helper-class.

Answer by Ted Spence

After years of analyzing CSV files, including ones that are broken or have edge cases, here is my code, which passes virtually all of my unit tests:

/// <summary>
/// Read in a line of text, and use the Add() function to add these items to the current CSV structure
/// </summary>
/// <param name="s"></param>
public static bool TryParseCSVLine(string s, char delimiter, char text_qualifier, out string[] array)
{
    bool success = true;
    List<string> list = new List<string>();
    StringBuilder work = new StringBuilder();
    for (int i = 0; i < s.Length; i++) {
        char c = s[i];

        // If we are starting a new field, is this field text qualified?
        if ((c == text_qualifier) && (work.Length == 0)) {
            int p2;
            while (true) {
                p2 = s.IndexOf(text_qualifier, i + 1);

                // for some reason, this text qualifier is broken
                if (p2 < 0) {
                    work.Append(s.Substring(i + 1));
                    i = s.Length;
                    success = false;
                    break;
                }

                // Append this qualified string
                work.Append(s.Substring(i + 1, p2 - i - 1));
                i = p2;

                // If this is a double quote, keep going!
                if (((p2 + 1) < s.Length) && (s[p2 + 1] == text_qualifier)) {
                    work.Append(text_qualifier);
                    i++;

                    // otherwise, this is a single qualifier, we're done
                } else {
                    break;
                }
            }

            // Does this start a new field?
        } else if (c == delimiter) {
            list.Add(work.ToString());
            work.Length = 0;

            // Test for special case: when the user has written a casual comma, space, and text qualifier, skip the space
            // Checks if the second parameter of the if statement will pass through successfully
            // e.g. "bob", "mary", "bill"
            if (i + 2 <= s.Length - 1) {
                if (s[i + 1].Equals(' ') && s[i + 2].Equals(text_qualifier)) {
                    i++;
                }
            }
        } else {
            work.Append(c);
        }
    }
    list.Add(work.ToString());

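    // If we have nothing in the list, and it's possible that this might be a tab delimited list, try that before giving up
    // Note: ParseLine, DEFAULT_TAB_DELIMITER and DEFAULT_QUALIFIER are not defined in this snippet;
    // presumably they come from the rest of the author's CSV class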
    if (list.Count == 1 && delimiter != DEFAULT_TAB_DELIMITER) {
        string[] tab_delimited_array = ParseLine(s, DEFAULT_TAB_DELIMITER, DEFAULT_QUALIFIER);
        if (tab_delimited_array.Length > list.Count) {
            array = tab_delimited_array;
            return success;
        }
    }

    // Return the array we parsed
    array = list.ToArray();
    return success;
}

However, this function does not actually parse every valid CSV file out there! Some files have newlines embedded in their fields, so your stream reader needs to parse multiple lines together to return a single array. Here's a tool that does that:

/// <summary>
/// Parse a line whose values may include newline symbols or CR/LF
/// </summary>
/// <param name="sr"></param>
/// <returns></returns>
public static string[] ParseMultiLine(StreamReader sr, char delimiter, char text_qualifier)
{
    StringBuilder sb = new StringBuilder();
    string[] array = null;
    while (!sr.EndOfStream) {

        // Read in a line
        sb.Append(sr.ReadLine());

        // Does it parse?
        string s = sb.ToString();
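        if (TryParseCSVLine(s, delimiter, text_qualifier, out array)) {
            return array;
        }

        // An open text qualifier means the record continues on the next line:
        // put back the line break that ReadLine() stripped so it stays in the field
        sb.Append('\n');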
    }

    // Fails to parse - return the best array we were able to get
    return array;
}

For reference, I placed my open source CSV code on code.google.com.
