C#: How to read a csv file one line at a time and replace/edit certain lines as you go?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/13985700/

Time: 2020-08-10 10:15:49  Source: igfitidea

How to read a csv file one line at a time and replace/edit certain lines as you go?

Tags: c#, .net, parsing, csv

Asked by richard

I have a 60GB csv file I need to make some modifications to. The customer wants some changes to the file's data, but I don't want to regenerate the data in that file because it took 4 days to do.

How can I read the file, line by line (not loading it all into memory!), and make edits to those lines as I go, replacing certain values etc.?

Accepted answer by moribvndvs

The process would be something like this:

  1. Open a StreamWriter to a temporary file.
  2. Open a StreamReader to the target file.
  3. For each line:
    1. Split the text into columns based on a delimiter.
    2. Check the columns for the values you want to replace, and replace them.
    3. Join the column values back together using your delimiter.
    4. Write the line to the temporary file.
  4. When you are finished, delete the target file, and move the temporary file to the target file path.

Note regarding Steps 2 and 3.1: If you are confident in the structure of your file and it is simple enough, you can do all this out of the box as described (I'll include a sample in a moment). However, there are factors in a CSV file that may need attention (such as recognizing when a delimiter is being used literally in a column value). You can drudge through this yourself, or try an existing solution.
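To illustrate the delimiter caveat, here is a minimal sketch of a quote-aware splitter. The `CsvFields` class and its `Split` method are hypothetical names, not part of the original answer or of any library. It assumes fields are quoted with double quotes, that a literal quote inside a quoted field is written as two quotes, and that no quoted field spans more than one line.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvFields
{
    // Splits one CSV line into raw fields. Quote characters are kept in the output,
    // so string.Join(delimiter, fields) reproduces the original line.
    public static List<string> Split(string line, char delimiter = ',')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        foreach (char c in line)
        {
            if (c == '"')
            {
                // An escaped quote ("") toggles the flag twice, leaving the state unchanged.
                inQuotes = !inQuotes;
                current.Append(c);
            }
            else if (c == delimiter && !inQuotes)
            {
                // A delimiter outside quotes ends the current field.
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                // Ordinary character, or a delimiter used literally inside quotes.
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // the last field has no trailing delimiter
        return fields;
    }
}
```

For example, `CsvFields.Split("a,\"b,c\",d")` yields three fields: `a`, `"b,c"` and `d`. If the file can contain quoted fields with embedded line breaks, reading line by line is not enough and a dedicated CSV parser is probably the safer choice.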


Basic example just using StreamReader and StreamWriter:

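using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

var sourcePath = @"C:\data.csv";
var delimiter = ",";
var firstLineContainsHeaders = true;
var tempPath = Path.GetTempFileName();
var lineNumber = 0;

// Match the delimiter only when the rest of the line has balanced double quotes,
// i.e. when the delimiter sits outside a quoted value. Regex.Escape keeps
// delimiters such as "|" from being read as regex syntax.
var splitExpression = new Regex(@"(" + Regex.Escape(delimiter) + @")(?=(?:[^""]|""[^""]*"")*$)");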

using (var writer = new StreamWriter(tempPath))
using (var reader = new StreamReader(sourcePath))
{
    string line = null;
    string[] headers = null;
    if (firstLineContainsHeaders)
    {
        line = reader.ReadLine();
        lineNumber++;

        if (string.IsNullOrEmpty(line)) return; // file is empty;

        headers = splitExpression.Split(line).Where(s => s != delimiter).ToArray();

        writer.WriteLine(line); // write the original header to the temp file.
    }

    while ((line = reader.ReadLine()) != null)
    {
        lineNumber++;

        var columns = splitExpression.Split(line).Where(s => s != delimiter).ToArray();

        // if there are no headers, do a simple sanity check to make sure you always have the same number of columns in a line
        if (headers == null) headers = new string[columns.Length];

        if (columns.Length != headers.Length) throw new InvalidOperationException(string.Format("Line {0} is missing one or more columns.", lineNumber));

        // TODO: search and replace in columns
        // example: replace 'v' in the first column with '\/': if (columns[0].Contains("v")) columns[0] = columns[0].Replace("v", @"\/");

        writer.WriteLine(string.Join(delimiter, columns));
    }

}

File.Delete(sourcePath);
File.Move(tempPath, sourcePath);
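The same steps can also be packaged as a reusable method. The sketch below is one possible variant, not the original author's code: `CsvRewriter`, `RewriteLines` and the `.tmp` suffix are hypothetical names. It assumes the temporary file may be created next to the source (so the final swap stays on one volume) and uses `File.Replace` instead of a separate delete and move.

```csharp
using System;
using System.IO;

public static class CsvRewriter
{
    // Streams sourcePath line by line through editLine, then swaps the result in.
    // editLine receives the line text and its 1-based line number.
    public static void RewriteLines(string sourcePath, Func<string, long, string> editLine)
    {
        // Assumption: a sibling temp file is acceptable and the name is not already in use.
        var tempPath = sourcePath + ".tmp";
        try
        {
            using (var reader = new StreamReader(sourcePath))
            using (var writer = new StreamWriter(tempPath))
            {
                string line;
                long lineNumber = 0;
                while ((line = reader.ReadLine()) != null)
                {
                    lineNumber++;
                    writer.WriteLine(editLine(line, lineNumber));
                }
            }

            // Replace the original with the edited copy; null means no backup file is kept.
            File.Replace(tempPath, sourcePath, null);
        }
        finally
        {
            // Clean up the temp file if something failed before the swap.
            if (File.Exists(tempPath)) File.Delete(tempPath);
        }
    }
}
```

A call such as `CsvRewriter.RewriteLines(path, (line, n) => n == 1 ? line : line.Replace("v", "w"))` would leave the header untouched and edit every other line. Note that this sketch writes every line with the platform's line ending and with the default StreamWriter encoding, which may differ from the source file.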

Answer by Junnan Wang

Memory-mapped files are a feature introduced in .NET Framework 4 that can be used to edit large files. Read more at http://msdn.microsoft.com/en-us/library/dd997372.aspx or search for "memory-mapped files".
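As a rough sketch of what this could look like, the code below maps the file in fixed-size windows and overwrites bytes in place. `InPlaceEditor`, `ReplaceByte` and the 64 MB window size are hypothetical choices, not something the answer specifies. The approach only works when the replacement has exactly the same byte length as the original value, because the file itself is never resized.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

public static class InPlaceEditor
{
    // Overwrites every occurrence of one byte with another and returns the count.
    // The file is edited in place, so there is no temporary copy.
    public static long ReplaceByte(string path, byte from, byte to,
                                   long windowSize = 64L * 1024 * 1024)
    {
        long fileLength = new FileInfo(path).Length;
        if (fileLength == 0) return 0; // an empty file cannot be mapped

        long replaced = 0;
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        {
            // Map one window at a time so only part of the file is addressed at once.
            for (long offset = 0; offset < fileLength; offset += windowSize)
            {
                long size = Math.Min(windowSize, fileLength - offset);
                using (var view = mmf.CreateViewAccessor(offset, size))
                {
                    for (long i = 0; i < size; i++)
                    {
                        if (view.ReadByte(i) == from)
                        {
                            view.Write(i, to);
                            replaced++;
                        }
                    }
                }
            }
        }
        return replaced;
    }
}
```

Replacing a multi-byte value this way would also need to handle matches that straddle a window boundary, and any edit that changes a field's length still requires rewriting the file as in the accepted answer.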

Answer by Nicolai

Just read the file, line by line, with StreamReader, and then use regular expressions! The most amazing tool in the world.

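using (var sr = new StreamReader(new FileStream(@"C:\temp\file.csv", FileMode.Open)))
{
    string line;
    // ReadLine returns null at the end of the file. Checking EndOfStream after
    // reading ahead would exit the loop before the last line is processed.
    while ((line = sr.ReadLine()) != null)
    {
        // do stuff
    }
}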
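As an illustration of what "do stuff" might be with a regular expression, the sketch below rewrites dates of the form 21/12/2012 as 2012-12-21 on every line. The date rule and the names `RegexLineEditor`, `EditLine` and `Copy` are made up for this example. It also assumes the pattern can never match across a field boundary or inside quoted text that should be left alone, since a plain regex knows nothing about CSV quoting.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class RegexLineEditor
{
    // Hypothetical rule: dd/MM/yyyy becomes yyyy-MM-dd.
    // Built once and reused, since it runs against every line of a very large file.
    private static readonly Regex DayFirstDate =
        new Regex(@"\b(\d{2})/(\d{2})/(\d{4})\b", RegexOptions.Compiled);

    // Applies the rule to a single line and returns the edited text.
    public static string EditLine(string line)
    {
        return DayFirstDate.Replace(line, "$3-$2-$1");
    }

    // Reads lines from reader, edits each one, and writes it to writer.
    public static void Copy(TextReader reader, TextWriter writer)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            writer.WriteLine(EditLine(line));
        }
    }
}
```

With a StreamReader on the source file and a StreamWriter on a temporary file, `RegexLineEditor.Copy(reader, writer)` streams the whole file without holding more than one line in memory.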