C#: How to read a csv file one line at a time and replace/edit certain lines as you go?
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/13985700/
Asked by richard
I have a 60GB csv file I need to make some modifications to. The customer wants some changes to the file's data, but I don't want to regenerate the data in that file because it took 4 days to do.
How can I read the file, line by line (not loading it all into memory!), and make edits to those lines as I go, replacing certain values etc.?
Accepted answer by moribvndvs
The process would be something like this:
1. Open a StreamWriter to a temporary file.
2. Open a StreamReader to the target file.
3. For each line:
   1. Split the text into columns based on a delimiter.
   2. Check the columns for the values you want to replace, and replace them.
   3. Join the column values back together using your delimiter.
   4. Write the line to the temporary file.
4. When you are finished, delete the target file, and move the temporary file to the target file path.
Note regarding Steps 2 and 3.1: If you are confident in the structure of your file and it is simple enough, you can do all this out of the box as described (I'll include a sample in a moment). However, there are factors in a CSV file that may need attention (such as recognizing when a delimiter is being used literally in a column value). You can drudge through this yourself, or try an existing solution.
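To illustrate what "a delimiter being used literally in a column value" means in practice, here is a minimal sketch of a quote-aware split and join. The `CsvLine` class and its method names are hypothetical (they are not from the original answer), and the sketch assumes that every record fits on one physical line and that quoting follows the usual double-quote convention.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the original answer.
public static class CsvLine
{
    // Splits one CSV line into fields. A delimiter inside double quotes is
    // treated as literal text, and a doubled quote ("") inside a quoted field
    // becomes a single quote character. Fields that span several physical
    // lines are NOT handled by this sketch.
    public static List<string> Split(string line, char delimiter = ',')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        var inQuotes = false;

        for (var i = 0; i < line.Length; i++)
        {
            var c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote inside a quoted field
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false; // closing quote
                }
                else
                {
                    current.Append(c);
                }
            }
            else if (c == '"')
            {
                inQuotes = true; // opening quote
            }
            else if (c == delimiter)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // last field has no trailing delimiter
        return fields;
    }

    // Joins fields back into one line, quoting only the fields that need it.
    public static string Join(IEnumerable<string> fields, char delimiter = ',')
    {
        var parts = new List<string>();
        foreach (var field in fields)
        {
            var needsQuotes = field.IndexOf(delimiter) >= 0 || field.IndexOf('"') >= 0;
            parts.Add(needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field);
        }
        return string.Join(delimiter.ToString(), parts);
    }
}
```

Note that `Split` returns the unquoted values, so a field that was quoted without needing it comes back unquoted after `Join`. If the output must match the input byte for byte outside the edited columns, an existing CSV library is probably the safer choice.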
Basic example just using StreamReader and StreamWriter:
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

var sourcePath = @"C:\data.csv";
var delimiter = ",";
var firstLineContainsHeaders = true;
var tempPath = Path.GetTempFileName();
var lineNumber = 0;
// Matches a delimiter only when it is followed by an even number of quotes,
// i.e. when it is not inside a quoted value. Regex.Escape keeps delimiters
// such as "|" from being read as regex syntax.
var splitExpression = new Regex(@"(" + Regex.Escape(delimiter) + @")(?=(?:[^""]|""[^""]*"")*$)");

using (var writer = new StreamWriter(tempPath))
using (var reader = new StreamReader(sourcePath))
{
    string line = null;
    string[] headers = null;
    if (firstLineContainsHeaders)
    {
        line = reader.ReadLine();
        lineNumber++;
        if (string.IsNullOrEmpty(line)) return; // file is empty
        // The capture group makes Split return the delimiters too, so filter them out.
        headers = splitExpression.Split(line).Where(s => s != delimiter).ToArray();
        writer.WriteLine(line); // write the original header to the temp file
    }

    while ((line = reader.ReadLine()) != null)
    {
        lineNumber++;
        var columns = splitExpression.Split(line).Where(s => s != delimiter).ToArray();

        // if there are no headers, do a simple sanity check to make sure you always have the same number of columns in a line
        if (headers == null) headers = new string[columns.Length];
        if (columns.Length != headers.Length) throw new InvalidOperationException(string.Format("Line {0} is missing one or more columns.", lineNumber));

        // TODO: search and replace in columns
        // example: replace 'v' in the first column with '\/':
        // if (columns[0].Contains("v")) columns[0] = columns[0].Replace("v", @"\/");

        writer.WriteLine(string.Join(delimiter, columns));
    }
}

// Both streams are closed at this point, so the files can be swapped.
File.Delete(sourcePath);
File.Move(tempPath, sourcePath);
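The final delete-and-move step could be made a little more forgiving for a file of this size. The sketch below is one possible variation, not the answerer's method: it places the temporary file next to the source (so no 60GB copy across drives is needed) and uses `File.Replace`, which swaps the files and keeps the old contents as a backup. The `SafeSwap` class and its method names are hypothetical, and the approach assumes there is enough free disk space to hold the source, the temporary file, and later the backup.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original answer.
public static class SafeSwap
{
    // Returns a temp-file path in the same directory as the source file, so
    // the final swap stays on one volume and needs no cross-drive copy.
    public static string TempPathNextTo(string sourcePath)
    {
        var directory = Path.GetDirectoryName(Path.GetFullPath(sourcePath));
        var name = Path.GetFileName(sourcePath) + "." + Guid.NewGuid().ToString("N") + ".tmp";
        return Path.Combine(directory, name);
    }

    // Replaces the source file with the finished temp file. The previous
    // contents of the source are kept at backupPath until the caller deletes them.
    public static void Commit(string tempPath, string sourcePath, string backupPath)
    {
        File.Replace(tempPath, sourcePath, backupPath);
    }
}
```

With this variation, `TempPathNextTo(sourcePath)` would take the place of `Path.GetTempFileName()`, and `Commit` would take the place of the `File.Delete` / `File.Move` pair once both streams are closed.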
Answer by Junnan Wang
Memory-mapped files are a feature introduced in .NET Framework 4 that can be used to edit large files. Read more here: http://msdn.microsoft.com/en-us/library/dd997372.aspx or google "memory-mapped files".
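As a rough illustration of this approach, the sketch below edits a file in place through a memory-mapped view. It only covers the case where the replacement has exactly the same byte length as the original (here, a single byte), because a mapped view cannot insert or remove bytes. The `InPlaceEdit` class, the `ReplaceByte` name, and the 64 MB window size are assumptions made for the example.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Hypothetical helper, not part of the original answers.
public static class InPlaceEdit
{
    // Replaces every occurrence of one byte with another, directly in the file.
    // The file is mapped one window at a time, so a very large file is never
    // mapped in full. Returns the number of bytes that were changed.
    public static long ReplaceByte(string path, byte from, byte to, long windowSize = 64L * 1024 * 1024)
    {
        var length = new FileInfo(path).Length;
        if (length == 0) return 0; // an empty file cannot be mapped

        long changed = 0;
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        {
            for (long offset = 0; offset < length; offset += windowSize)
            {
                // The last window may be shorter than windowSize.
                var size = Math.Min(windowSize, length - offset);
                using (var view = mmf.CreateViewAccessor(offset, size))
                {
                    for (long i = 0; i < size; i++)
                    {
                        if (view.ReadByte(i) == from)
                        {
                            view.Write(i, to);
                            changed++;
                        }
                    }
                }
            }
        }
        return changed;
    }
}
```

For example, `InPlaceEdit.ReplaceByte(@"C:\data.csv", (byte)';', (byte)',')` would turn every semicolon into a comma without creating a second copy of the file. Edits that change the length of a line still need the temporary-file approach.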
Answer by Nicolai
Just read the file, line by line, with StreamReader, and then use regex! The most amazing tool in the world.
using (var sr = new StreamReader(new FileStream(@"C:\temp\file.csv", FileMode.Open)))
{
    string line;
    // ReadLine returns null at the end of the file, so every line,
    // including the last one, reaches the loop body.
    while ((line = sr.ReadLine()) != null)
    {
        // do stuff
    }
}
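One way the "do stuff" part might look with a regex is sketched below. The `LineRewriter` class and its `Rewrite` method are hypothetical names, and the sketch assumes the edited lines are written to a separate writer rather than back into the file being read. Keep in mind that a regex applied to the whole line does not know about column boundaries.

```csharp
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class LineRewriter
{
    // Copies every line from reader to writer, applying the regex replacement
    // to each one. Only one line is held in memory at a time.
    // Returns the number of lines that were changed.
    public static long Rewrite(TextReader reader, TextWriter writer, Regex pattern, string replacement)
    {
        long changed = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            var updated = pattern.Replace(line, replacement);
            if (updated != line) changed++;
            writer.WriteLine(updated);
        }
        return changed;
    }
}
```

For instance, passing `new Regex(@"(\d{2})/(\d{2})/(\d{4})")` with the replacement `"$3-$1-$2"` would rewrite dates such as 12/31/2012 as 2012-12-31, with a StreamReader on the source file and a StreamWriter on a temporary file.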