.NET C# - Random access in text files - no easy way?
Original question: http://stackoverflow.com/questions/265639/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and keep it under the same license.
Asked by
I've got a text file that contains several 'records'. Each record consists of a name and a collection of numbers as data.
I'm trying to build a class that reads through the file, presents only the names of the records, and then lets the user select which record's data they want.
On the first pass through the file I read only the header names, but I can keep track of the position in the file where each header is. I need random access to the text file so I can seek to the beginning of a record when the user asks for it.
I have to do it this way because the file is too large (1GB+) to be read completely into memory alongside the application's other memory demands.
I've tried using the .NET StreamReader class to accomplish this. It provides very easy-to-use ReadLine functionality, but there is no way to capture the true position in the file: the position in the BaseStream property is skewed by the buffer the class uses.
Is there no easy way to do this in .NET?
Answered by TcKs
You can use a System.IO.FileStream instead of a StreamReader. If you know exactly what the file contains (the encoding, for example), you can do everything you would do with a StreamReader.
Answered by James Curran
Are you sure that the file is "too large"? Have you tried it that way, and did it cause a problem?
If you allocate a large amount of memory and aren't using it right now, Windows will just swap it out to disk. Hence, by accessing it from "memory", you will have accomplished what you want: random access to the file on disk.
Answered by LeppyR64
FileStream has the Seek() method.
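As a rough illustration of the Seek() suggestion, here is a minimal sketch that jumps to a byte offset recorded earlier and reads one line from there. The class name SeekSketch and the method ReadLineAt are hypothetical, and the sketch assumes the offset falls on a character boundary at the start of a line.

```csharp
using System;
using System.IO;
using System.Text;

public static class SeekSketch
{
    // Hypothetical helper: reads one line starting at a known byte offset.
    // Assumes byteOffset points at the first byte of a line.
    public static string ReadLineAt(string path, long byteOffset, Encoding encoding)
    {
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            // Move the file cursor before any reader has buffered data.
            fs.Seek(byteOffset, SeekOrigin.Begin);

            // A StreamReader created now starts reading at the current position.
            using (StreamReader reader = new StreamReader(fs, encoding))
            {
                return reader.ReadLine();
            }
        }
    }
}
```

Creating a fresh StreamReader after the seek sidesteps the buffering problem, because the reader has nothing buffered yet when it starts.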
Answered by Jon Skeet
Is the encoding a fixed-size one (e.g. ASCII or UCS-2)? If so, you could keep track of the character index (based on the number of characters you've seen) and find the binary index based on that.
Otherwise, no: you'd basically need to write your own StreamReader implementation which lets you peek at the binary index. It's a shame that StreamReader doesn't implement this, I agree.
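For a fixed-size encoding, the conversion from character index to binary index is plain arithmetic. The sketch below is only an illustration: FixedWidthOffset and ToByteOffset are hypothetical names, and the caller is assumed to know the bytes per character and whether the file starts with a byte order mark.

```csharp
using System;

public static class FixedWidthOffset
{
    // Hypothetical helper: converts a character index into a byte offset.
    // charIndex must count every character seen so far, including line endings.
    // bytesPerChar is 1 for ASCII and 2 for UCS-2.
    // bomLength is the size of the byte order mark, or 0 if the file has none.
    public static long ToByteOffset(long charIndex, int bytesPerChar, int bomLength)
    {
        return bomLength + charIndex * bytesPerChar;
    }
}
```

For an ASCII file without a byte order mark the byte offset equals the character index; for UCS-2 with a 2-byte mark, character 3 starts at byte 2 + 3 * 2 = 8.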
Answered by reshefm
I think the runtime records feature of the FileHelpers library might help you. http://filehelpers.sourceforge.net/runtime_classes.html
Answered by Mike Blandford
This exact question was asked in 2006 here: http://www.devnewsgroups.net/group/microsoft.public.dotnet.framework/topic40275.aspx
Summary:
"The problem is that the StreamReader buffers data, so the value returned in BaseStream.Position property is always ahead of the actual processed line."
However, "if the file is encoded in a text encoding which is fixed-width, you could keep track of how much text has been read and multiply that by the width"
If it is not, you can read directly from the FileStream one character at a time, and the stream's Position property should then be correct.
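The "read directly from the FileStream" idea can be sketched as a single pass that records the byte offset where each line starts. LineIndexSketch and IndexLineStarts are hypothetical names. The sketch assumes lines end with the byte 0x0A, which holds for ASCII and UTF-8 (that byte never appears inside a multi-byte UTF-8 sequence) but not for UTF-16.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineIndexSketch
{
    // Hypothetical helper: returns the byte offset of the start of every line.
    // Assumes '\n' (0x0A) terminates lines, as in ASCII or UTF-8 text.
    public static List<long> IndexLineStarts(string path)
    {
        List<long> starts = new List<long>();
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            long lineStart = 0;
            int b;
            // FileStream.Position reflects the bytes handed out by ReadByte(),
            // so it can be trusted after each call.
            while ((b = fs.ReadByte()) != -1)
            {
                if (b == '\n')
                {
                    starts.Add(lineStart);
                    lineStart = fs.Position;
                }
            }
            // Record a final line that has no trailing newline.
            if (lineStart < fs.Length)
                starts.Add(lineStart);
        }
        return starts;
    }
}
```

Only the offsets are kept in memory, so the index for a 1GB+ file stays small. In the question's scenario you would store an offset only for the lines that are record headers.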
Answered by Corbin March
If you're flexible about how the data file is written and don't mind it being a little less text-editor-friendly, you could write your records with a BinaryWriter:
    // requires: using System.IO;
    using (BinaryWriter writer =
        new BinaryWriter(File.Open("data.txt", FileMode.Create)))
    {
        // Write(string) stores each string with a length prefix
        writer.Write("one,1,1,1,1");
        writer.Write("two,2,2,2,2");
        writer.Write("three,3,3,3,3");
    }
Then the initial read of each record is simple, because you can use the BinaryReader's ReadString method:
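    // requires: using System; using System.IO;
    using (BinaryReader reader = new BinaryReader(File.OpenRead("data.txt")))
    {
        string line = null;
        long position = reader.BaseStream.Position;
        // Compare positions rather than calling PeekChar(), which would try to
        // decode the next byte (here a length prefix) as a character.
        while (position < reader.BaseStream.Length)
        {
            line = reader.ReadString();
            //parse the name out of the line here...
            Console.WriteLine("{0},{1}", position, line);
            position = reader.BaseStream.Position;
        }
    }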
The BinaryReader isn't buffered, so you get the proper position to store and use later. The only hassle is parsing the name out of the line, which you may have to do with a StreamReader anyway.
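To complete the picture, a stored position can later be used to jump straight to a record. This is a minimal sketch under the same assumptions as the answer (records written with BinaryWriter.Write(string)); the names RecordSeekSketch and ReadRecordAt are hypothetical.

```csharp
using System;
using System.IO;

public static class RecordSeekSketch
{
    // Hypothetical helper: reads the record that starts at a stored position.
    // The position must be one captured before a ReadString() or Write(string)
    // call, so that it points at the record's length prefix.
    public static string ReadRecordAt(string path, long position)
    {
        using (BinaryReader reader = new BinaryReader(File.OpenRead(path)))
        {
            reader.BaseStream.Seek(position, SeekOrigin.Begin);
            return reader.ReadString();
        }
    }
}
```

Seeking to any other offset would make ReadString() interpret record text as a length prefix, so only positions captured at record boundaries are safe to use.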
Answered by Jimmy
There are some good answers here, but I couldn't find any source code that would work in my very simplistic case. Here it is, in the hope that it saves someone else the hour I spent searching around.
The "very simplistic case" I refer to is: the text encoding is fixed-width, and the line-ending characters are the same throughout the file. This code works well in my case, where I'm parsing a log file and sometimes have to seek ahead in the file and then come back. I implemented just enough to do what I needed (e.g. only one constructor, and only ReadLine() is overridden), so you'll most likely need to add code, but I think it's a reasonable starting point.
    // requires: using System; using System.IO;
    public class PositionableStreamReader : StreamReader
    {
        public PositionableStreamReader(string path)
            : base(path)
        {}

        // Number of characters in each line ending; must be the same throughout the file.
        private int myLineEndingCharacterLength = Environment.NewLine.Length;
        public int LineEndingCharacterLength
        {
            get { return myLineEndingCharacterLength; }
            set { myLineEndingCharacterLength = value; }
        }

        public override string ReadLine()
        {
            string line = base.ReadLine();
            // This counts characters, so it matches the byte position only for a
            // one-byte-per-character encoding with no byte order mark.
            if (null != line)
                myStreamPosition += line.Length + myLineEndingCharacterLength;
            return line;
        }

        private long myStreamPosition = 0;
        public long Position
        {
            get { return myStreamPosition; }
            set
            {
                myStreamPosition = value;
                this.BaseStream.Position = value;
                // Drop buffered data so the next read starts at the new position.
                this.DiscardBufferedData();
            }
        }
    }
Here's an example of how to use the PositionableStreamReader:
    // "something" stands for whatever loop condition you need
    using (PositionableStreamReader sr = new PositionableStreamReader("somepath.txt"))
    {
        // read some lines
        while (something)
            sr.ReadLine();
        // bookmark the current position
        long streamPosition = sr.Position;
        // read some lines
        while (something)
            sr.ReadLine();
        // go back to the bookmarked position
        sr.Position = streamPosition;
        // read some lines
        while (something)
            sr.ReadLine();
    }
Answered by Jimmy
A couple of items that may be of interest.
1) Knowing that every line has the same number of characters is not necessarily useful if the encoding is variable-width (like UTF-8), because the same number of characters can take a different number of bytes. So check your encoding.
2) BaseStream.Position matches the reader's position only when the StreamReader's internal buffer is empty. Note that StreamReader has no Flush() method; DiscardBufferedData() empties the buffer but does not move the underlying stream back, so it is meant to be called after you set BaseStream.Position yourself to the byte where the next read should begin.
3) If you know in advance that every record has exactly the same number of characters, and the encoding is fixed-width (so each line is the same number of bytes long), then you can use a FileStream and read into a buffer sized to match one line. The position of the cursor at the end of each read will necessarily be the beginning of the next line.
4) If the lines are all the same length (in bytes, assumed here), is there any particular reason why you don't simply use line numbers and calculate the byte offset in the file as line size x line number?
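Items 3 and 4 can be sketched together: when every line occupies the same number of bytes, the offset of a line is just line number x line size. FixedLineSketch and ReadFixedLine are hypothetical names, and the sketch assumes ASCII text where bytesPerLine includes the line-ending bytes.

```csharp
using System;
using System.IO;
using System.Text;

public static class FixedLineSketch
{
    // Hypothetical helper: reads line number lineNumber (zero-based) from a file
    // in which every line, including its line ending, is bytesPerLine bytes long.
    // Assumes ASCII text.
    public static string ReadFixedLine(string path, long lineNumber, int bytesPerLine)
    {
        byte[] buffer = new byte[bytesPerLine];
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            fs.Seek(lineNumber * bytesPerLine, SeekOrigin.Begin);

            // Read() may return fewer bytes than requested, so loop until full.
            int read = 0;
            while (read < bytesPerLine)
            {
                int n = fs.Read(buffer, read, bytesPerLine - read);
                if (n == 0) break; // end of file
                read += n;
            }
            return Encoding.ASCII.GetString(buffer, 0, read).TrimEnd('\r', '\n');
        }
    }
}
```

No index is needed in this case, since the position of any line can be computed on demand.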