Optimum file buffer read size?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license, cite the original address, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/1552107/
Asked by Andrew Keith
I am writing an application which needs to read fairly large files. I have always wondered what's the optimum size for the read buffer on a modern Windows XP computer. I googled and found many examples which had 1024 as the optimum size.
Here is a snippet of what I mean:
long pointer = 0;
byte[] buffer = new byte[1024]; // What's a good size here?
while (pointer < input.Length)
{
    pointer += input.Read(buffer, 0, buffer.Length);
}
My application is fairly simple, so I am not looking to write any benchmarking code, but would like to know what sizes are common?
Accepted answer by jrista
A 1k buffer size seems a bit small. Generally, there is no "one size fits all" buffer size. You need to set a buffer size that fits the behavior of your algorithm. Now, generally, it's not a good idea to have a really huge buffer, but having one that is too small, or out of line with how you process each chunk, is not that great either.
If you are simply reading data one chunk after another entirely into memory before processing it, I would use a larger buffer. I would probably use 8k or 16k, but probably not larger.
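A rough sketch of that read-everything-in-chunks pattern (a minimal example, not the asker's actual code; the temp file and the `ReadInChunks` helper are illustrative only):

```csharp
using System;
using System.IO;

class ChunkedRead
{
    // Reads the whole file in fixed-size chunks; returns total bytes read.
    public static long ReadInChunks(string path, int bufferSize)
    {
        var buffer = new byte[bufferSize];
        long total = 0;
        using (var input = File.OpenRead(path))
        {
            int bytesRead;
            // Read() may return fewer bytes than requested; 0 means end of file.
            while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                // process buffer[0..bytesRead) here
                total += bytesRead;
            }
        }
        return total;
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[40000]); // sample data
        // 16 KB buffer, at the top of the 8-16 KB range suggested above.
        Console.WriteLine(ReadInChunks(path, 16 * 1024)); // 40000
        File.Delete(path);
    }
}
```

Note the loop keys off the return value of Read() rather than the file length, which also handles streams whose Length is unknown.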
On the other hand, if you are processing the data in streaming fashion, reading a chunk then processing it before reading the next, smaller buffers might be more useful. Even better, if you are streaming data that has structure, I would change the amount of data read to specifically match the type of data you are reading. For example, if you are reading binary data that contains a 4-character code, a float, and a string, I would read the 4-character code into a 4-byte array, as well as the float. I would read the length of the string, then create a buffer to read the whole chunk of string data at once.
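A sketch of that structured read, sizing each read to the field it targets (the record layout, field names, and in-memory sample data are hypothetical, for illustration only):

```csharp
using System;
using System.IO;
using System.Text;

class StructuredRead
{
    // Reads one record: a 4-char code, a float, then a length-prefixed string,
    // matching each buffer to the field instead of using one generic buffer.
    public static (string Code, float Value, string Text) ReadRecord(Stream s)
    {
        var code = new byte[4];                 // exactly the 4-character code
        s.Read(code, 0, 4);
        var f = new byte[4];                    // exactly one float
        s.Read(f, 0, 4);
        var len = new byte[4];                  // the string's byte length
        s.Read(len, 0, 4);
        var text = new byte[BitConverter.ToInt32(len, 0)];
        s.Read(text, 0, text.Length);           // whole string chunk at once
        return (Encoding.ASCII.GetString(code), BitConverter.ToSingle(f, 0),
                Encoding.ASCII.GetString(text));
    }

    static void Main()
    {
        // Build a sample record in memory as a stand-in for a real file.
        var ms = new MemoryStream();
        ms.Write(Encoding.ASCII.GetBytes("DATA"), 0, 4);
        ms.Write(BitConverter.GetBytes(3.14f), 0, 4);
        byte[] body = Encoding.ASCII.GetBytes("hello");
        ms.Write(BitConverter.GetBytes(body.Length), 0, 4);
        ms.Write(body, 0, body.Length);
        ms.Position = 0;

        var rec = ReadRecord(ms);
        Console.WriteLine($"{rec.Code} {rec.Value} {rec.Text}"); // DATA 3.14 hello
    }
}
```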
If you are doing streaming data processing, I would look into the BinaryReader and BinaryWriter classes. These allow you to work with binary data very easily, without having to worry much about the data itself. It also allows you to decouple your buffer size from the actual data you are working with. You could set a 16k buffer on the underlying stream, and read individual data values with the BinaryReader with ease.
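A sketch of that decoupling (a MemoryStream stands in for the file here; the 16k buffer lives in a BufferedStream while BinaryReader just pulls individual values):

```csharp
using System;
using System.IO;
using System.Text;

class ReaderSketch
{
    static void Main()
    {
        // Write a few values, then read them back through a 16 KB buffered stream.
        var ms = new MemoryStream();
        using (var w = new BinaryWriter(ms, Encoding.UTF8, leaveOpen: true))
        {
            w.Write(42);      // int
            w.Write(2.5f);    // float
            w.Write("chunk"); // length-prefixed string
        }
        ms.Position = 0;

        // Buffer size is set once on the stream; the reader is unaware of it.
        using (var buffered = new BufferedStream(ms, 16 * 1024))
        using (var r = new BinaryReader(buffered))
        {
            Console.WriteLine(r.ReadInt32());  // 42
            Console.WriteLine(r.ReadSingle()); // 2.5
            Console.WriteLine(r.ReadString()); // chunk
        }
    }
}
```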
Answered by RRUZ
Depends on where you draw the line between access time and memory usage. The larger the buffer, the faster, but the more expensive in terms of memory. Reading in multiples of your file system's cluster size is probably the most efficient; on a Windows XP system using NTFS, 4K is the default cluster size.
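A tiny sketch of picking a buffer that is a whole multiple of that cluster size (the 64 KB figure is just an illustration, not a benchmarked recommendation):

```csharp
using System;

class ClusterAligned
{
    static void Main()
    {
        const int ClusterSize = 4 * 1024;   // default NTFS cluster size on XP
        int bufferSize = 16 * ClusterSize;  // 64 KB: a whole number of clusters
        var buffer = new byte[bufferSize];
        Console.WriteLine(buffer.Length % ClusterSize == 0); // True
    }
}
```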
See this link: Default cluster size for NTFS, FAT, and exFAT
Bye.