Creating big file on Windows
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/455297/
Asked by Ilya
I need to create relatively big (1-8 GB) files. What is the fastest way to do so on Windows using C or C++? I need to create them on the fly, and speed is really an issue. The file will be used for storage emulation, i.e. it will be accessed randomly at different offsets, and I need all of the storage to be preallocated but not initialized. Currently we are writing all of the storage with dummy data, and it's taking too long.
Thanks.
Answered by Brian R. Bondy
Use the Win32 API functions CreateFile, SetFilePointerEx, SetEndOfFile, and CloseHandle, in that order.
The trick is in the SetFilePointerEx function. From MSDN:
Note that it is not an error to set the file pointer to a position beyond the end of the file. The size of the file does not increase until you call the SetEndOfFile, WriteFile, or WriteFileEx function.
Windows Explorer actually does this same thing when copying a file from one location to another. It does this so that the file does not need to be re-allocated as it grows on a fragmented disk.
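The four calls named in this answer can be sketched roughly as follows. This is a minimal sketch rather than code from the answer: the function name `preallocate_file` is hypothetical, error handling is reduced to a boolean, and the non-Windows branch (POSIX `ftruncate`) is an assumption added only so the sketch also builds outside Windows.

```c
/* Request a 64-bit off_t and the POSIX declarations on non-Windows builds. */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: create `path` and set its size without writing data. */
bool preallocate_file(const char *path, int64_t size)
{
    if (size < 0)
        return false;

    HANDLE h = CreateFileA(path, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                           FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return false;

    LARGE_INTEGER pos;
    pos.QuadPart = size;

    /* Move the file pointer past the end, then make that the new end. */
    bool ok = SetFilePointerEx(h, pos, NULL, FILE_BEGIN) && SetEndOfFile(h);

    CloseHandle(h);
    return ok;
}
#else
#include <fcntl.h>
#include <unistd.h>

/* Assumed stand-in for non-Windows builds: ftruncate sets the size. */
bool preallocate_file(const char *path, int64_t size)
{
    if (size < 0)
        return false;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return false;

    bool ok = ftruncate(fd, (off_t)size) == 0;
    close(fd);
    return ok;
}
#endif
```

A call such as `preallocate_file("storage.bin", 8LL * 1024 * 1024 * 1024)` would request an 8 GB file; the file name here is only an example.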
Answered by Laserallan
Check out memory mapped files.
They match the use case you describe very well: high performance and random access.
I believe they don't need to be created as large files. You just set a large maximum size on them, and they will be expanded when you write to parts you haven't touched before.
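A memory-mapped file along these lines might be sketched as below. The function name `poke_mapped_file` is hypothetical, and the non-Windows branch (POSIX `mmap`) is an assumption added so the sketch builds elsewhere. One caveat, as far as I know from the Microsoft documentation: when `CreateFileMapping` is given a maximum size larger than the file and the protection allows writing, the file on disk is grown to that size when the mapping is created, not lazily on first write.

```c
/* Request a 64-bit off_t and the POSIX declarations on non-Windows builds. */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: map `size` bytes of `path` and store one byte. */
bool poke_mapped_file(const char *path, uint64_t size, uint64_t offset,
                      unsigned char value)
{
    if (offset >= size)
        return false;

    HANDLE file = CreateFileA(path, GENERIC_READ | GENERIC_WRITE, 0, NULL,
                              OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE)
        return false;

    /* The maximum size is passed as two 32-bit halves. */
    HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READWRITE,
                                        (DWORD)(size >> 32),
                                        (DWORD)(size & 0xFFFFFFFFu), NULL);
    if (mapping == NULL) {
        CloseHandle(file);
        return false;
    }

    /* A length of 0 maps from the given offset to the end of the mapping. */
    unsigned char *view = (unsigned char *)MapViewOfFile(mapping, FILE_MAP_WRITE, 0, 0, 0);
    bool ok = view != NULL;
    if (ok) {
        view[offset] = value;
        UnmapViewOfFile(view);
    }
    CloseHandle(mapping);
    CloseHandle(file);
    return ok;
}
#else
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumed stand-in for non-Windows builds. */
bool poke_mapped_file(const char *path, uint64_t size, uint64_t offset,
                      unsigned char value)
{
    if (offset >= size)
        return false;

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return false;
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return false;
    }

    unsigned char *view = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);
    bool ok = view != MAP_FAILED;
    if (ok) {
        view[offset] = value;
        munmap(view, (size_t)size);
    }
    close(fd);
    return ok;
}
#endif
```

Mapping a whole multi-gigabyte file in one view needs a 64-bit process; a 32-bit process would have to map smaller windows at different offsets instead.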
Answered by opal
Use the "fsutil" command:
E:\VirtualMachines>fsutil file createnew
Usage : fsutil file createnew <filename> <length>
Eg : fsutil file createnew C:\testfile.txt 1000
Regards
P.S. it is for Windows: 2000/XP/7
Answered by ST3
Well, this solution is not bad, but the thing you are looking for is SetFileValidData.
As MSDN says:
The SetFileValidData function allows you to avoid filling data with zeros when writing nonsequentially to a file.
So this always leaves the disk data as it is, whereas SetFilePointerEx should set all of the data to zeros, so a big allocation takes some time.
Answered by Ben Key
I am aware that your question is tagged with Windows, and Brian R. Bondy gave you the best answer to your question if you know for certain that you will not have to port your application to other platforms. However, if you might have to port your application to other platforms, you might want to do something more like what Adrian Cornish proposed in his answer to the question "How to create file of “x” size?":
FILE *fp=fopen("myfile", "w");
fseek(fp, 1024*1024, SEEK_SET);
fputc('\n', fp);
fclose(fp);
Of course, there is an added twist. The answer proposed by Adrian Cornish makes use of the fseek function, which has the following signature.
int fseek ( FILE * stream, long int offset, int origin );
The problem is that you want to create a very large file with a size beyond the range of a 32-bit integer, and on Windows long int is only 32 bits wide. You need to use the 64-bit equivalent of fseek. Unfortunately, it has different names on different platforms.
The header file LargeFileSupport.h found at http://mosaik-aligner.googlecode.com/svn-history/r2/trunk/src/CommonSource/Utilities/LargeFileSupport.h offers a solution to this problem.
This would allow you to write the following function.
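#include <stdio.h>
#include "LargeFileSupport.h"
/* Include other headers. */

/* Creates `filename` and grows it by seeking to `size` and writing one byte. */
bool createLargeFile(const char * filename, off_type size)
{
    FILE *fp = fopen(filename, "wb");
    if (!fp)
    {
        return false;
    }
    /* Assumes fseek64 returns 0 on success, as fseek does. */
    bool ok = fseek64(fp, size, SEEK_SET) == 0 && fputc('\n', fp) != EOF;
    if (fclose(fp) != 0)
    {
        ok = false;
    }
    return ok;
}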
I thought I would add this just in case the information would be of use to you.
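For readers who cannot fetch that header, the same idea can be sketched without it. This is only an illustration under stated assumptions: the macro `portable_fseek64` and the function `create_large_file` are hypothetical names, not taken from LargeFileSupport.h, and the mapping used here is `_fseeki64` on Windows and `fseeko` elsewhere. Unlike the function above, this sketch seeks to `size - 1` before writing one byte, so the resulting file is exactly `size` bytes long.

```c
/* Request a 64-bit off_t and the POSIX declarations on non-Windows builds. */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#ifdef _WIN32
/* Microsoft C runtime: 64-bit seek. */
#define portable_fseek64(fp, off, whence) _fseeki64((fp), (__int64)(off), (whence))
#else
#include <sys/types.h>
/* POSIX: fseeko takes an off_t, which is 64 bits wide with the define above. */
#define portable_fseek64(fp, off, whence) fseeko((fp), (off_t)(off), (whence))
#endif

/* Hypothetical helper: create `filename` with a length of exactly `size` bytes. */
bool create_large_file(const char *filename, int64_t size)
{
    if (size <= 0)
        return false;

    FILE *fp = fopen(filename, "wb");
    if (!fp)
        return false;

    /* Seek to the last byte and write it so the file grows to `size`. */
    bool ok = portable_fseek64(fp, size - 1, SEEK_SET) == 0
              && fputc('\0', fp) != EOF;

    if (fclose(fp) != 0)
        ok = false;
    return ok;
}
```

Whether the skipped region is physically allocated or left as a hole depends on the file system, so this approach does not by itself guarantee preallocation.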
Answered by Stu Mackellar
If you're using NTFS then sparse files are the way to go:
A file in which much of the data is zeros is said to contain a sparse data set. Files like these are typically very large—for example, a file containing image data to be processed or a matrix within a high-speed database. The problem with files containing sparse data sets is that the majority of the file does not contain useful data and, because of this, they are an inefficient use of disk space.
The file compression in the NTFS file system is a partial solution to the problem. All data in the file that is not explicitly written is explicitly set to zero. File compression compacts these ranges of zeros. However, a drawback of file compression is that access time may increase due to data compression and decompression.
Support for sparse files is introduced in the NTFS file system as another way to make disk space usage more efficient. When sparse file functionality is enabled, the system does not allocate hard drive space to a file except in regions where it contains nonzero data. When a write operation is attempted where a large amount of the data in the buffer is zeros, the zeros are not written to the file. Instead, the file system creates an internal list containing the locations of the zeros in the file, and this list is consulted during all read operations. When a read operation is performed in areas of the file where zeros were located, the file system returns the appropriate number of zeros in the buffer allocated for the read operation. In this way, maintenance of the sparse file is transparent to all processes that access it, and is more efficient than compression for this particular scenario.
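Marking a file as sparse before setting its size might look like the sketch below. The function name `create_sparse_file` is hypothetical. The Windows branch uses `DeviceIoControl` with `FSCTL_SET_SPARSE`; the non-Windows branch is an assumption added so the sketch builds elsewhere, relying on the fact that many POSIX file systems leave a hole when a file is extended with `ftruncate`.

```c
/* Request a 64-bit off_t and the POSIX declarations on non-Windows builds. */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: create `path` as a sparse file of `size` bytes. */
bool create_sparse_file(const char *path, int64_t size)
{
    if (size < 0)
        return false;

    HANDLE h = CreateFileA(path, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                           FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return false;

    /* Ask the file system to treat unwritten ranges as unallocated zeros. */
    DWORD returned = 0;
    bool ok = DeviceIoControl(h, FSCTL_SET_SPARSE, NULL, 0, NULL, 0,
                              &returned, NULL) != 0;

    /* Set the logical size; no clusters are allocated for the new range. */
    LARGE_INTEGER pos;
    pos.QuadPart = size;
    ok = ok && SetFilePointerEx(h, pos, NULL, FILE_BEGIN) && SetEndOfFile(h);

    CloseHandle(h);
    return ok;
}
#else
#include <fcntl.h>
#include <unistd.h>

/* Assumed stand-in for non-Windows builds: extending leaves a hole. */
bool create_sparse_file(const char *path, int64_t size)
{
    if (size < 0)
        return false;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return false;

    bool ok = ftruncate(fd, (off_t)size) == 0;
    close(fd);
    return ok;
}
#endif
```

Because a sparse file allocates space only as data is written, it trades the preallocation the question asks for against faster creation and lower disk usage.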