Quickly create a large file on a Linux system

Question

Asked by DrStalker

How can I quickly create a large file on a Linux (Red Hat Linux) system?

dd will do the job, but reading from /dev/zero and writing to the drive can take a long time when you need a file several hundred GB in size for testing. If you need to do that repeatedly, the time really adds up.

I don't care about the contents of the file, I just want it to be created quickly. How can this be done?

Using a sparse file won't work for this. I need the file to be allocated disk space.

Answer 1

Answered by CMS

Linux & all filesystems

xfs_mkfile 10240m 10Gigfile

Linux & some filesystems (ext4, xfs, btrfs and ocfs2)

fallocate -l 10G 10Gigfile

OS X, Solaris, SunOS and probably other UNIXes

mkfile 10240m 10Gigfile

HP-UX

prealloc 10Gigfile 10737418240

Explanation

Try mkfile <size> myfile as an alternative to dd. With the -n option the size is noted, but disk blocks aren't allocated until data is written to them. Without the -n option, the space is zero-filled, which means writing to the disk, which means taking time.

mkfile is derived from SunOS and is not available everywhere. Most Linux systems have xfs_mkfile, which works exactly the same way, and not just on XFS file systems despite the name. It's included in xfsprogs (for Debian/Ubuntu) or similarly named packages.

Most Linux systems also have fallocate, which only works on certain file systems (such as btrfs, ext4, ocfs2, and xfs), but is the fastest, as it allocates all the file space (creates non-holey files) but does not initialize any of it.
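
As a rough sketch of the trade-off described above, the snippet below tries fallocate first and falls back to writing zeros when the filesystem does not support it. The helper name make_big_file, the scratch directory, and the 16 MiB size are assumptions made for illustration, and stat -c assumes GNU coreutils.

```shell
#!/bin/sh
# Sketch only: make_big_file is a hypothetical helper, not a standard tool.
# Usage: make_big_file <size-in-bytes> <path>
make_big_file() {
    size_bytes=$1
    target=$2
    # Fast path: reserve the blocks without writing them.
    if command -v fallocate >/dev/null 2>&1 &&
       fallocate -l "$size_bytes" "$target" 2>/dev/null; then
        return 0
    fi
    # Slow path: actually write zeros (works on any filesystem).
    head -c "$size_bytes" /dev/zero > "$target"
}

workdir=$(mktemp -d)                     # assumed scratch location
demo_file="$workdir/demo.img"
make_big_file $((16 * 1024 * 1024)) "$demo_file"

# Compare the logical size with the space actually allocated on disk.
apparent_bytes=$(stat -c %s "$demo_file")
allocated_bytes=$(( $(stat -c %b "$demo_file") * $(stat -c %B "$demo_file") ))
echo "apparent=$apparent_bytes allocated=$allocated_bytes"
rm -rf "$workdir"
```

On a filesystem that supports fallocate, the allocated figure should be at least as large as the apparent size, which is what distinguishes this from a sparse file.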

Answer 2

Answered by paxdiablo

One approach: if you can guarantee unrelated applications won't use the files in a conflicting manner, just create a pool of files of varying sizes in a specific directory, then create links to them when needed.

For example, have a pool of files called:

/home/bigfiles/512M-A
/home/bigfiles/512M-B
/home/bigfiles/1024M-A
/home/bigfiles/1024M-B

Then, if you have an application that needs a 1G file called /home/oracle/logfile, execute a "ln /home/bigfiles/1024M-A /home/oracle/logfile".

If it's on a separate filesystem, you will have to use a symbolic link.

The A/B/etc files can be used to ensure there's no conflicting use between unrelated applications.

The link operation is about as fast as you can get.
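
The pool idea can be sketched roughly as follows. Everything lives under a throwaway directory, and the directory names and the small 1 MiB pool file are stand-ins chosen for illustration rather than the real /home/bigfiles layout.

```shell
#!/bin/sh
# Sketch only: all paths are hypothetical and live under a scratch directory;
# a 1 MiB file stands in for the much larger pool files in the answer.
workdir=$(mktemp -d)
pool_dir="$workdir/bigfiles"
app_dir="$workdir/oracle"
mkdir -p "$pool_dir" "$app_dir"

# Pay the creation cost once, ahead of time.
head -c $((1024 * 1024)) /dev/zero > "$pool_dir/1M-A"

# Hard link when both paths share a filesystem; otherwise fall back to a symlink.
if ! ln "$pool_dir/1M-A" "$app_dir/logfile" 2>/dev/null; then
    ln -s "$pool_dir/1M-A" "$app_dir/logfile"
fi

link_size=$(stat -L -c %s "$app_dir/logfile")   # -L follows a symlink if one was made
echo "logfile size: $link_size bytes"
rm -rf "$workdir"
```

Either kind of link only creates a directory entry, so its cost does not depend on the size of the pool file.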

Answer 3

Answered by Barry Brown

I don't think you're going to get much faster than dd. The bottleneck is the disk; writing hundreds of GB of data to it is going to take a long time no matter how you do it.

But here's a possibility that might work for your application. If you don't care about the contents of the file, how about creating a "virtual" file whose contents are the dynamic output of a program? Instead of open()ing the file, use popen() to open a pipe to an external program. The external program generates data whenever it's needed. Once the pipe is open, the program that opened it can read from it like a regular file stream. Note, however, that a pipe is not seekable, so fseek() and rewind() will not work on it and the data has to be consumed sequentially. You'll need to use pclose() instead of fclose() when you're done with the pipe.

If your application needs the file to be a certain size, it will be up to the external program to keep track of where in the "file" it is and send an eof when the "end" has been reached.
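
A shell analogue of this idea, offered only as a sketch: a generator emits a fixed number of bytes and then exits, which the reader sees as end of file. The function name generate_bytes and the 8 MiB size are assumptions for illustration, and wc stands in for the real consumer.

```shell
#!/bin/sh
# Sketch only: generate_bytes is a hypothetical stand-in for the external program.
# It emits exactly N zero bytes and then exits, which the reader sees as EOF.
generate_bytes() {
    head -c "$1" /dev/zero
}

# The consumer reads the "file" through a pipe; nothing is written to disk.
wanted_bytes=$((8 * 1024 * 1024))
received_bytes=$(generate_bytes "$wanted_bytes" | wc -c)
echo "consumer read $received_bytes bytes"
```

This only helps when the consumer reads sequentially, since data that has passed through a pipe cannot be re-read.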

Answer 4

Answered by Zoredache

Set seek to the size of the file you want in bytes, minus 1:

dd if=/dev/zero of=filename bs=1 count=1 seek=1048575

Answer 5

Answered by kiv

truncate -s 10M output.file

will create a 10 M file instantaneously (M stands for 1024*1024 bytes, MB stands for 1000*1000 - same with K, KB, G, GB...)

EDIT: as many have pointed out, this will not physically allocate the file on your device. With this you could actually create an arbitrarily large file, regardless of the available space on the device, as it creates a "sparse" file.

So, when doing this, you will be deferring physical allocation until the file is accessed. If you're mapping this file to memory, you may not have the expected performance.
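
The deferred allocation can be observed with a small experiment. This is only a sketch: it assumes GNU coreutils (truncate, dd, stat -c) and a filesystem that supports sparse files, and the scratch directory is arbitrary.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils and a filesystem with sparse-file support.
workdir=$(mktemp -d)
sparse_file="$workdir/output.file"

truncate -s 10M "$sparse_file"
size_bytes=$(stat -c %s "$sparse_file")
blocks_before=$(stat -c %b "$sparse_file")

# Write 1 MiB of real data at offset 4 MiB without truncating the rest;
# conv=fsync flushes it so the block count is up to date.
dd if=/dev/zero of="$sparse_file" bs=1M seek=4 count=1 conv=notrunc,fsync 2>/dev/null
size_after=$(stat -c %s "$sparse_file")
blocks_after=$(stat -c %b "$sparse_file")

echo "size=$size_bytes blocks before=$blocks_before after=$blocks_after"
rm -rf "$workdir"
```

The logical size stays at 10 MiB throughout, while the block count typically grows only once data is written.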

But this is still a useful command to know.

Answer 6

Answered by Franta

dd from the other answers is a good solution, but it is slow for this purpose. On Linux we have fallocate (POSIX systems offer the related posix_fallocate() call), which reserves the desired space without actually having to write to it. It works with most modern disk-based file systems and is very fast.

For example:

fallocate -l 10G gentoo_root.img

Answer 7

Answered by Alex Dupuy

The GPL mkfile is just a (ba)sh script wrapper around dd; BSD's mkfile just memsets a buffer with non-zero and writes it repeatedly. I would not expect the former to out-perform dd. The latter might edge out dd if=/dev/zero slightly since it omits the reads, but anything that does significantly better is probably just creating a sparse file.

Absent a system call that actually allocates space for a file without writing data (this answer assumes Linux and BSD lack one, and probably Solaris as well; on Linux, fallocate(2) now fills that gap), you might get a small improvement in performance by using ftruncate(2)/truncate(1) to extend the file to the desired size, mmap the file into memory, then write non-zero data to the first bytes of every disk block (fstatvfs(3) reports the filesystem block size).

Answer 8

Answered by Sepero

Examples where seek is the size of the file you want in bytes

#kilobytes
dd if=/dev/zero of=filename bs=1 count=0 seek=200K

#megabytes
dd if=/dev/zero of=filename bs=1 count=0 seek=200M

#gigabytes
dd if=/dev/zero of=filename bs=1 count=0 seek=200G

#terabytes
dd if=/dev/zero of=filename bs=1 count=0 seek=200T

From the dd manpage:

BLOCKS and BYTES may be followed by the following multiplicative suffixes: c=1, w=2, b=512, kB=1000, K=1024, MB=1000*1000, M=1024*1024, GB=1000*1000*1000, G=1024*1024*1024, and so on for T, P, E, Z, Y.
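
To make the suffix arithmetic concrete, here is a small sketch that creates one file with a binary suffix and one with a decimal suffix and prints their sizes. It assumes GNU dd and GNU stat; the file names are arbitrary.

```shell
#!/bin/sh
# Sketch only: assumes GNU dd and GNU stat; file names are arbitrary.
workdir=$(mktemp -d)

# bs=1 makes seek count in bytes; count=0 copies nothing, so only the size is set.
dd if=/dev/zero of="$workdir/binary" bs=1 count=0 seek=200K 2>/dev/null
dd if=/dev/zero of="$workdir/decimal" bs=1 count=0 seek=200kB 2>/dev/null

binary_bytes=$(stat -c %s "$workdir/binary")    # 200 * 1024
decimal_bytes=$(stat -c %s "$workdir/decimal")  # 200 * 1000
echo "200K -> $binary_bytes bytes, 200kB -> $decimal_bytes bytes"
rm -rf "$workdir"
```

With GNU dd this prints 204800 bytes for 200K and 200000 bytes for 200kB. Because count=0 writes no data, both files are sparse.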

Answer 9

Answered by Humungous Hippo

I don't know a whole lot about Linux, but here's the C Code I wrote to fake huge files on DC Share many years ago.

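#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int i;
    FILE *fp;

    fp = fopen("bigfakefile.txt", "w");
    if (fp == NULL) {
        perror("fopen");
        return EXIT_FAILURE;
    }

    /* Skip 1 MiB, then write one byte, 1024*1024 times: roughly 1 TiB, sparse.
       Offsets this large need a 64-bit long; otherwise use fseeko(). */
    for (i = 0; i < (1024 * 1024); i++) {
        fseek(fp, (1024 * 1024), SEEK_CUR);
        fputc('C', fp);
    }

    fclose(fp);
    return 0;
}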

Answer 10

Answered by Dan McAllister

This is a common question, especially in today's world of virtual environments. Unfortunately, the answer is not as straightforward as one might assume.

dd is the obvious first choice, but dd is essentially a copy, and that forces you to write every block of data (thus initializing the file contents). That initialization is what takes up so much I/O time. (Want to make it take even longer? Use /dev/random instead of /dev/zero! Then you'll use CPU as well as I/O time!) In the end, though, dd is a poor choice (though essentially the default used by the VM "create" GUIs). E.g.:

dd if=/dev/zero of=./gentoo_root.img bs=4k iflag=fullblock,count_bytes count=10G

truncate is another choice, and is likely the fastest. But that is because it creates a "sparse file". Essentially, a sparse file is one with regions that were never written, and the underlying filesystem "cheats" by not really storing those regions, just "pretending" that the data is all there and returning zeros when they are read. Thus, when you use truncate to create a 20 GB drive for your VM, the filesystem doesn't actually allocate 20 GB, but it cheats and says that there are 20 GB of zeros there, even though as little as one track on the disk may actually be in use. E.g.:

truncate -s 10G gentoo_root.img

fallocate is the final, and best, choice for use with VM disk allocation, because it essentially "reserves" (or "allocates") all of the space you're seeking, but it doesn't bother to write anything. So, when you use fallocate to create a 20 GB virtual drive space, you really do get a 20 GB file (not a "sparse file"), and you won't have bothered to write anything to it. On Linux the reserved space reads back as zeros until you write to it, kind of like a brand new disk. E.g.:

fallocate -l 10G gentoo_root.img
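
To see the difference between the three approaches side by side, the sketch below creates a small file with each and reports apparent size against allocated space. It assumes GNU coreutils and util-linux; the helper name usage_of, the file names, and the 8 MiB size are illustrative choices.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils and util-linux; 8 MiB keeps the demo quick.
workdir=$(mktemp -d)
size_bytes=$((8 * 1024 * 1024))

# Print "<apparent bytes> <allocated bytes>" for a file (hypothetical helper).
usage_of() {
    echo "$(stat -c %s "$1") $(( $(stat -c %b "$1") * $(stat -c %B "$1") ))"
}

dd if=/dev/zero of="$workdir/dd.img" bs=1M count=8 conv=fsync 2>/dev/null
truncate -s "$size_bytes" "$workdir/truncate.img"
# fallocate can fail on filesystems without support; report that instead of aborting.
fallocate -l "$size_bytes" "$workdir/fallocate.img" 2>/dev/null ||
    echo "fallocate not supported here"

dd_usage=$(usage_of "$workdir/dd.img")
truncate_usage=$(usage_of "$workdir/truncate.img")
echo "dd:        $dd_usage"
echo "truncate:  $truncate_usage"
[ -s "$workdir/fallocate.img" ] && echo "fallocate: $(usage_of "$workdir/fallocate.img")"
rm -rf "$workdir"
```

All three files report the same apparent size; the truncate file should show little or no allocated space, while the dd and fallocate files are normally fully allocated.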
