Fastest file reading in C

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow

Original question: http://stackoverflow.com/questions/3002122/
Asked by Jay
Right now I am using fread() to read a file, but I've been told that in other languages fread() is inefficient. Is this the same in C? If so, how would faster file reading be done?
Accepted answer by R Samuel Klatchko
If you are willing to go beyond the C spec into OS specific code, memory mapping is generally considered the most efficient way.
For POSIX, check out mmap; for Windows, check out OpenFileMapping.
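To make the POSIX side concrete, here is a minimal sketch of reading via mmap: it maps a file read-only and sums its bytes. The function name `sum_mapped_file` is invented for this illustration, and error handling is kept to a minimum.

```c
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Map `path` read-only and return the sum of its bytes, or -1 on error.
   A sketch: real code should also handle empty files more gracefully. */
long sum_mapped_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    unsigned char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the fd is closed */
    if (p == MAP_FAILED)
        return -1;

    long sum = 0;
    for (off_t i = 0; i < st.st_size; i++)
        sum += p[i];

    munmap(p, st.st_size);
    return sum;
}
```

Note that the kernel pages data in lazily as the loop touches it, which is exactly why memory mapping can avoid an extra copy compared with read() into a user buffer.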
Answered by Thanatos
It really shouldn't matter.
If you're reading from an actual hard disk, it's going to be slow. The hard disk is your bottle neck, and that's it.
Now, if you're being silly about your call to read/fread/whatever, and say, fread()-ing a byte at a time, then yes, it's going to be slow, as the overhead of fread() will outstrip the overhead of reading from the disk.
If you call read/fread/whatever, request a decent portion of data. How much will depend on what you're doing: sometimes all you want/need is 4 bytes (to get a uint32), but sometimes you can read in large chunks (4 KiB, 64 KiB, etc. RAM is cheap; go for something significant).
If you're doing small reads, some of the higher-level calls like fread() will actually help you by buffering data behind your back. If you're doing large reads, it might not be helpful, but switching from fread to read will probably not yield that much improvement, as you're bottlenecked on disk speed.
In short: if you can, request a liberal amount when reading, and try to minimize what you write. For large amounts, powers of 2 tend to be friendlier than anything else, but of course, it's OS, hardware, and weather dependent.
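As a sketch of the "request a liberal amount" advice, here is one way to consume a file through a 64 KiB buffer with fread. The function `for_each_chunk` and its callback are names invented for this example.

```c
#include <stdio.h>
#include <stddef.h>

/* Read `path` in 64 KiB chunks and hand each chunk to `fn`.
   Returns total bytes read, or -1 if the file can't be opened. */
long for_each_chunk(const char *path,
                    void (*fn)(const unsigned char *chunk, size_t len, void *ctx),
                    void *ctx)
{
    enum { CHUNK = 64 * 1024 }; /* a "decent portion", per the advice above */
    static unsigned char buf[CHUNK]; /* one shared buffer: not thread-safe, fine for a sketch */

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long total = 0;
    size_t n;
    while ((n = fread(buf, 1, CHUNK, fp)) > 0) {
        fn(buf, n, ctx);
        total += (long)n;
    }
    fclose(fp);
    return total;
}
```

The point is simply that each fread call amortizes its overhead over 65536 bytes rather than over one.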
So, let's see if this might bring out any differences:
#include <sys/time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BUFFER_SIZE (1 * 1024 * 1024)
#define ITERATIONS (10 * 1024)

double now()
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1000000.;
}

int main()
{
    unsigned char buffer[BUFFER_SIZE]; // 1 MiB buffer
    double end_time;
    double total_time;
    int i, x;
    unsigned int y = 0; // accumulator: touching the data forces the reads to happen
    double start_time = now();

#ifdef USE_FREAD
    FILE *fp;
    fp = fopen("/dev/zero", "rb");
    for(i = 0; i < ITERATIONS; ++i)
    {
        fread(buffer, BUFFER_SIZE, 1, fp);
        for(x = 0; x < BUFFER_SIZE; x += 1024)
        {
            y += buffer[x];
        }
    }
    fclose(fp);
#elif defined(USE_MMAP)
    unsigned char *mmdata;
    int fd = open("/dev/zero", O_RDONLY);
    for(i = 0; i < ITERATIONS; ++i)
    {
        // Widen before multiplying; i * BUFFER_SIZE would overflow int
        mmdata = mmap(NULL, BUFFER_SIZE, PROT_READ, MAP_PRIVATE, fd,
                      (off_t)i * BUFFER_SIZE);
        // But if we don't touch it, it won't be read...
        // I happen to know I have 4 KiB pages, YMMV
        for(x = 0; x < BUFFER_SIZE; x += 1024)
        {
            y += mmdata[x];
        }
        munmap(mmdata, BUFFER_SIZE);
    }
    close(fd);
#else
    int fd;
    fd = open("/dev/zero", O_RDONLY);
    for(i = 0; i < ITERATIONS; ++i)
    {
        read(fd, buffer, BUFFER_SIZE);
        for(x = 0; x < BUFFER_SIZE; x += 1024)
        {
            y += buffer[x];
        }
    }
    close(fd);
#endif

    end_time = now();
    total_time = end_time - start_time;
    printf("It took %f seconds to read 10 GiB. That's %f MiB/s.\n",
           total_time, ITERATIONS / total_time);
    return 0;
}
...yields:
$ gcc -o reading reading.c
$ ./reading ; ./reading ; ./reading
It took 1.141995 seconds to read 10 GiB. That's 8966.764671 MiB/s.
It took 1.131412 seconds to read 10 GiB. That's 9050.637376 MiB/s.
It took 1.132440 seconds to read 10 GiB. That's 9042.420953 MiB/s.
$ gcc -o reading reading.c -DUSE_FREAD
$ ./reading ; ./reading ; ./reading
It took 1.134837 seconds to read 10 GiB. That's 9023.322991 MiB/s.
It took 1.128971 seconds to read 10 GiB. That's 9070.207522 MiB/s.
It took 1.136845 seconds to read 10 GiB. That's 9007.383586 MiB/s.
$ gcc -o reading reading.c -DUSE_MMAP
$ ./reading ; ./reading ; ./reading
It took 2.037207 seconds to read 10 GiB. That's 5026.489386 MiB/s.
It took 2.037060 seconds to read 10 GiB. That's 5026.852369 MiB/s.
It took 2.031698 seconds to read 10 GiB. That's 5040.119180 MiB/s.
...or no noticeable difference. (Sometimes fread wins, sometimes read.)
Note: The slow mmap is surprising. This might be due to my asking it to allocate the buffer for me. (I wasn't sure about the requirements of supplying a pointer...)
In really short: Don't prematurely optimize. Make it run, make it right, make it fast, in that order.
Back by popular demand, I ran the test on a real file. (The first 675 MiB of the Ubuntu 10.04 32-bit desktop installation CD ISO.) These were the results:
# Using fread()
It took 31.363983 seconds to read 675 MiB. That's 21.521501 MiB/s.
It took 31.486195 seconds to read 675 MiB. That's 21.437967 MiB/s.
It took 31.509051 seconds to read 675 MiB. That's 21.422416 MiB/s.
It took 31.853389 seconds to read 675 MiB. That's 21.190838 MiB/s.
# Using read()
It took 33.052984 seconds to read 675 MiB. That's 20.421757 MiB/s.
It took 31.319416 seconds to read 675 MiB. That's 21.552126 MiB/s.
It took 39.453453 seconds to read 675 MiB. That's 17.108769 MiB/s.
It took 32.619912 seconds to read 675 MiB. That's 20.692882 MiB/s.
# Using mmap()
It took 31.897643 seconds to read 675 MiB. That's 21.161438 MiB/s.
It took 36.753138 seconds to read 675 MiB. That's 18.365779 MiB/s.
It took 36.175385 seconds to read 675 MiB. That's 18.659097 MiB/s.
It took 31.841998 seconds to read 675 MiB. That's 21.198419 MiB/s.
...and one very bored programmer later, we've read the CD ISO off disk. 12 times. Before each test, the disk cache was cleared, and during each test there was enough, and approximately the same amount of, RAM free to hold the CD ISO twice in RAM.
One note of interest: I was originally using a large malloc() to fill memory and thus minimize the effects of disk caching. It may be worth noting that mmap performed terribly here. The other two solutions merely ran; mmap ran and, for reasons I can't explain, began pushing memory to swap, which killed its performance. (The program was not leaking, as far as I know (the source code is above) - the actual "used memory" stayed constant throughout the trials.)
read() posted the fastest time overall, and fread() posted really consistent times. This may have been due to some small hiccup during the testing, however. All told, the three methods were just about equal. (Especially fread and read...)
Answered by Matt Curtis
What's slowing you down?
If you need the fastest possible file reading (while still playing nicely with the operating system), go straight to your OS's calls, and make sure you study how to use them most effectively.
- How is your data physically laid out? For example, rotating drives might read data stored at the edges faster, and you want to minimize or eliminate seek times.
- Is your data pre-processed? Do you need to do stuff between loading it from disk and using it?
- What is the optimum chunk size for reading? (It might be an even multiple of the sector size. Check your OS documentation.)
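On the chunk-size question, one portable starting point: POSIX exposes the filesystem's preferred I/O block size as st_blksize in struct stat. A minimal sketch (the function name is invented for this example):

```c
#include <sys/stat.h>

/* Return the filesystem's preferred I/O block size for `path`, or -1 on error.
   A sensible read size is this value, or a power-of-2 multiple of it. */
long preferred_io_block_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_blksize;
}
```

This only tells you the filesystem's preference; the actual sweet spot still depends on the OS, hardware, and access pattern, so benchmark.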
If seek times are a problem, re-arrange your data on disk (if you can) and store it in larger, pre-processed files instead of loading small chunks from here and there.
If data transfer times are a problem, perhaps consider compressing the data.
Answered by mcabral
Answered by MSN
If fread is slow, it is because of the additional layers it adds on top of the underlying operating-system mechanism for reading from a file, layers that interfere with how your particular program is using fread. In other words, it's slow because you aren't using it the way it has been optimized for.
Having said that, faster file reading would be done by understanding how the operating system I/O functions work and providing your own abstraction that handles your program's particular I/O access patterns better. Most of the time you can do this with memory mapping the file.
However, if you are hitting the limits of the machine you are running on, memory mapping probably won't be sufficient. At that point it's really up to you to figure out how to optimize your I/O code.
Answered by yaneurabeya
The problem some people have noted here is that, depending on your source, your target buffer size, etc., you can create a custom handler for that specific case. But there are other cases, like block/character devices (i.e. /dev/*), where standard rules like that may or may not apply, and your backing source might be something that pops characters off serially without any buffering, like an I2C bus, standard RS-232, etc. And there are some other sources where character devices are memory-mappable over large sections of memory, as nvidia does with their video-driver character device (/dev/nvidiactl).
One other design choice many people have made in high-performance applications is asynchronous instead of synchronous I/O for handling how data is read. Look into libaio and the ported versions of libaio, which provide prepackaged solutions for asynchronous I/O; also look into using read with shared memory between a worker and a consumer thread (but keep in mind that this will increase programming complexity if you go this route). Asynchronous I/O is also something you can't get out of the box with stdio but can get with standard OS system calls. Just be careful, as there are bits of read which are "portable" according to the spec, but not all operating systems (FreeBSD, for instance) support POSIX STREAMS (by choice).
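libaio itself is Linux-specific, so as an assumed stand-in here is a sketch using the standardized POSIX AIO interface (<aio.h>) instead: it submits a read and then waits for completion (on some systems you must link with -lrt). The function name `aio_read_file` is invented for this example.

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Read up to `len` bytes from the start of `path` using POSIX AIO.
   Returns the number of bytes read, or -1 on error. A sketch: in a real
   program you would do useful work between aio_read() and the wait. */
ssize_t aio_read_file(const char *path, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;

    if (aio_read(&cb) != 0) {
        close(fd);
        return -1;
    }

    /* This is where other work would overlap with the I/O; here we just wait. */
    const struct aiocb *list[1] = { &cb };
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);

    ssize_t n = aio_return(&cb);
    close(fd);
    return n;
}
```

Because this sketch immediately blocks on the result, it buys nothing over plain read(); the payoff only appears when the gap between submission and completion is filled with real work.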
Another thing that you can do (depending on how portable your data is) is look into compression and/or conversion into a binary format like database formats, i.e. BDB, SQL, etc. Some database formats are portable across machines using endianness conversion functions.
In general, it would be best to take a set of algorithms and methods, run performance tests using the different methods, and evaluate which algorithm best serves the typical task your application will perform. That will help you determine which one performs best.
Answered by intuited
Maybe check out how perl does it. Perl's I/O routines are optimized, and are, I gather, the reason why processing text with a perl filter can be twice as fast as doing the same transformation with sed.
Obviously perl is pretty complex, and I/O is only one small part of what it does. I've never looked at its source so I couldn't give you any better directions than to point you here.

