Seeking and reading large files in a Linux C++ application
Original question: http://stackoverflow.com/questions/1035657/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverflow
Question by John Bellone
I am running into integer overflow using the standard ftell and fseek functions inside of G++, but I guess I was mistaken, because it seems that ftell64 and fseek64 are not available. I have been searching, and many websites seem to reference using lseek with the off64_t datatype, but I have not found any examples of something equivalent to fseek. Right now the files that I am reading in are 16 GB+ CSV files, and I expect them to grow to at least double that.
Without any external libraries, what is the most straightforward way to get something equivalent to the fseek/ftell pair? My application currently works using the standard GCC/G++ 4.x libraries.
Answer by nos
The 64-bit seek is a C library function (on glibc it is named fseeko64; there is no fseek64). To make it available you have to define _FILE_OFFSET_BITS=64 before including the system headers. That will more or less redefine fseeko and ftello to be the 64-bit versions (fseek and ftell keep their long offset). Or do it in the compiler arguments, e.g. gcc -D_FILE_OFFSET_BITS=64 ....
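A minimal sketch of the define-before-headers approach, assuming glibc on Linux. The helper name seek_and_tell is hypothetical; fseeko and ftello are the POSIX functions that _FILE_OFFSET_BITS=64 switches to 64-bit offsets. The _POSIX_C_SOURCE define is an assumption added only so their declarations stay visible under a strict -std= mode.

```c
/* Must come before ANY #include so every system header sees it. */
#define _FILE_OFFSET_BITS 64
/* Assumption: only needed so fseeko/ftello are declared under strict -std=c99/c11. */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: seek to an absolute byte offset and return the
   position the stream reports afterwards, or -1 on failure. With
   _FILE_OFFSET_BITS=64, off_t is 64-bit and fseeko/ftello map to the
   64-bit implementations, so offsets beyond 2 GiB do not overflow. */
off_t seek_and_tell(FILE *f, off_t offset)
{
    assert(f != NULL);
    if (fseeko(f, offset, SEEK_SET) != 0)
        return -1;
    return ftello(f);
}
```

Seeking past end-of-file is allowed, so an offset above 4 GiB can be tried on an ordinary small file without actually creating a 16 GB one.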
http://www.suse.de/~aj/linux_lfs.html has a great overview of large file support on Linux:
- Compile your programs with "gcc -D_FILE_OFFSET_BITS=64". This forces all file access calls to use the 64 bit variants. Several types change also, e.g. off_t becomes off64_t. It's therefore important to always use the correct types and to not use e.g. int instead of off_t. For portability with other platforms you should use getconf LFS_CFLAGS which will return -D_FILE_OFFSET_BITS=64 on Linux platforms but might return something else on e.g. Solaris. For linking, you should use the link flags that are reported via getconf LFS_LDFLAGS. On Linux systems, you do not need special link flags.
- Define _LARGEFILE_SOURCE and _LARGEFILE64_SOURCE. With these defines you can use the LFS functions like open64 directly.
- Use the O_LARGEFILE flag with open to operate on large files.
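The second and third options can be sketched like this, assuming glibc on Linux. open_large and file_size64 are hypothetical helper names; off64_t, lseek64 and O_LARGEFILE are the LFS names that _LARGEFILE64_SOURCE exposes.

```c
/* Exposes the explicit LFS names (off64_t, lseek64, O_LARGEFILE) on glibc.
   Must come before any #include. */
#define _LARGEFILE64_SOURCE

#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a file read-only with O_LARGEFILE so offsets
   past 2 GiB are accepted even on a 32-bit build. Returns fd or -1. */
int open_large(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY | O_LARGEFILE);
}

/* Hypothetical helper: size of an open file via lseek64, restoring the
   current offset afterwards. Returns -1 on error. */
off64_t file_size64(int fd)
{
    off64_t here = lseek64(fd, 0, SEEK_CUR);
    if (here < 0)
        return -1;
    off64_t end = lseek64(fd, 0, SEEK_END);
    if (end < 0 || lseek64(fd, here, SEEK_SET) < 0)
        return -1;
    return end;
}
```

Because the 64-bit calls are named explicitly, this variant does not depend on _FILE_OFFSET_BITS, at the cost of being Linux/glibc specific.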
Answer by mark4o
If you want to stick to ISO C standard interfaces, use fgetpos() and fsetpos(). However, these functions are only useful for saving a file position and going back to the same position later. They represent the position using the type fpos_t, which is not required to be an integer data type. For example, on a record-based system it could be a struct containing a record number and an offset within the record. This may be too limiting.
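A sketch of the save-and-return pattern these functions support, using only ISO C. peek_line is a hypothetical helper: it records the position with fgetpos, reads one line, and restores the position with fsetpos, treating fpos_t as an opaque token.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the next line into buf, then put the stream
   back where that line started, so the next read sees it again.
   fpos_t is only stored and handed back to fsetpos(); no arithmetic is
   done on it. Returns the line length, or -1 on EOF/error. */
int peek_line(FILE *f, char *buf, int size)
{
    assert(f != NULL && buf != NULL && size > 0);
    fpos_t mark;
    if (fgetpos(f, &mark) != 0)
        return -1;
    if (fgets(buf, size, f) == NULL)
        return -1;
    if (fsetpos(f, &mark) != 0)
        return -1;
    return (int)strlen(buf);
}
```

Nothing here adds or subtracts positions; for that you need an integer offset type, which is what the POSIX functions below provide.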
POSIX defines the functions ftello() and fseeko(), which represent the position using the off_t type. This is required to be an integer type, and the value is a byte offset from the beginning of the file. You can perform arithmetic on it, and can use fseeko() to perform relative seeks. This will work on Linux and other POSIX systems.
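A sketch of the kind of arithmetic off_t allows, assuming a POSIX system. The hypothetical helper seek_to_tail positions a stream at most n bytes before end-of-file (like tail -c), computing the target as a plain integer and then doing a relative SEEK_CUR seek.

```c
#define _FILE_OFFSET_BITS 64
/* Assumption: only needed so fseeko/ftello are declared under strict -std= modes. */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: move f to at most n bytes before EOF and return the
   new byte offset, or -1 on error. */
off_t seek_to_tail(FILE *f, off_t n)
{
    assert(f != NULL && n >= 0);
    if (fseeko(f, 0, SEEK_END) != 0)
        return -1;
    off_t end = ftello(f);           /* byte offset of EOF = file size */
    if (end < 0)
        return -1;
    off_t target = end > n ? end - n : 0;
    /* Relative seek: move (target - end) bytes back from the current position. */
    if (fseeko(f, target - end, SEEK_CUR) != 0)
        return -1;
    return ftello(f);
}
```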
In addition, compile with -D_FILE_OFFSET_BITS=64 (Linux/Solaris). This will define off_t to be a 64-bit type (i.e. off64_t) instead of long, and will redefine the functions that use file offsets to be the versions that take 64-bit offsets. This is the default when you are compiling for 64-bit, so it is not needed in that case.
Answer by Void
Have you tried fseeko() with the _FILE_OFFSET_BITS preprocessor symbol set to 64?
This will give you an fseek()-like interface, but with an offset parameter of type off_t instead of long. Setting _FILE_OFFSET_BITS=64 will make off_t a 64-bit type.
The same goes for ftello().
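For example, the usual fseek(f, 0, SEEK_END)/ftell(f) way of finding a file's size becomes the following with fseeko/ftello. file_size is a hypothetical helper name, and restoring the caller's position is a design choice of this sketch rather than something the answer requires.

```c
#define _FILE_OFFSET_BITS 64
/* Assumption: only needed so fseeko/ftello are declared under strict -std= modes. */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: size of the file behind f as an off_t (not a long),
   leaving the stream where the caller had it. Returns -1 on error. */
off_t file_size(FILE *f)
{
    assert(f != NULL);
    off_t here = ftello(f);
    if (here < 0 || fseeko(f, 0, SEEK_END) != 0)
        return -1;
    off_t size = ftello(f);
    if (fseeko(f, here, SEEK_SET) != 0)
        return -1;
    return size;
}
```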
Answer by Luca Matteis
fseek64() isn't standard; the compiler docs should tell you where to find it.
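On glibc specifically, the explicitly named 64-bit stdio functions are fseeko64 and ftello64 (taking off64_t), made visible by _LARGEFILE64_SOURCE. This is a sketch under that assumption, and skip_forward64 is a hypothetical helper. These are glibc extensions, so fseeko/ftello with _FILE_OFFSET_BITS=64 remain the more portable choice.

```c
/* glibc-specific: exposes fseeko64/ftello64 and off64_t.
   Must come before any #include. */
#define _LARGEFILE64_SOURCE

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: skip `delta` bytes forward from the current position
   and return the new offset, or -1 on error. */
off64_t skip_forward64(FILE *f, off64_t delta)
{
    assert(f != NULL && delta >= 0);
    if (fseeko64(f, delta, SEEK_CUR) != 0)
        return -1;
    return ftello64(f);
}
```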
Have you tried fgetpos and fsetpos? They're designed for large files, and the implementation typically uses a 64-bit type as the base for fpos_t.
Answer by Adam Rosenfield
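Use fsetpos(3) and fgetpos(3). They use the fpos_t datatype. The C standard does not actually guarantee a size for it (only that it can record any position in the file), but on Linux it holds a 64-bit offset on 64-bit builds, or on 32-bit builds compiled with -D_FILE_OFFSET_BITS=64.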