C++: Compare two files
Note: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/6163611/
Compare two files
Asked by Chris
I'm trying to write a function which compares the content of two files.
I want it to return 1 if files are the same, and 0 if different.
ch1 and ch2 work as buffers, and I used fgets to get the content of my files.
I think there is something wrong with the eof pointer, but I'm not sure. The FILE variables are given on the command line.
P.S. It works with small files with size under 64KB, but doesn't work with larger files (700MB movies for example, or 5MB of .mp3 files).
Any ideas on how to fix this?
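#include <cstdio>   // FILE, fread, feof
#include <cstdlib>  // calloc, free
#include <cstring>  // memcmp

int compareFile(FILE* file_compared, FILE* file_checked)
{
    bool diff = false;
    const size_t N = 65536;
    char* b1 = (char*) calloc(1, N + 1);
    char* b2 = (char*) calloc(1, N + 1);
    size_t s1, s2;

    do {
        // Read up to N bytes from each file and compare only the bytes actually read.
        s1 = fread(b1, 1, N, file_compared);
        s2 = fread(b2, 1, N, file_checked);
        if (s1 != s2 || memcmp(b1, b2, s1)) {
            diff = true;
            break;
        }
    } while (!feof(file_compared) || !feof(file_checked));

    free(b1);
    free(b2);
    return diff ? 0 : 1;  // 1 = same, 0 = different
}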
EDIT: I've improved this function using your answers. But it only compares the first buffer, with one exception: I figured out that it stops reading the file as soon as it reaches a 1A (0x1A) character (attached file). How can we make it work?
EDIT2: Task solved (working code attached). Thanks to everyone for the help!
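The 0x1A symptom described in the first edit matches what happens on Windows when a file is opened in text mode: the C runtime treats Ctrl-Z (0x1A) as end of input. The question does not show how the FILE variables were opened, so this is an assumption, not a confirmed diagnosis. Under that assumption, a minimal sketch of opening both files in binary mode before comparing could look like this (compareByPath is a hypothetical name, and the 4096-byte buffer is an arbitrary choice):

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical wrapper: opens both paths in binary mode and compares them
// chunk by chunk. Returns 1 if same, 0 if different, -1 if a file can't be opened.
int compareByPath(const char* path1, const char* path2) {
    FILE* f1 = std::fopen(path1, "rb");  // "rb", not "r": no text-mode translation
    FILE* f2 = std::fopen(path2, "rb");
    int result = -1;
    if (f1 && f2) {
        char b1[4096], b2[4096];
        std::size_t s1, s2;
        result = 1;
        do {
            s1 = std::fread(b1, 1, sizeof b1, f1);
            s2 = std::fread(b2, 1, sizeof b2, f2);
            if (s1 != s2 || std::memcmp(b1, b2, s1) != 0) {
                result = 0;
                break;
            }
        } while (s1 > 0);  // a zero-length read on both means EOF (or an error)
    }
    if (f1) std::fclose(f1);
    if (f2) std::fclose(f2);
    return result;
}
```

The loop stops on a zero-length read rather than on feof, so a read error cannot keep it spinning.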
Accepted answer by Jason
Since you've allocated your arrays on the stack, they are filled with random values ... they aren't zeroed out.
Secondly, strcmp will only compare up to the first NUL byte, which, if it's a binary file, won't necessarily be at the end of the file. Therefore you should really be using memcmp on your buffers. But again, this will give unpredictable results because your buffers were allocated on the stack: even if you compare two files that are the same, the ends of the buffers past the EOF may not be the same, so memcmp will still report false results (i.e., it will most likely report that the files are not the same when they are, because of the random values at the end of the buffers past each respective file's EOF).
To get around this issue, you should first measure the length of each file in bytes, then use malloc or calloc to allocate the buffers you're going to compare, and fill those buffers with the actual file contents. Then you should be able to make a valid comparison of the binary contents of each file. You'll also be able to work with files larger than 64K at that point, since you're dynamically allocating the buffers at run-time.
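The measure-then-allocate approach described above can be sketched roughly as follows. The names fileSize and compareWholeFiles are hypothetical, std::vector stands in for malloc/calloc, and the size is obtained with fseek/ftell instead of iterating through the file. This is an illustration under those assumptions, not the answerer's code:

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>

// Returns the size of an open binary stream in bytes, or -1 on failure.
// Leaves the stream positioned at the beginning.
long fileSize(FILE* f) {
    if (std::fseek(f, 0, SEEK_END) != 0) return -1;
    long size = std::ftell(f);
    if (std::fseek(f, 0, SEEK_SET) != 0) return -1;
    return size;
}

// Returns 1 if both streams have identical contents, 0 otherwise.
int compareWholeFiles(FILE* f1, FILE* f2) {
    long n1 = fileSize(f1);
    long n2 = fileSize(f2);
    if (n1 < 0 || n2 < 0 || n1 != n2) return 0;
    if (n1 == 0) return 1;  // two empty files are equal

    // Buffers sized to the files, so no stale bytes take part in the comparison.
    std::vector<char> b1(static_cast<std::size_t>(n1));
    std::vector<char> b2(static_cast<std::size_t>(n2));
    if (std::fread(b1.data(), 1, b1.size(), f1) != b1.size()) return 0;
    if (std::fread(b2.data(), 1, b2.size(), f2) != b2.size()) return 0;
    return std::memcmp(b1.data(), b2.data(), b1.size()) == 0 ? 1 : 0;
}
```

Note that this loads both files fully into memory, which is costly for inputs like the 700MB movies mentioned in the question, and that ftell returns a long, which limits the measurable size on platforms where long is 32 bits. Reading fixed-size chunks and comparing only the bytes actually read avoids both limits.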
Answer by mtrw
If you can give up a little speed, here is a C++ way that requires little code:
#include <fstream>
#include <iterator>
#include <string>
#include <algorithm>

bool compareFiles(const std::string& p1, const std::string& p2) {
    // Open both files in binary mode, positioned at the end (ate).
    std::ifstream f1(p1, std::ifstream::binary|std::ifstream::ate);
    std::ifstream f2(p2, std::ifstream::binary|std::ifstream::ate);

    if (f1.fail() || f2.fail()) {
        return false; //file problem
    }

    if (f1.tellg() != f2.tellg()) {
        return false; //size mismatch
    }

    //seek back to beginning and use std::equal to compare contents
    f1.seekg(0, std::ifstream::beg);
    f2.seekg(0, std::ifstream::beg);
    return std::equal(std::istreambuf_iterator<char>(f1.rdbuf()),
                      std::istreambuf_iterator<char>(),
                      std::istreambuf_iterator<char>(f2.rdbuf()));
}
By using istreambuf_iterators you push the buffer size choice, the actual reading, and the tracking of EOF into the standard library implementation. std::equal returns when it hits the first mismatch, so this should not run any longer than it needs to.
This is slower than Linux's cmp, but it's very easy to read.
Answer by George Kastrinis
When the files are binary, use memcmp not strcmp as \0 might appear as data.
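A small sketch of why this matters: two buffers that differ only after an embedded \0 look equal to strcmp but not to memcmp. The helper names and sample buffers here are made up for illustration:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helpers, named here only for illustration.
bool equalAsCStrings(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;      // stops at the first '\0'
}

bool equalAsBytes(const char* a, const char* b, std::size_t n) {
    return std::memcmp(a, b, n) == 0;   // compares exactly n bytes
}

// Two 6-byte buffers that differ only after an embedded '\0'.
const char kBufA[6] = {'a', 'b', '\0', 'x', 'y', 'z'};
const char kBufB[6] = {'a', 'b', '\0', 'q', 'r', 's'};
```

With these buffers, equalAsCStrings(kBufA, kBufB) is true because both look like "ab" to strcmp, while equalAsBytes(kBufA, kBufB, sizeof kBufA) is false.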
Answer by jww
Here's a C++ solution. It seems appropriate since your question is tagged C++. The program uses ifstreams rather than FILE*s. It also shows you how to seek on a file stream to determine a file's size. Finally, it reads blocks of 4096 bytes at a time, so large files will be processed as expected.
// g++ -Wall -Wextra equifile.cpp -o equifile.exe
#include <iostream>
using std::cout;
using std::cerr;
using std::endl;

#include <fstream>
using std::ios;
using std::ifstream;

#include <exception>
using std::exception;

#include <algorithm>  // std::min
#include <cstring>
#include <cstdlib>
using std::exit;
using std::memcmp;

bool equalFiles(ifstream& in1, ifstream& in2);

int main(int argc, char* argv[])
{
    if(argc != 3)
    {
        cerr << "Usage: equifile.exe <file1> <file2>" << endl;
        exit(-1);
    }

    try {
        ifstream in1(argv[1], ios::binary);
        ifstream in2(argv[2], ios::binary);

        if(equalFiles(in1, in2)) {
            cout << "Files are equal" << endl;
            exit(0);
        }
        else
        {
            cout << "Files are not equal" << endl;
            exit(1);
        }
    } catch (const exception& ex) {
        cerr << ex.what() << endl;
        exit(-2);
    }

    return -3;
}

bool equalFiles(ifstream& in1, ifstream& in2)
{
    // A stream that failed to open cannot be compared.
    if(!in1 || !in2)
        return false;

    // Seek to the end to learn each file's size, then rewind.
    ifstream::pos_type size1, size2;
    size1 = in1.seekg(0, ifstream::end).tellg();
    in1.seekg(0, ifstream::beg);
    size2 = in2.seekg(0, ifstream::end).tellg();
    in2.seekg(0, ifstream::beg);

    if(size1 != size2)
        return false;

    static const size_t BLOCKSIZE = 4096;
    size_t remaining = size1;

    // Compare block by block; the last block may be shorter than BLOCKSIZE.
    while(remaining)
    {
        char buffer1[BLOCKSIZE], buffer2[BLOCKSIZE];
        size_t size = std::min(BLOCKSIZE, remaining);

        in1.read(buffer1, size);
        in2.read(buffer2, size);

        if(0 != memcmp(buffer1, buffer2, size))
            return false;

        remaining -= size;
    }

    return true;
}
Answer by RobisonMD
Switch's code looks good to me, but if you want an exact comparison the while condition and the return need to be altered:
int compareFile(FILE* f1, FILE* f2) {
    const int N = 10000;  // const, so the arrays below are not variable-length
    char buf1[N];
    char buf2[N];

    do {
        size_t r1 = fread(buf1, 1, N, f1);
        size_t r2 = fread(buf2, 1, N, f2);

        if (r1 != r2 ||
            memcmp(buf1, buf2, r1)) {
            return 0;  // Files are not equal
        }
    } while (!feof(f1) && !feof(f2));

    // Equal only if both files reached EOF together.
    return feof(f1) && feof(f2);
}
Answer by Switch
Better to use fread and memcmp to avoid \0 character issues. Also, the !feof checks really should be || instead of &&, since there's a small chance that one file is bigger than the other and the smaller file's size is an exact multiple of your buffer size.
int compareFile(FILE* f1, FILE* f2) {
    const int N = 10000;  // const, so the arrays below are not variable-length
    char buf1[N];
    char buf2[N];

    do {
        size_t r1 = fread(buf1, 1, N, f1);
        size_t r2 = fread(buf2, 1, N, f2);

        if (r1 != r2 ||
            memcmp(buf1, buf2, r1)) {
            return 0;
        }
    } while (!feof(f1) || !feof(f2));

    return 1;
}