PHP: Can I use file_get_contents() to compare two files?
Original URL: http://stackoverflow.com/questions/3060125/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Can I use file_get_contents() to compare two files?
Asked by xdazzyy
I want to synchronize two directories, and I use
file_get_contents($source) === file_get_contents($dest)
to compare two files. Is there any problem with doing this?
Answered by Svish
I would rather do something like this:
function files_are_equal($a, $b)
{
    // Files of different sizes cannot be equal
    if (filesize($a) !== filesize($b))
        return false;

    // Compare the content chunk by chunk
    $ah = fopen($a, 'rb');
    $bh = fopen($b, 'rb');

    $result = true;
    while (!feof($ah))
    {
        // Strict comparison: != could treat numeric-looking chunks as equal numbers
        if (fread($ah, 8192) !== fread($bh, 8192))
        {
            $result = false;
            break;
        }
    }

    fclose($ah);
    fclose($bh);

    return $result;
}
This checks whether the file sizes are the same and, if they are, goes through the files chunk by chunk.
- Checking the modification time can be a quick test in some cases, but it only tells you that the files were modified at different times. They might still have the same content.
- Using sha1 or md5 might be a good idea, but it requires reading through each whole file to create the hash. If the hash can be stored and reused later, that is probably a different story.
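The point about storing hashes for later use can be sketched roughly as follows. This is only an illustration, not part of the original answer: the helper names build_hash_manifest() and changed_files() are hypothetical, and the sketch assumes a flat (non-recursive) directory of regular files.

```php
<?php
// Hypothetical helper: map each regular file directly inside $dir to its SHA-1 hash.
function build_hash_manifest(string $dir): array
{
    $names = scandir($dir);
    if ($names === false) {
        throw new RuntimeException("Cannot read directory: $dir");
    }
    $manifest = [];
    foreach ($names as $name) {
        $path = $dir . DIRECTORY_SEPARATOR . $name;
        if (is_file($path)) {
            $manifest[$name] = sha1_file($path);
        }
    }
    return $manifest;
}

// Hypothetical helper: names in $current whose hash is missing from,
// or different in, the previously stored manifest.
function changed_files(array $stored, array $current): array
{
    $changed = [];
    foreach ($current as $name => $hash) {
        if (!isset($stored[$name]) || $stored[$name] !== $hash) {
            $changed[] = $name;
        }
    }
    return $changed;
}
```

The manifest could be kept between runs with json_encode() and file_put_contents(), so that on the next synchronization only the current side has to be hashed again.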
Answered by Tatu Ulmanen
Use sha1_file() instead. It's faster and works fine if you just need to see whether the files differ. If the files are large, comparing the whole strings to each other can be very heavy. As sha1_file() returns a 40-character hexadecimal representation of the file, comparing the results will be very fast.
You can also consider other methods, like comparing filemtime or filesize, but comparing hashes will detect a difference even if just one bit has changed.
Answered by Piskvor left the building
- Memory: e.g. you have a 32 MB memory limit, and the files are 20 MB each. Loading both causes an unrecoverable fatal error while trying to allocate memory. This can be solved by checking the files in smaller parts.
- Speed: string comparisons are not the fastest thing in the world; calculating a sha1 hash should be faster. (If you want to be 110% sure, you can compare the files byte-by-byte when the hashes match, but the hash comparison alone will already rule out all the cases where both content and hash changed, which is 99%+ of cases.)
- Efficiency: do some preliminary checks - e.g. there's no point comparing two files if their size differs.
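The three points above could be combined along these lines. This is a minimal sketch under assumptions, not the answerer's code: the function name files_match() and the $verifyBytes flag are hypothetical, and both paths are assumed to be readable regular files.

```php
<?php
// Hypothetical helper combining the checks listed above.
function files_match(string $a, string $b, bool $verifyBytes = false): bool
{
    clearstatcache();

    // Efficiency: files of different sizes cannot be equal.
    if (filesize($a) !== filesize($b)) {
        return false;
    }

    // Speed and memory: sha1_file() hashes the file without us loading it into a string.
    $hashA = sha1_file($a);
    $hashB = sha1_file($b);
    if ($hashA === false || $hashB === false || $hashA !== $hashB) {
        return false;
    }
    if (!$verifyBytes) {
        return true;
    }

    // Optional "110% sure" step: confirm the match chunk by chunk.
    $ha = fopen($a, 'rb');
    $hb = fopen($b, 'rb');
    if ($ha === false || $hb === false) {
        if ($ha !== false) { fclose($ha); }
        if ($hb !== false) { fclose($hb); }
        return false;
    }
    $equal = true;
    while (!feof($ha) && !feof($hb)) {
        if (fread($ha, 8192) !== fread($hb, 8192)) {
            $equal = false;
            break;
        }
    }
    fclose($ha);
    fclose($hb);
    return $equal;
}
```

Calling files_match($source, $dest) stops at the hash comparison, while files_match($source, $dest, true) adds the byte-by-byte confirmation for files whose hashes agree.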
Answered by David Gonrab
This will work, but it is inherently less efficient than calculating a checksum for both files and comparing them. Good candidates for checksum algorithms are SHA1 and MD5.
// Strict comparison: == can treat hex strings such as "0e12..." as equal numbers
if (sha1_file($source) === sha1_file($dest)) {
    /* ... */
}
Answered by Wilt
Check first for the obvious:
- Compare size
- Compare file type (MIME type).
- Compare content.
(Add a comparison of date, file name and other metadata to this list if those also need to match.)
When comparing content, hashing does not sound very efficient, as @Oli says in his comment. If the files are different, they will most likely already differ near the beginning. Calculating a hash of two 50 MB files and then comparing the hashes sounds like a waste of time if the second bit is already different...
Check this post on php.net. It looks very similar to @Svish's answer, but it also compares the files' MIME types. A smart addition if you ask me.
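The php.net post itself is not reproduced here, so the following is only a guess at what a MIME type check might look like. It assumes the fileinfo extension is available, and same_mime_type() is a hypothetical name.

```php
<?php
// Hypothetical helper: true when both files are detected as the same MIME type.
// Assumes the fileinfo extension (the finfo class) is enabled.
function same_mime_type(string $a, string $b): bool
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $typeA = $finfo->file($a);
    $typeB = $finfo->file($b);
    if ($typeA === false || $typeB === false) {
        return false; // detection failed for at least one file
    }
    return $typeA === $typeB;
}
```

Such a check would run after the size comparison and before the content comparison, matching the order of the list above. Identical files always report the same type, so it can only serve as another early exit.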
Answered by Oli
Seems a bit heavy. This will load both files completely as strings and then compare.
I think you might be better off opening both files manually and ticking through them, perhaps just doing a filesize check first.
Answered by Sam152
There isn't anything wrong with what you are doing here, except that it is a little inefficient. When getting the contents of each file and comparing them, especially with larger files or binary data, you may run into problems.
I would take a look at filemtime() (last modified) and filesize(), and run some tests to see if that works for you. It should be all you need at a fraction of the computation power.
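A metadata-only check of this kind might be sketched as below. It rests on an assumption the answer does not state: the synchronization step must copy the source's modification time onto the destination, otherwise the timestamps never match. Both function names are hypothetical.

```php
<?php
// Hypothetical helper: copy a file and give the copy the source's modification time.
function copy_preserving_mtime(string $source, string $dest): bool
{
    clearstatcache();
    return copy($source, $dest) && touch($dest, filemtime($source));
}

// Hypothetical helper: cheap check using only size and modification time.
// It never reads file content, so it can miss changes that keep both values intact.
function probably_unchanged(string $source, string $dest): bool
{
    clearstatcache(); // avoid stale cached stat results
    return is_file($source)
        && is_file($dest)
        && filesize($source) === filesize($dest)
        && filemtime($source) === filemtime($dest);
}
```

Because the content is never read, the cost stays roughly constant per file regardless of size, which is where the "fraction of the computation power" comes from.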

