PHP: How does comparing images through MD5 work?
Note: this page reproduces a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/4853185/
Asked by TreeTree
Does this method compare the pixel values of the images? I'm guessing it won't work when the images are different sizes, but what if they are identical but in different formats? For example, I took a screenshot and saved it as a .jpg, then took another and saved it as a .gif.
Answered by jondavidjohn
An MD5 hash is computed over the file's actual binary data, and the same picture saved in different formats has completely different binary data.
So for MD5 hashes to match, the files must be identical. (There are exceptions in fringe cases, such as hash collisions.)
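A minimal sketch of that point, written in Python for illustration (PHP's `md5_file()` and `md5()` hash bytes in the same way). The byte strings are made-up stand-ins for file contents, and `md5_of` is a hypothetical helper name, not something from the answer.

```python
import hashlib


def md5_of(data: bytes) -> str:
    """Return the hex MD5 digest of raw bytes, e.g. the contents of a file."""
    return hashlib.md5(data).hexdigest()


# Made-up stand-ins for the bytes of three files on disk.
original = b"\x00\x01\x02 pretend these are image bytes"
exact_copy = bytes(original)        # byte-for-byte copy
re_encoded = b"GIF89a" + original   # same "picture", different bytes

print(md5_of(original) == md5_of(exact_copy))   # True: identical bytes
print(md5_of(original) == md5_of(re_encoded))   # False: the bytes differ
```

The hash never looks at what the bytes mean, only at the bytes themselves.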
This is actually one way forensic law enforcement finds data it deems contraband (in reference to images).
Answered by Gazler
It is an MD5 checksum, the same thing you often see when downloading a file: if the MD5 of the downloaded file matches the MD5 given by the provider, the file transfer was successful. See http://en.wikipedia.org/wiki/Checksum
If there is even 1 bit of difference between the 2 files, the resulting hash will be completely different.
Due to the difference in encoding between a JPG and GIF, the 2 will not have the same MD5 hash.
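The one-bit claim can be tried directly. This is a small Python sketch under stated assumptions: the payload is made-up data rather than a real download, and `flip_bit` and `differing_hex_digits` are hypothetical helper names introduced for the illustration.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def differing_hex_digits(a: str, b: str) -> int:
    """Count positions at which two equal-length hex digests differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


payload = b"pretend this is a downloaded file" * 100  # made-up data
damaged = flip_bit(payload, 0)                        # one bit differs

good = hashlib.md5(payload).hexdigest()
bad = hashlib.md5(damaged).hexdigest()
print(good)
print(bad)
print(differing_hex_digits(good, bad), "of", len(good), "hex digits differ")
```

Running it should show two digests with no visible relationship, even though the inputs differ by a single bit.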
Answered by Marc B
A .jpg file starts with the bytes FF D8, usually followed a few bytes later by the text 'JFIF', while a .gif starts with 'GIF' when you look at the raw bytes. In other words, comparing the on-disk bytes of the "same image" in two different formats is pretty much guaranteed to produce two different MD5 hashes, since the files' contents differ, even if the actual image is the "same picture".
To do a hash-based image comparison, you have to compare two images using the same format. Even then, it would be very difficult to produce a .jpg and a .gif of the same image that compare equal after you convert both to (say) a .bmp. The two .bmp files would share a file format, but the internal constraints of .gif (8-bit palette, lossless LZW compression) versus those of .jpg (24-bit, lossy discrete cosine transform compression) mean it's nigh-on impossible to get the same .bmp from both source images.
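The difference in leading bytes is easy to see in a short Python sketch. The two headers below are truncated, hand-written examples rather than real files, and `sniff_format` is a hypothetical helper that only checks the first few bytes.

```python
import hashlib

# Leading bytes of each format: JPEG files begin with the start-of-image
# marker FF D8; GIF files begin with the ASCII text "GIF87a" or "GIF89a".
JPEG_SOI = b"\xff\xd8"
GIF_MAGICS = (b"GIF87a", b"GIF89a")


def sniff_format(data: bytes) -> str:
    """Guess the container format from the first bytes of a file."""
    if data.startswith(JPEG_SOI):
        return "jpeg"
    if data.startswith(GIF_MAGICS):
        return "gif"
    return "unknown"


# Truncated, made-up headers; real files continue with encoded image data.
jpg_head = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"
gif_head = b"GIF89a\x01\x00\x01\x00"

print(sniff_format(jpg_head), sniff_format(gif_head))  # jpeg gif
print(hashlib.md5(jpg_head).hexdigest() == hashlib.md5(gif_head).hexdigest())  # False
```

Since the files already disagree in their first bytes, their MD5 hashes cannot be expected to agree.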
Answered by Markus
md5 is a hash algorithm, so it does not compare images; it compares data. The data you put in can be nearly anything, such as the contents of a file. It then outputs a hash string based on that content, which here is the raw data of the file.
So when you feed an image into md5, you are basically not comparing images but the raw data of the image. The hash algorithm knows nothing about the image except its raw data, so the hashes of a jpg and a gif (or any other image format) of the same screenshot will not be the same.
Even if you compare the decoded images, they will not produce the same hash, because lossy compression leaves small differences the human eye cannot see (how many depends on the amount of compression used). This might be different when comparing the decoded data of losslessly encoded images, but I don't know for sure.
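The open question about lossless formats can be explored with a toy model in Python. Everything here is an assumption made for illustration: `zlib` stands in for a lossless codec, the quantizing function is a crude stand-in for lossy compression (it is not JPEG's actual transform), and the pixel values are made up.

```python
import hashlib
import zlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of raw bytes."""
    return hashlib.md5(data).hexdigest()


def lossless_roundtrip(pixels: bytes) -> bytes:
    """Compress and decompress; the decoded bytes equal the input exactly."""
    return zlib.decompress(zlib.compress(pixels))


def lossy_roundtrip(pixels: bytes, step: int = 8) -> bytes:
    """Toy lossy codec: round every sample down to a multiple of step."""
    return bytes((p // step) * step for p in pixels)


# Made-up 8-bit grayscale samples standing in for decoded pixel data.
pixels = bytes((x * 7 + 3) % 256 for x in range(64 * 64))

print(md5_hex(lossless_roundtrip(pixels)) == md5_hex(pixels))  # True
print(md5_hex(lossy_roundtrip(pixels)) == md5_hex(pixels))     # False
# Largest per-sample error introduced by the toy lossy codec (below step).
print(max(abs(a - b) for a, b in zip(pixels, lossy_roundtrip(pixels))))
```

In this model, decoded data from a lossless round trip hashes identically to the source, which suggests the same could hold for real lossless formats as long as both decode to the same pixel layout (bit depth, channel order, palette); that condition is an assumption, not something the answer confirms.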
Take a look at the Wikipedia article on hash functions for a more detailed explanation and technical background.
Answered by GolezTrol
md5 is a hash. It is a code that is calculated from a bunch of data, and really any data will do.
An md5 code is certainly not unique to its input, but the chance that two different images have the exact same code is quite small. Therefore you could compare images by calculating an md5 code from each of them and comparing the codes.
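One way to apply this is to group files by digest, so that byte-identical copies end up together. The following Python sketch assumes throwaway files created just for the demo; `md5_file` and `group_by_md5` are hypothetical helper names (the first one mirrors the name of PHP's `md5_file()` function).

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def md5_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file's bytes in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_md5(paths):
    """Map each digest to the list of paths whose contents produced it."""
    groups = defaultdict(list)
    for path in paths:
        groups[md5_file(path)].append(path)
    return dict(groups)


# Demo with throwaway files: a.img and b.img share contents, c.img does not.
with tempfile.TemporaryDirectory() as tmp:
    contents = {
        "a.img": b"same bytes",
        "b.img": b"same bytes",
        "c.img": b"other",
    }
    paths = []
    for name, data in contents.items():
        path = os.path.join(tmp, name)
        with open(path, "wb") as handle:
            handle.write(data)
        paths.append(path)
    for digest, members in group_by_md5(paths).items():
        print(digest, [os.path.basename(p) for p in members])
```

This only finds exact duplicates; two files that merely look alike land in different groups.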
Answered by Skilldrick
If you're comparing hashes then every single byte of the two images will have to match - they can't use different compression formats, or "look the same". They have to be identical.
Answered by profitphp
You cannot compare using the MD5 sum, as all the other posters have noted. However, you can compare the images in a different way that tells you their similarity regardless of image type, or even size. You can use libPuzzle:
http://libpuzzle.pureftpd.org/project/libpuzzle
This is a great library for image comparison and works very well.
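To give a feel for similarity-based comparison, here is a simplified "average hash" sketch in Python. It is not libPuzzle's algorithm or API; it is a different, much cruder technique swapped in for illustration. It assumes the image is already decoded to a grayscale grid (a list of rows of 0-255 integers) that is at least 8x8, and all function names are hypothetical.

```python
def shrink(gray, size=8):
    """Box-average a grayscale grid (rows of 0-255 ints) to size x size cells."""
    height, width = len(gray), len(gray[0])
    cells = []
    for cy in range(size):
        y0, y1 = cy * height // size, (cy + 1) * height // size
        for cx in range(size):
            x0, x1 = cx * width // size, (cx + 1) * width // size
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    return cells


def average_hash(gray, size=8):
    """One bit per cell: 1 if the cell is brighter than the mean cell."""
    cells = shrink(gray, size)
    mean = sum(cells) / len(cells)
    return [1 if c > mean else 0 for c in cells]


def hamming(a, b):
    """Number of differing bits; small values suggest similar images."""
    return sum(x != y for x, y in zip(a, b))


def gradient(width, height):
    """Made-up test image: brightness rises from left to right."""
    return [[255 * x // (width - 1) for x in range(width)] for _ in range(height)]


small = gradient(32, 32)
large = gradient(64, 48)                          # same picture, other size
inverted = [[255 - v for v in row] for row in small]

print(hamming(average_hash(small), average_hash(large)))
print(hamming(average_hash(small), average_hash(inverted)))
```

Unlike an MD5 comparison, the result is a distance rather than a yes/no answer: the resized gradient should score close to the original, and the inverted one far from it.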
Answered by nav
It will still not work. Any image file contains a header portion and a binary image buffer. In the scenario described:
1. The headers will differ between .jpg and .gif, resulting in a different md5 sum.
2. The image buffer itself may differ due to the image compression used by, say, the .jpg format.