Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me), citing the original address. Original StackOverflow address: http://stackoverflow.com/questions/1927660/

Time: 2020-08-03 19:37:18  Source: igfitidea

Compare two images the python/linux way

Tags: python, linux, image

Asked by Peter Bengtsson

I'm trying to prevent duplicate images from being uploaded.

I have two JPGs. Looking at them, I can see that they are in fact identical. But for some reason they have different file sizes (one was pulled from a backup, the other is another upload), so they have different md5 checksums.

How can I efficiently and confidently compare two images in the same sense as a human would be able to see that they are clearly identical?

Example: http://static.peterbe.com/a.jpg and http://static.peterbe.com/b.jpg

Update

I wrote this script:

import math
import operator
import sys
from functools import reduce  # reduce is not a builtin in Python 3

from PIL import Image


def compare(file1, file2):
    """Return the RMS difference between the histograms of two image files."""
    image1 = Image.open(file1)
    image2 = Image.open(file2)
    h1 = image1.histogram()
    h2 = image2.histogram()
    rms = math.sqrt(reduce(operator.add,
                           map(lambda a, b: (a - b) ** 2, h1, h2)) / len(h1))
    return rms


if __name__ == '__main__':
    file1, file2 = sys.argv[1:]
    print(compare(file1, file2))

Then I downloaded the two visually identical images and ran the script. Output:

58.9830484122

Can anybody tell me what a suitable cutoff should be?

Update II

The difference between a.jpg and b.jpg is that the second one has been saved with PIL:

b=Image.open('a.jpg')
b.save(open('b.jpg','wb'))

This apparently applies some very light quality modifications. I've now solved my problem by applying the same PIL save to the file being uploaded, without otherwise modifying it, and it now works!
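
A minimal sketch of that normalize-then-hash idea, assuming JPEG as the normalized format. The helper name `normalized_md5` is hypothetical and is not part of the original script:

```python
import hashlib
import io

from PIL import Image


def normalized_md5(source):
    """Re-save an image through PIL and return the MD5 of the re-saved bytes.

    `source` is a path or a binary file object. (Hypothetical helper.)
    """
    # JPEG cannot store alpha or palettes, so convert to plain RGB first.
    image = Image.open(source).convert("RGB")
    buffer = io.BytesIO()
    # Every upload goes through the same encoder with the same settings.
    image.save(buffer, format="JPEG")
    return hashlib.md5(buffer.getvalue()).hexdigest()
```

Two files that decode to the same pixels should then produce the same digest even when their bytes on disk differ. Files that decode to slightly different pixels will still produce different digests.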

Accepted answer by AutomatedTester

There is an OSS project that uses WebDriver to take screenshots and then compares the images to see if there are any issues (http://code.google.com/p/fighting-layout-bugs/). It does this by opening the file into a stream and then comparing every bit.

You may be able to do something similar with PIL.

EDIT:

After more research I found

import math, operator
from functools import reduce
from PIL import Image

h1 = Image.open("image1").histogram()
h2 = Image.open("image2").histogram()

rms = math.sqrt(reduce(operator.add,
    map(lambda a,b: (a-b)**2, h1, h2))/len(h1))

on http://snipplr.com/view/757/compare-two-pil-images-in-python/ and http://effbot.org/zone/pil-comparing-images.htm

Answer by fortran

I guess you should decode the images and do a pixel by pixel comparison to see if they're reasonably similar.

With PIL and Numpy you can do it quite easily:

import sys

import numpy
from PIL import Image


def main():
    img1 = Image.open(sys.argv[1])
    img2 = Image.open(sys.argv[2])

    # images with different dimensions or bands cannot be compared pixel by pixel
    if img1.size != img2.size or img1.getbands() != img2.getbands():
        return -1

    # convert to a signed integer type so the subtraction cannot wrap around
    m1 = numpy.asarray(img1).astype(numpy.int64)
    m2 = numpy.asarray(img2).astype(numpy.int64)
    # sum of absolute differences over every pixel and every band
    s = numpy.sum(numpy.abs(m1 - m2))
    print(s)


if __name__ == "__main__":
    sys.exit(main())

This will give you a numeric value that should be very close to 0 if the images are quite the same.

Note that images that are shifted/rotated will be reported as very different, as the pixels won't match one by one.

Answer by Daniel May

You can either compare them using PIL (iterate through the pixels / segments of the picture and compare) or, if you're looking for an exact-copy comparison, compare the MD5 hashes of both files.
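
The exact-copy check can be sketched with the standard library alone. The function names `file_md5` and `same_file_contents` are hypothetical, chosen for illustration:

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(path1, path2):
    """True when both files have the same MD5 digest."""
    return file_md5(path1) == file_md5(path2)
```

This only detects byte-identical copies, so a recompressed JPEG such as the one in the question would be reported as different.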

Answer by musicinmybrain

First, I should note they're not identical; b has been recompressed and lost quality. You can see this if you look carefully on a good monitor.

To determine that they are subjectively “the same,” you would have to do something like what fortran suggested, although you will have to arbitrarily establish a threshold for “sameness.” To make s independent of image size, and to handle channels a little more sensibly, I would consider doing the RMS (root mean square) Euclidean distance in colorspace between the pixels of the two images. I don't have time to write out the code right now, but basically for each pixel, you compute

(R_2 - R_1) ** 2 + (G_2 - G_1) ** 2 + (B_2 - B_1) ** 2

adding in an

(A_2 - A_1) ** 2

term if the image has an alpha channel, etc. The result is the square of the colorspace distance between the two pixels. Find the mean (average) across all pixels, then take the square root of the resulting scalar. Then decide on a reasonable threshold for this value.
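
Since the answer leaves the code unwritten, here is one possible sketch of the computation described above, assuming NumPy is available and both images have the same size and bands. The function name `rms_color_distance` is hypothetical:

```python
import numpy
from PIL import Image


def rms_color_distance(img1, img2):
    """RMS Euclidean colorspace distance between two same-sized PIL images."""
    if img1.size != img2.size or img1.getbands() != img2.getbands():
        raise ValueError("images must have the same size and bands")
    # Convert to floats so the subtraction cannot wrap around as uint8 would.
    a = numpy.asarray(img1).astype(numpy.float64)
    b = numpy.asarray(img2).astype(numpy.float64)
    squared = (a - b) ** 2
    if squared.ndim == 3:
        # Per pixel: (R_2 - R_1)**2 + (G_2 - G_1)**2 + (B_2 - B_1)**2 (+ alpha term).
        squared = squared.sum(axis=2)
    # Mean over all pixels, then the square root of that scalar.
    return float(numpy.sqrt(squared.mean()))
```

Because the result is averaged over pixels, it does not grow with image size. The threshold for "sameness" still has to be picked by experiment.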

Or, you might just decide that copies of the same original image with different lossy compression are not truly “the same” and stick with the file hash.

Answer by meduz

The problem of knowing what makes some features of an image more important than others is a whole scientific program. I would suggest some alternatives depending on the solution you want:

  • if your problem is to see whether some bits were flipped in your JPEGs, then try viewing the difference image (there was perhaps a minor local edit?),

  • to see if images are globally the same, use the Kullback-Leibler distance to compare your histograms,

  • to see if you have some qualitative change, before applying other answers, filter your image using the functions below to raise the importance of high frequencies:

code:

import numpy
from numpy.fft import fft2, fftshift, ifft2, ifftshift


def FTfilter(image, FTfilter):
    # multiply the centered 2-D spectrum by the filter, then transform back
    FTimage = fftshift(fft2(image)) * FTfilter
    return numpy.real(ifft2(ifftshift(FTimage)))
    #return real(ifft2(fft2(image)* FTfilter))


#### whitening
def olshausen_whitening_filt(size, f_0 = .78, alpha = 4., N = 0.01):
    r"""
    Returns the whitening filter used by (Olshausen, 98)

    f_0 = 200 / 512

    /!\ you will have some problems at dewhitening without a low-pass

    """
    # frequency grid spanning [-1, 1] on both axes, size[0] x size[1] samples
    fx, fy = numpy.mgrid[-1:1:1j*size[0],-1:1:1j*size[1]]
    rho = numpy.sqrt(fx**2+fy**2)
    K_ols = (N**2 + rho**2)**.5 * low_pass(size, f_0 = f_0, alpha = alpha)
    K_ols /= numpy.max(K_ols)

    return  K_ols

def low_pass(size, f_0, alpha):
    """
    Returns the low_pass filter used by (Olshausen, 98)

    parameters from Atick (p.240)
    f_0 = 22 c/deg in primates: the full image is approx 45 deg
    alpha makes the aspect change (1=diamond on the vert and hor, 2 = anisotropic)

    """

    fx, fy = numpy.mgrid[-1:1:1j*size[0],-1:1:1j*size[1]]
    rho = numpy.sqrt(fx**2+fy**2)
    low_pass = numpy.exp(-(rho/f_0)**alpha)

    return  low_pass

(shameless copy from http://www.incm.cnrs-mrs.fr/LaurentPerrinet/Publications/Perrinet08spie)
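
The Kullback-Leibler suggestion in the list above comes without code. One way it might look is sketched below, assuming both images have the same mode so that their histograms have the same length. The function name and the `epsilon` smoothing term are assumptions added here to avoid taking the logarithm of zero:

```python
import numpy
from PIL import Image


def histogram_kl_divergence(img1, img2, epsilon=1e-10):
    """Kullback-Leibler divergence D(P || Q) between two image histograms."""
    p = numpy.asarray(img1.histogram(), dtype=numpy.float64)
    q = numpy.asarray(img2.histogram(), dtype=numpy.float64)
    if p.shape != q.shape:
        raise ValueError("images must have the same mode")
    # Smooth empty bins (an assumption of this sketch), then normalize
    # both histograms into probability distributions.
    p = (p + epsilon) / (p + epsilon).sum()
    q = (q + epsilon) / (q + epsilon).sum()
    return float(numpy.sum(p * numpy.log(p / q)))
```

The value is 0 for identical histograms and grows as they diverge. It is not symmetric, so swapping the two images can change the result.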

Answer by Xolve

From here

The quickest way to determine if two images have exactly the same contents is to get the difference between the two images, and then calculate the bounding box of the non-zero regions in this image.

If the images are identical, all pixels in the difference image are zero, and the bounding box function returns None.

from PIL import ImageChops


def equal(im1, im2):
    return ImageChops.difference(im1, im2).getbbox() is None
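
A short usage sketch for the function above, using small RGB images built in memory instead of files (the pixel values are arbitrary):

```python
from PIL import Image, ImageChops


def equal(im1, im2):
    return ImageChops.difference(im1, im2).getbbox() is None


base = Image.new("RGB", (8, 8), (10, 20, 30))
same = base.copy()
changed = base.copy()
changed.putpixel((3, 4), (11, 20, 30))  # alter one channel of one pixel by 1

print(equal(base, same))
print(equal(base, changed))
```

A change of a single channel value in a single pixel is enough to make the check fail, which is why this test suits exact duplicates rather than recompressed copies.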

Answer by subiet

Using ImageMagick, you can simply use in your shell [or call via the OS library from within a program]

compare image1 image2 output

This will create an output image with the differences marked

compare -metric AE -fuzz 5% image1 image2 output

Will give you a fuzziness factor of 5% to ignore minor pixel differences. More information can be procured from here
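
Calling the command from within a Python program might look like the sketch below. It assumes the ImageMagick `compare` executable is on the PATH (ImageMagick 7 installs may expose it as `magick compare` instead), and both helper names are hypothetical:

```python
import subprocess


def build_compare_command(image1, image2, fuzz="5%", metric="AE"):
    """Build the argument list for ImageMagick's compare (hypothetical helper)."""
    # "null:" discards the difference image so only the metric is produced.
    return ["compare", "-metric", metric, "-fuzz", fuzz, image1, image2, "null:"]


def run_compare(image1, image2, fuzz="5%"):
    """Run compare and return (exit_code, metric_text)."""
    result = subprocess.run(
        build_compare_command(image1, image2, fuzz),
        capture_output=True,
        text=True,
    )
    # ImageMagick prints the metric on stderr rather than stdout.
    return result.returncode, result.stderr.strip()
```

As far as I know, the exit code is 0 when the images are judged similar, 1 when they are dissimilar, and 2 on error, but check this against your ImageMagick version. The metric text is returned unparsed because its exact format can vary between versions.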

Answer by Sharpless512

I tested this one; it works the best of all the methods and is extremely fast!

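import math
import operator
from functools import reduce

from PIL import ImageChops


def rmsdiff_1997(im1, im2):
    "Calculate the root-mean-square difference between two images"

    h = ImageChops.difference(im1, im2).histogram()

    # calculate rms; map() stops at the shorter sequence, so only the first
    # 256 histogram bins (the first band) contribute for multi-band images
    return math.sqrt(reduce(operator.add,
        map(lambda h, i: h*(i**2), h, range(256))
    ) / (float(im1.size[0]) * im1.size[1]))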

Here is the link for reference

Answer by Yuriy L

I have tried 3 methods mentioned above and elsewhere. There seem to be two main types of image comparison: pixel-by-pixel and histogram.

I have tried both. The pixel-by-pixel one fails 100%, as it actually should: if we shift the second image by 1 pixel, no pixels will match and we get a 100% mismatch.

Histogram comparison should work really well in theory, but it does not.

Here are two images with a slightly shifted viewport. Their histograms look 99% similar, yet the algorithm produces a result that says "Very Different".

Centered

Same, but shifted ~15°

Results of the 4 different algorithms:

  • Perfect match: False
  • Pixel difference: 115816402
  • Histogram Comparison: 83.69564286668303
  • HistComparison: 1744.8160719686186

And the same comparison of the first image (centred QR) with a 100% different image:

Totally different image and histogram

Algorithm results:

  • Perfect match: False
  • Pixel difference: 207893096
  • Histogram Comparison: 104.30194643642095
  • HistComparison: 6875.766716148522

Any suggestions on how to measure the difference between two images in a more precise and usable way would be much appreciated. At this stage none of these algorithms seems to produce usable results, as a slightly different image gives results very close to those of a 100% different image.

from PIL import Image
from PIL import ImageChops
from functools import reduce
import numpy
import sys
import math
import operator

# Just checking if images are 100% the same


def equal(im1, im2):
    img1 = Image.open(im1)
    img2 = Image.open(im2)
    return ImageChops.difference(img1, img2).getbbox() is None


def histCompare(im1, im2):
    h1 = Image.open(im1).histogram()
    h2 = Image.open(im2).histogram()

    rms = math.sqrt(reduce(operator.add, map(lambda a, b: (a - b)**2, h1, h2)) / len(h1))
    return rms

# To get a measure of how similar two images are, we calculate the root-mean-square (RMS)
# value of the difference between the images. If the images are exactly identical,
# this value is zero. The following function uses the difference function,
# and then calculates the RMS value from the histogram of the resulting image.


def rmsdiff_1997(im1, im2):
    #"Calculate the root-mean-square difference between two images"
    img1 = Image.open(im1)
    img2 = Image.open(im2)

    h = ImageChops.difference(img1, img2).histogram()

    # calculate rms
    return math.sqrt(reduce(operator.add,
                            map(lambda h, i: h * (i**2), h, range(256))
                            ) / (float(img1.size[0]) * img1.size[1]))

# Pixel by pixel comparison to see if images are reasonably similar.


def countDiff(im1, im2):
    s = 0
    img1 = Image.open(im1)
    img2 = Image.open(im2)

    if img1.size != img2.size or img1.getbands() != img2.getbands():
        return -1

    for band_index, band in enumerate(img1.getbands()):
        m1 = numpy.array([p[band_index] for p in img1.getdata()]).reshape(*img1.size)
        m2 = numpy.array([p[band_index] for p in img2.getdata()]).reshape(*img2.size)
        s += numpy.sum(numpy.abs(m1 - m2))

    return s


print("[Same Image]")
print("Perfect match:", equal("data/start.jpg", "data/start.jpg"))
print("Pixel difference:", countDiff("data/start.jpg", "data/start.jpg"))
print("Histogram Comparison:", rmsdiff_1997("data/start.jpg", "data/start.jpg"))
print("HistComparison:", histCompare("data/start.jpg", "data/start.jpg"))

print("\n[Same Position]")
print("Perfect match:", equal("data/start.jpg", "data/end.jpg"))
print("Pixel difference:", countDiff("data/start.jpg", "data/end.jpg"))
print("Histogram Comparison:", rmsdiff_1997("data/start.jpg", "data/end.jpg"))
print("HistComparison:", histCompare("data/start.jpg", "data/end.jpg"))

print("\n[~5o off]")
print("Perfect match:", equal("data/start.jpg", "data/end2.jpg"))
print("Pixel difference:", countDiff("data/start.jpg", "data/end2.jpg"))
print("Histogram Comparison:", rmsdiff_1997("data/start.jpg", "data/end2.jpg"))
print("HistComparison:", histCompare("data/start.jpg", "data/end2.jpg"))

print("\n[~15o off]")
print("Perfect match:", equal("data/start.jpg", "data/end3.jpg"))
print("Pixel difference:", countDiff("data/start.jpg", "data/end3.jpg"))
print("Histogram Comparison:", rmsdiff_1997("data/start.jpg", "data/end3.jpg"))
print("HistComparison:", histCompare("data/start.jpg", "data/end3.jpg"))

print("\n[100% different]")
print("Perfect match:", equal("data/start.jpg", "data/end4.jpg"))
print("Pixel difference:", countDiff("data/start.jpg", "data/end4.jpg"))
print("Histogram Comparison:", rmsdiff_1997("data/start.jpg", "data/end4.jpg"))
print("HistComparison:", histCompare("data/start.jpg", "data/end4.jpg"))

Answer by Surilan

Using only PIL and some of the Python math libraries, it is possible to see if two images are identical to each other in a simple, concise manner. This method has only been tested on image files with the same dimensions and extension, but has avoided several errors made in other answers to this question.

import math, operator
from PIL import Image
from PIL import ImageChops

def images_are_similar(img1, img2, error=90):
    diff = ImageChops.difference(img1, img2).histogram()
    sq = (value * (i % 256) ** 2 for i, value in enumerate(diff))
    sum_squares = sum(sq)
    rms = math.sqrt(sum_squares / float(img1.size[0] * img1.size[1]))

    # Error is an arbitrary value, based on values when 
    # comparing 2 rotated images & 2 different images.
    return rms < error

Advantages:
Adding % 256 to the computation of squares weights each color equally. An RMS formula that squares the raw index of a multi-band histogram weights Green and Blue pixel values more heavily than Red values, because their bins sit at indices 256-511 and 512-767.
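
The effect of % 256 can be checked on a tiny example. This sketch assumes the usual PIL layout of an RGB histogram (768 bins: red, then green, then blue) and uses two 1x1 images that differ by 10 in the blue channel only:

```python
from PIL import Image, ImageChops

img1 = Image.new("RGB", (1, 1), (0, 0, 0))
img2 = Image.new("RGB", (1, 1), (0, 0, 10))

# Histogram of the difference image: one count each at red bin 0 (index 0),
# green bin 0 (index 256) and blue bin 10 (index 522).
diff = ImageChops.difference(img1, img2).histogram()

# Squaring the raw index treats index 522 as a difference of 522.
raw_index = sum(count * i ** 2 for i, count in enumerate(diff))
# Wrapping with % 256 recovers the actual per-band difference of 10.
wrapped = sum(count * (i % 256) ** 2 for i, count in enumerate(diff))

print(raw_index, wrapped)
```

With the wrap, the sum of squares is 10 ** 2 = 100. Without it, even the green band, which shows no difference at all, contributes 256 ** 2.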

Easier to grok. While the RMS calculation could be written as a one-liner, with lambdas and the reduce method, expanding it out to 3 lines greatly improves at-a-glance readability.

This code properly detects that a rotated image differs from the original. This avoids a pitfall when using histograms to compare images, as pointed out by @musicinmybrain. If the histograms of 2 images are computed first and then compared, and one image is a rotation of the other, the comparison reports no difference, because the two histograms are identical. If instead the images are compared first and a histogram of the comparison result is then created, the difference is detected even when one image is a rotation of the other.
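
The rotation pitfall can be reproduced with a small in-memory image. This is only an illustrative sketch; the 4x4 grayscale image and the 180-degree rotation are arbitrary choices:

```python
from PIL import Image, ImageChops

img = Image.new("L", (4, 4), 0)
img.putpixel((0, 0), 255)   # one bright pixel in the top-left corner
rotated = img.rotate(180)   # same pixel values, different positions

# Comparing the two histograms sees no difference at all.
same_histogram = img.histogram() == rotated.histogram()

# Differencing the images first does see that the pixels moved.
difference_found = ImageChops.difference(img, rotated).getbbox() is not None

print(same_histogram, difference_found)
```

Both values come out true: the histograms match exactly, while the difference image is non-empty.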

The code used in this answer is a copy/paste from this code.activestate.com post, taking into consideration the 3rd comment, which corrects the heavier weighting of Green and Blue pixel values.