java 在较大的图像中查找已知的子图像

Question

提问by Benjamin Lee

Does anyone know of an algorithm (or search terms / descriptions) to locate a known image within a larger image?

有没有人知道在更大的图像中定位已知图像的算法（或搜索词/描述）？

e.g.

例如

I have an image of a single desktop window containing various buttons and areas (target). I also have code to capture a screen shot of the current desktop. I would like an algorithm that will help me find the target image within the larger desktop image (what exact x and y coordinates the window is located at). The target image may be located anywhere in the larger image and may not be 100% exactly the same (very similar but not exact possibly b/c of OS display differences)

我有一个包含各种按钮和区域（目标）的单个桌面窗口的图像。我还有代码来捕获当前桌面的屏幕截图。我想要一种算法来帮助我在较大的桌面图像中找到目标图像（窗口所在的确切 x 和 y 坐标）。目标图像可能位于较大图像中的任何位置，并且可能不是 100% 完全相同（非常相似但不完全可能是操作系统显示差异的 b/c）

Does anyone know of such an algorithm or class of algorithms?

有谁知道这样的算法或算法类？

I have found various image segmentation and computer vision algorithms but they seem geared to "fuzzy" classification of regions and not locating a specific image within another.

我发现了各种图像分割和计算机视觉算法，但它们似乎适用于区域的“模糊”分类，而不是在另一个图像中定位特定图像。

** My goal is to create a framework that, given some seed target images, can find "look" at the desktop, find the target area and "watch" it for changes.**

**我的目标是创建一个框架，在给定一些种子目标图像的情况下，它可以“查看”桌面，找到目标区域并“观察”它的变化。**

Answer 1

采纳答案by Chris Farmer

You said your image may not be exactly the same, but then say you don't want "fuzzy" algorithms. I'm not sure those are compatible. In general, though, I think you want to look at image registrationalgorithms. There's an open source C++ package called ITKthat might provide some hints. Also ImageJis a popular open source Java package. Both of these have at least some registration capabilities available if you poke around.

您说您的图像可能不完全相同，但又说您不想要“模糊”算法。我不确定这些是否兼容。不过，总的来说，我认为您想查看图像配准算法。有一个名为ITK 的开源 C++ 包可能会提供一些提示。此外ImageJ的是一种流行的开源Java包。如果您四处看看，这两者至少都有一些可用的注册功能。

Answer 2

回答by Benjamin Lee

Have a look at the paper I wrote: http://werner.yellowcouch.org/Papers/subimg/index.html. It's highly detailed and appears to be the only article discussing how to apply fourier transformation to the problem of subimage finding.

看看我写的论文：http: //werner.yellowcouch.org/Papers/subimg/index.html。它非常详细，似乎是唯一一篇讨论如何将傅立叶变换应用于子图像查找问题的文章。

In short, if you want to use the fourier transform one could apply the following formula: the correlation between image A and image B when image A is shifted over dx,dy is given in the following matrix: C=ifft(fft(A) x conjugate(fft(B)). So, the position in image C that has the highest value, has the highest correlation and that position reflects dx,dy.

简而言之，如果您想使用傅立叶变换，可以应用以下公式：当图像 A 在 dx,dy 上移动时，图像 A 和图像 B 之间的相关性在以下矩阵中给出： C=ifft(fft(A) x 共轭(fft(B)). 因此，图像 C 中具有最高值的位置，具有最高的相关性，并且该位置反映了 dx,dy。

This result works well for subimages that are relatively large. For smaller images, some more work is necessary as explained in the article. Nevertheless, such fourier transforms are quite fast. It results in around 3*sxsylog_2(sx*sy)+3*sx*sy operations.

此结果适用于相对较大的子图像。对于较小的图像，如文章中所述，需要做更多的工作。尽管如此，这种傅立叶变换还是相当快的。它导致大约 3*sx sylog_2(sx*sy)+3*sx*sy 操作。

Answer 3

回答by Mr Fooz

Here's the skeleton of code you'd want to use:

这是您要使用的代码框架：

// look for all (x,y) positions where target appears in desktop
List<Loc> findMatches(Image desktop, Image target, float threshold) {
  List<Loc> locs;
  for (int y=0; y<desktop.height()-target.height(); y++) {
      for (int x=0; x<desktop.width()-target.width(); x++) {
          if (imageDistance(desktop, x, y, target) < threshold) {
              locs.append(Loc(x,y));
          }
      }
   }
   return locs;
}

// computes the root mean squared error between a rectangular window in 
// bigImg and target.
float imageDistance(Image bigImg, int bx, int by, Image target) {
    float dist = 0.0;
    for (int y=0; y<target.height(); y++) {
        for (int x=0; x<target.width(); x++) {
            // assume RGB images...
            for (int colorChannel=0; colorChannel<3; colorChannel++) {
                dist += Math.pow(target.getPixel(x,y) - bigImg.getPixel(bx+x,by+y), 2);
            }
         }
    }
    return Math.sqrt(dist) / target.width() / target.height();
}

You could consider other image distances (see a similar question). For your application, the RMS error is probably a good choice.

您可以考虑其他图像距离（请参阅类似问题）。对于您的应用，RMS 误差可能是一个不错的选择。

There are probably various Java libraries that compute this distance for you efficiently.

可能有各种 Java 库可以有效地为您计算这个距离。

Answer 4

回答by xamid

I considered the solution of Werner Van Belle (since all other approaches are incredible slow - not practicable at all):

我考虑了 Werner Van Belle 的解决方案（因为所有其他方法都非常慢 - 根本不可行）：

An Adaptive Filter for the Correct Localization of Subimages: FFT based Subimage Localization Requires Image Normalization to work properly

用于正确定位子图像的自适应滤波器：基于 FFT 的子图像定位需要图像归一化才能正常工作

I wrote the code in C# where I need my solution, but I am getting highly inaccurate results. Does it really not work well, in contrary to Van Belle's statement, or did I do something wrong? I used https://github.com/tszalay/FFTWSharpas a C# wrapper for FFTW.

我在需要解决方案的地方用 C# 编写了代码，但得到的结果非常不准确。与范贝勒的说法相反，它真的不好用，还是我做错了什么？我使用https://github.com/tszalay/FFTWSharp作为 FFTW 的 C# 包装器。

Here is my translated code: (original in C++ at http://werner.yellowcouch.org/Papers/subimg/index.html)

这是我翻译的代码：（原文为 C++，位于http://werner.yellowcouch.org/Papers/subimg/index.html）

using System.Diagnostics;
using System;
using System.Runtime.InteropServices;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

using FFTWSharp;

using unsigned1 = System.Byte;
using signed2 = System.Int16;
using signed8 = System.Int64;

public class Subimage
{
    /**
     * This program finds a subimage in a larger image. It expects two arguments.
     * The first is the image in which it must look. The secon dimage is the
     * image that is to be found. The program relies on a number of different
     * steps to perform the calculation.
     *
     * It will first normalize the input images in order to improve the
     * crosscorrelation matching. Once the best crosscorrelation is found
     * a sad-matchers is applied in a grid over the larger image.
     *
     * The following two article explains the details:
     *
     *   Werner Van Belle; An Adaptive Filter for the Correct Localization
     *   of Subimages: FFT based Subimage Localization Requires Image
     *   Normalization to work properly; 11 pages; October 2007.
     *   http://werner.yellowcouch.org/Papers/subimg/
     *
     *   Werner Van Belle; Correlation between the inproduct and the sum
     *   of absolute differences is -0.8485 for uniform sampled signals on
     *   [-1:1]; November 2006
     */
    unsafe public static Point FindSubimage_fftw(string[] args)
    {
        if (args == null || args.Length != 2)
        {
            Console.Write("Usage: subimg\n" + "\n" + "  subimg is an image matcher. It requires two arguments. The first\n" + "  image should be the larger of the two. The program will search\n" + "  for the best position to superimpose the needle image over the\n" + "  haystack image. The output of the the program are the X and Y\n" + "  coordinates.\n\n" + "  http://werner.yellowouch.org/Papers/subimg/\n");
            return new Point();
        }

        /**
         * The larger image will be called A. The smaller image will be called B.
         *
         * The code below relies heavily upon fftw. The indices necessary for the
         * fast r2c and c2r transforms are at best confusing. Especially regarding
         * the number of rows and colums (watch our for Asx vs Asx2 !).
         *
         * After obtaining all the crosscorrelations we will scan through the image
         * to find the best sad match. As such we make a backup of the original data
         * in advance
         *
         */
        int Asx = 0, Asy = 0;
        signed2[] A = read_image(args[0], ref Asx, ref Asy);
        int Asx2 = Asx / 2 + 1;

        int Bsx = 0, Bsy = 0;
        signed2[] B = read_image(args[1], ref Bsx, ref Bsy);

        unsigned1[] Asad = new unsigned1[Asx * Asy];
        unsigned1[] Bsad = new unsigned1[Bsx * Bsy];

        for (int i = 0; i < Bsx * Bsy; i++)
        {
            Bsad[i] = (unsigned1)B[i];
            Asad[i] = (unsigned1)A[i];
        }
        for (int i = Bsx * Bsy; i < Asx * Asy; i++)
            Asad[i] = (unsigned1)A[i];

        /**
         * Normalization and windowing of the images.
         *
         * The window size (wx,wy) is half the size of the smaller subimage. This
         * is useful to have as much good information from the subimg.
         */
        int wx = Bsx / 2;
        int wy = Bsy / 2;
        normalize(ref B, Bsx, Bsy, wx, wy);
        normalize(ref A, Asx, Asy, wx, wy);

        /**
         * Preparation of the fourier transforms.
         * Aa is the amplitude of image A. Af is the frequence of image A
         * Similar for B. crosscors will hold the crosscorrelations.
         */
        IntPtr Aa = fftw.malloc(sizeof(double) * Asx * Asy);
        IntPtr Af = fftw.malloc(sizeof(double) * 2 * Asx2 * Asy);
        IntPtr Ba = fftw.malloc(sizeof(double) * Asx * Asy);
        IntPtr Bf = fftw.malloc(sizeof(double) * 2 * Asx2 * Asy);

        /**
         * The forward transform of A goes from Aa to Af
         * The forward tansform of B goes from Ba to Bf
         * In Bf we will also calculate the inproduct of Af and Bf
         * The backward transform then goes from Bf to Aa again. That
         * variable is aliased as crosscors;
         */
        //#original: fftw_plan_dft_r2c_2d
        //IntPtr forwardA = fftwf.dft(2, new int[] { Asy, Asx }, Aa, Af, fftw_direction.Forward, fftw_flags.Estimate);//equal results
        IntPtr forwardA = fftwf.dft_r2c_2d(Asy, Asx, Aa, Af, fftw_flags.Estimate);
        //#original: fftw_plan_dft_r2c_2d
        //IntPtr forwardB = fftwf.dft(2, new int[] { Asy, Asx }, Ba, Bf, fftw_direction.Forward, fftw_flags.Estimate);//equal results
        IntPtr forwardB = fftwf.dft_r2c_2d(Asy, Asx, Ba, Bf, fftw_flags.Estimate);
        double* crosscorrs = (double*)Aa;
        //#original: fftw_plan_dft_c2r_2d
        //IntPtr backward = fftwf.dft(2, new int[] { Asy, Asx }, Bf, Aa, fftw_direction.Backward, fftw_flags.Estimate);//equal results
        IntPtr backward = fftwf.dft_c2r_2d(Asy, Asx, Bf, Aa, fftw_flags.Estimate);

        /**
         * The two forward transforms of A and B. Before we do so we copy the normalized
         * data into the double array. For B we also pad the data with 0
         */
        for (int row = 0; row < Asy; row++)
            for (int col = 0; col < Asx; col++)
                ((double*)Aa)[col + Asx * row] = A[col + Asx * row];
        fftw.execute(forwardA);

        for (int j = 0; j < Asx * Asy; j++)
            ((double*)Ba)[j] = 0;
        for (int row = 0; row < Bsy; row++)
            for (int col = 0; col < Bsx; col++)
                ((double*)Ba)[col + Asx * row] = B[col + Bsx * row];
        fftw.execute(forwardB);

        /**
         * The inproduct of the two frequency domains and calculation
         * of the crosscorrelations
         */
        double norm = Asx * Asy;
        for (int j = 0; j < Asx2 * Asy; j++)
        {
            double a = ((double*)Af)[j * 2];//#Af[j][0];
            double b = ((double*)Af)[j * 2 + 1];//#Af[j][1];
            double c = ((double*)Bf)[j * 2];//#Bf[j][0];
            double d = ((double*)Bf)[j * 2 + 1];//#-Bf[j][1];
            double e = a * c - b * d;
            double f = a * d + b * c;
            ((double*)Bf)[j * 2] = (double)(e / norm);//#Bf[j][0] = (fftw_real)(e / norm);
            ((double*)Bf)[j * 2 + 1] = (double)(f / norm);//Bf[j][1] = (fftw_real)(f / norm);
        }
        fftw.execute(backward);

        /**
         * We now have a correlation map. We can spent one more pass
         * over the entire image to actually find the best matching images
         * as defined by the SAD.
         * We calculate this by gridding the entire image according to the
         * size of the subimage. In each cel we want to know what the best
         * match is.
         */
        int sa = 1 + Asx / Bsx;
        int sb = 1 + Asy / Bsy;
        int sadx = 0;
        int sady = 0;
        signed8 minsad = Bsx * Bsy * 256L;
        for (int a = 0; a < sa; a++)
        {
            int xl = a * Bsx;
            int xr = xl + Bsx;
            if (xr > Asx) continue;
            for (int b = 0; b < sb; b++)
            {
                int yl = b * Bsy;
                int yr = yl + Bsy;
                if (yr > Asy) continue;

                // find the maximum correlation in this cell
                int cormxat = xl + yl * Asx;
                double cormx = crosscorrs[cormxat];
                for (int x = xl; x < xr; x++)
                    for (int y = yl; y < yr; y++)
                    {
                        int j = x + y * Asx;
                        if (crosscorrs[j] > cormx)
                            cormx = crosscorrs[cormxat = j];
                    }
                int corx = cormxat % Asx;
                int cory = cormxat / Asx;

                // We dont want subimages that fall of the larger image
                if (corx + Bsx > Asx) continue;
                if (cory + Bsy > Asy) continue;

                signed8 sad = 0;
                for (int x = 0; sad < minsad && x < Bsx; x++)
                    for (int y = 0; y < Bsy; y++)
                    {
                        int j = (x + corx) + (y + cory) * Asx;
                        int i = x + y * Bsx;
                        sad += Math.Abs((int)Bsad[i] - (int)Asad[j]);
                    }

                if (sad < minsad)
                {
                    minsad = sad;
                    sadx = corx;
                    sady = cory;
                    // printf("* ");
                }
                // printf("Grid (%d,%d) (%d,%d) Sip=%g Sad=%lld\n",a,b,corx,cory,cormx,sad);
            }
        }
        //Console.Write("{0:D}\t{1:D}\n", sadx, sady);
        /**
         * Aa, Ba, Af and Bf were allocated in this function
         * crosscorrs was an alias for Aa and does not require deletion.
         */
        fftw.free(Aa);
        fftw.free(Ba);
        fftw.free(Af);
        fftw.free(Bf);
        return new Point(sadx, sady);
    }

    private static void normalize(ref signed2[] img, int sx, int sy, int wx, int wy)
    {
        /**
         * Calculate the mean background. We will subtract this
         * from img to make sure that it has a mean of zero
         * over a window size of wx x wy. Afterwards we calculate
         * the square of the difference, which will then be used
         * to normalize the local variance of img.
         */
        signed2[] mean = boxaverage(img, sx, sy, wx, wy);
        signed2[] sqr = new signed2[sx * sy];
        for (int j = 0; j < sx * sy; j++)
        {
            img[j] -= mean[j];
            signed2 v = img[j];
            sqr[j] = (signed2)(v * v);
        }
        signed2[] var = boxaverage(sqr, sx, sy, wx, wy);
        /**
         * The normalization process. Currenlty still
         * calculated as doubles. Could probably be fixed
         * to integers too. The only problem is the sqrt
         */
        for (int j = 0; j < sx * sy; j++)
        {
            double v = Math.Sqrt(Math.Abs((double)var[j]));//#double v = sqrt(fabs(var[j])); <- ambigous
            Debug.Assert(!double.IsInfinity(v) && v >= 0);
            if (v < 0.0001) v = 0.0001;
            img[j] = (signed2)(img[j] * (32 / v));
            if (img[j] > 127) img[j] = 127;
            if (img[j] < -127) img[j] = -127;
        }
        /**
         * As a last step in the normalization we
         * window the sub image around the borders
         * to become 0
         */
        window(ref img, sx, sy, wx, wy);
    }

    private static signed2[] boxaverage(signed2[] input, int sx, int sy, int wx, int wy)
    {
        signed2[] horizontalmean = new signed2[sx * sy];

        Debug.Assert(horizontalmean != null);
        int wx2 = wx / 2;
        int wy2 = wy / 2;
        int from = (sy - 1) * sx;
        int to = (sy - 1) * sx;
        int initcount = wx - wx2;
        if (sx < initcount) initcount = sx;
        int xli = -wx2;
        int xri = wx - wx2;
        for (; from >= 0; from -= sx, to -= sx)
        {
            signed8 sum = 0;
            int count = initcount;
            for (int c = 0; c < count; c++)
                sum += (signed8)input[c + from];
            horizontalmean[to] = (signed2)(sum / count);
            int xl = xli, x = 1, xr = xri;
            /**
             * The area where the window is slightly outside the
             * left boundary. Beware: the right bnoundary could be
             * outside on the other side already
             */
            for (; x < sx; x++, xl++, xr++)
            {
                if (xl >= 0) break;
                if (xr < sx)
                {
                    sum += (signed8)input[xr + from];
                    count++;
                }
                horizontalmean[x + to] = (signed2)(sum / count);
            }
            /**
             * both bounds of the sliding window
             * are fully inside the images
             */
            for (; xr < sx; x++, xl++, xr++)
            {
                sum -= (signed8)input[xl + from];
                sum += (signed8)input[xr + from];
                horizontalmean[x + to] = (signed2)(sum / count);
            }
            /**
             * the right bound is falling of the page
             */
            for (; x < sx; x++, xl++)
            {
                sum -= (signed8)input[xl + from];
                count--;
                horizontalmean[x + to] = (signed2)(sum / count);
            }
        }

        /**
         * The same process as aboe but for the vertical dimension now
         */
        int ssy = (sy - 1) * sx + 1;
        from = sx - 1;
        signed2[] verticalmean = new signed2[sx * sy];
        Debug.Assert(verticalmean != null);
        to = sx - 1;
        initcount = wy - wy2;
        if (sy < initcount) initcount = sy;
        int initstopat = initcount * sx;
        int yli = -wy2 * sx;
        int yri = (wy - wy2) * sx;
        for (; from >= 0; from--, to--)
        {
            signed8 sum = 0;
            int count = initcount;
            for (int d = 0; d < initstopat; d += sx)
                sum += (signed8)horizontalmean[d + from];
            verticalmean[to] = (signed2)(sum / count);
            int yl = yli, y = 1, yr = yri;
            for (; y < ssy; y += sx, yl += sx, yr += sx)
            {
                if (yl >= 0) break;
                if (yr < ssy)
                {
                    sum += (signed8)horizontalmean[yr + from];
                    count++;
                }
                verticalmean[y + to] = (signed2)(sum / count);
            }
            for (; yr < ssy; y += sx, yl += sx, yr += sx)
            {
                sum -= (signed8)horizontalmean[yl + from];
                sum += (signed8)horizontalmean[yr + from];
                verticalmean[y + to] = (signed2)(sum / count);
            }
            for (; y < ssy; y += sx, yl += sx)
            {
                sum -= (signed8)horizontalmean[yl + from];
                count--;
                verticalmean[y + to] = (signed2)(sum / count);
            }
        }
        return verticalmean;
    }

    private static void window(ref signed2[] img, int sx, int sy, int wx, int wy)
    {
        int wx2 = wx / 2;
        int sxsy = sx * sy;
        int sx1 = sx - 1;
        for (int x = 0; x < wx2; x++)
            for (int y = 0; y < sxsy; y += sx)
            {
                img[x + y] = (signed2)(img[x + y] * x / wx2);
                img[sx1 - x + y] = (signed2)(img[sx1 - x + y] * x / wx2);
            }

        int wy2 = wy / 2;
        int syb = (sy - 1) * sx;
        int syt = 0;
        for (int y = 0; y < wy2; y++)
        {
            for (int x = 0; x < sx; x++)
            {
                /**
                 * here we need to recalculate the stuff (*y/wy2)
                 * to preserve the discrete nature of integers.
                 */
                img[x + syt] = (signed2)(img[x + syt] * y / wy2);
                img[x + syb] = (signed2)(img[x + syb] * y / wy2);
            }
            /**
             * The next row for the top rows
             * The previous row for the bottom rows
             */
            syt += sx;
            syb -= sx;
        }
    }

    private static signed2[] read_image(string filename, ref int sx, ref int sy)
    {
        Bitmap image = new Bitmap(filename);
        sx = image.Width;
        sy = image.Height;
        signed2[] GreyImage = new signed2[sx * sy];
        BitmapData bitmapData1 = image.LockBits(new Rectangle(0, 0, image.Width, image.Height), ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
        unsafe
        {
            byte* imagePointer = (byte*)bitmapData1.Scan0;

            for (int y = 0; y < bitmapData1.Height; y++)
            {
                for (int x = 0; x < bitmapData1.Width; x++)
                {
                    GreyImage[x + y * sx] = (signed2)((imagePointer[0] + imagePointer[1] + imagePointer[2]) / 3.0);
                    //4 bytes per pixel
                    imagePointer += 4;
                }//end for x
                //4 bytes per pixel
                imagePointer += bitmapData1.Stride - (bitmapData1.Width * 4);
            }//end for y
        }//end unsafe
        image.UnlockBits(bitmapData1);
        return GreyImage;
    }
}

Answer 5

回答by Gabriel Ambrósio Archanjo

You could use unique visual elements of this target area to determine its position. These unique visual elements are like a "signature". Examples: unique icons, images and symbols. This approach works independently of the window resolution if you have unique elements in the corners. For fixed sized windows, just one element is sufficient to find all window coordinates.

您可以使用此目标区域的独特视觉元素来确定其位置。这些独特的视觉元素就像一个“签名”。示例：独特的图标、图像和符号。如果角落中有独特的元素，则此方法独立于窗口分辨率。对于固定大小的窗口，只需一个元素就足以找到所有窗口坐标。

Below I illustrate the idea with a simple example using Marvin Framework.

下面我用一个简单的例子来说明这个想法，使用Marvin 框架。

Unique elements:

独特的元素：

Program output:

程序输出：

Original Image:
window.png

原图：
window.png

Source code:

源代码：

import static marvin.MarvinPluginCollection.*;

public class FindSubimageWindow {
    public FindSubimageWindow(){
        MarvinImage window = MarvinImageIO.loadImage("./res/window.png");
        MarvinImage eclipse = MarvinImageIO.loadImage("./res/eclipse_icon.png");
        MarvinImage progress = MarvinImageIO.loadImage("./res/progress_icon.png");

        MarvinSegment seg1, seg2;
        seg1 = findSubimage(eclipse, window, 0, 0);
        drawRect(window, seg1.x1, seg1.y1, seg1.x2-seg1.x1, seg1.y2-seg1.y1);

        seg2 = findSubimage(progress, window, 0, 0);
        drawRect(window, seg2.x1, seg2.y1, seg2.x2-seg2.x1, seg2.y2-seg2.y1);

        drawRect(window, seg1.x1-10, seg1.y1-10, (seg2.x2-seg1.x1)+25, (seg2.y2-seg1.y1)+20);

        MarvinImageIO.saveImage(window, "./res/window_out.png");
    }
    private void drawRect(MarvinImage image, int x, int y, int width, int height){
        x-=4; y-=4; width+=8; height+=8;
        image.drawRect(x, y, width, height, Color.red);
        image.drawRect(x+1, y+1, width-2, height-2, Color.red);
        image.drawRect(x+2, y+2, width-4, height-4, Color.red);
    }
    public static void main(String[] args) {
        new FindSubimageWindow();
    }   
}

Answer 6

回答by example

Your don't need fuzzy as in "neural network" because (as I understand) you don't have rotation, tilts or similar. If OS display differences are the only modifications the difference should be minimal. So WernerVanBelle's paper is nice but not really necessary and MrFooz's code works - but is terribly innefficient (O(width * height * patter_width * pattern_height)!).

您不需要像“神经网络”那样模糊，因为（据我所知）您没有旋转、倾斜或类似功能。如果操作系统显示差异是唯一的修改，那么差异应该是最小的。所以 WernerVanBelle 的论文很好，但并不是真正必要的，而且 MrFooz 的代码有效 - 但效率非常低（O(width * height * patter_width * pattern_height)！）。

The best algorithm I can think of is the Boyer-Moore algorithm for string searching, modified to allow 2 dimensional searches. http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

我能想到的最好的算法是用于字符串搜索的 Boyer-Moore 算法，经过修改以允许进行二维搜索。 http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

Instead of one displacement you will have to store a pair of displacements dxand dyfor each color. When checking a pixel you move only in the x direction x = x + dxand store only the minimum of the dy's DY = min(DY, dy)to set the new y value after a whole line has been tested (ie x > width).

您必须为每种颜色存储一对置换dx，而不是一个置换dy。检查像素时，您仅在 x 方向上移动x = x + dx并仅存储 dy 的最小值以在DY = min(DY, dy)测试整条线后设置新的 y 值（即x > width）。

Creating a table for all possible colors probably is prohibitve due to the imense number of possible colors, so either use a map to store the rules (and default to the pattern dimensions if a color is not inside the map) or create tables for each color seperately and set dx = max(dx(red), dx(green), dx(blue))- which is only an approximation but removes the overhead of a map.

由于可能的颜色数量庞大，可能无法为所有可能的颜色创建表格，因此要么使用地图来存储规则（如果颜色不在地图内，则默认为图案尺寸）或为每种颜色创建表格单独和设置dx = max(dx(red), dx(green), dx(blue))- 这只是一个近似值，但消除了地图的开销。

In the preprocessing of the bad-character rule, you can account for small deviations of colors by spreading rules from all colors to their "neighbouring" colors (however you wish to define neighbouring).

在坏字符规则的预处理中，您可以通过将规则从所有颜色扩展到它们的“相邻”颜色（但您希望定义相邻）来解决颜色的小偏差。

java 在较大的图像中查找已知的子图像

提问by Benjamin Lee

采纳答案by Chris Farmer

回答by Benjamin Lee

回答by Mr Fooz

回答by xamid

回答by Gabriel Ambrósio Archanjo

回答by example

相关推荐

最近更新

标签

java 在较大的图像中查找已知的子图像

提问by Benjamin Lee

采纳答案by Chris Farmer

回答by Benjamin Lee

回答by Mr Fooz

回答by xamid

回答by Gabriel Ambrósio Archanjo

回答by example

相关推荐

java Hibernate 事务管理器问题

如何使用 JVM 参数为 java.util.Calendar 指定 firstDayOfWeek

在 Java 中舍入负数

java 为 JTree 实现工具提示的最佳方法？

相关推荐

最近更新

标签