Original URL: http://stackoverflow.com/questions/37771263/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me):
Stack Overflow
Detect text area in an image using python and opencv
Asked by User9412
I want to detect the text areas of images using Python 2.7 and OpenCV 2.4.9 and draw a rectangle around each one, as shown in the example image below.
I am new to image processing, so any ideas on how to do this would be appreciated.
Answered by prijatelj
There are multiple ways to go about detecting text in an image.
I recommend looking at this question here, as it may answer your case as well. Although it is not in Python, the code can easily be translated from C++ to Python (just look at the API and convert the methods; it's not hard. I did it myself when I tried their code for a separate problem of my own). The solutions there may not work for your case, but I recommend trying them out.
If I were to go about this, I would use the following process:
Prep your image: if all of the images you want to edit are roughly like the one you provided, where the actual design consists of a range of gray colors and the text is always black, I would first mask out all content that is not black (or already white). Doing so leaves only the black text.
# must import when working with opencv in python
import numpy as np
import cv2

# keeps only the pixels whose brightness (V channel) falls within
# [lower_val, upper_val]; everything outside that range is masked out
def remove_gray(img, lower_val, upper_val):
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    lower_bound = np.array([0, 0, lower_val])
    upper_bound = np.array([255, 255, upper_val])
    mask = cv2.inRange(hsv, lower_bound, upper_bound)
    return cv2.bitwise_and(img, img, mask=mask)
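For example, to keep only the near-black text you might call it like this (the filename and the 0 to 90 value range are my own placeholders to tune, not from the original answer):

# hypothetical usage, continuing from the block above:
# keep only pixels with V in [0, 90], i.e. the near-black content
img = cv2.imread('blueprint.png')
text_only = remove_gray(img, 0, 90)
cv2.imwrite('text_only.png', text_only)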
Now that all you have left is the black text, the goal is to get those boxes. As stated before, there are different ways of going about this.
Stroke Width Transform (SWT)
The typical way to find text areas: you can find text regions by using the stroke width transform, as described in "Detecting Text in Natural Scenes with Stroke Width Transform" by Boris Epshtein, Eyal Ofek, and Yonatan Wexler. To be honest, if this is as fast and reliable as I believe it is, then it is more efficient than my code below. You can still use the code above to remove the blueprint design first, though, and that may help the overall performance of the SWT algorithm.
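The full algorithm from the paper is more involved (a median-filtering second pass, handling both dark-on-light and light-on-dark text), but the core ray-casting step can be sketched roughly as follows. This is my own simplified illustration, not the paper's reference implementation:

import numpy as np
import cv2

# rough sketch of the core SWT step: from each edge pixel, march along the
# gradient direction until an opposing edge is found; the travel distance
# approximates the local stroke width
def swt_sketch(gray, max_stroke=30):
    edges = cv2.Canny(gray, 100, 300)
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
    mag = np.sqrt(gx ** 2 + gy ** 2) + 1e-9
    gx, gy = gx / mag, gy / mag  # unit gradient vectors
    swt = np.full(gray.shape, np.inf)
    h, w = gray.shape
    for y, x in zip(*np.nonzero(edges)):
        dx, dy = gx[y, x], gy[y, x]
        for n in range(1, max_stroke):
            cx, cy = int(x + dx * n), int(y + dy * n)
            if not (0 <= cx < w and 0 <= cy < h):
                break
            if edges[cy, cx]:
                # the opposing edge should have a roughly opposite gradient
                if gx[cy, cx] * dx + gy[cy, cx] * dy < -0.7:
                    for m in range(n + 1):  # assign width along the whole ray
                        px, py = int(x + dx * m), int(y + dy * m)
                        swt[py, px] = min(swt[py, px], n)
                break
    return swt  # pixels with similar finite widths can then be grouped into letters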
Here is a C library that implements their algorithm, but it is stated to be very raw and its documentation incomplete. Obviously, a wrapper will be needed to use this library with Python, and at the moment I do not see an official one on offer.
The library I linked is CCV. It is a library meant to be used in your applications, not to recreate algorithms, so it is a ready-made tool, which goes against the OP's stated wish to build this from first principles, as mentioned in the comments. Still, it is useful to know it exists if you don't want to code the algorithm yourself.
Home-Brewed Non-SWT Method
If you have metadata for each image, say in an XML file, that states how many rooms are labeled in each image, then you can access that XML file, get the count of labels in the image, and store that number in some variable, say num_of_labels. Now take your image and put it through a while loop that erodes at a set rate you specify, finding the external contours in the image on each iteration and stopping once you have the same number of external contours as your num_of_labels. Then simply find each contour's bounding box and you are done.
# erodes image based on given kernel size (erosion = expands black areas)
def erode(img, kern_size=3):
    # threshold so we deal with only black and white
    retval, img = cv2.threshold(img, 254.0, 255.0, cv2.THRESH_BINARY)
    # make a kernel for erosion based on the given kernel size
    kern = np.ones((kern_size, kern_size), np.uint8)
    # erode the image to blobbify black areas
    eroded = cv2.erode(img, kern, iterations=1)
    # draw a 1px white border around the image to avoid problems with findContours
    y, x = eroded.shape
    cv2.rectangle(eroded, (0, 0), (x - 1, y - 1), 255, 1)
    return eroded

# erodes the image, then finds the external contours of the eroded image
# (note: this 3-tuple return matches the OpenCV 3.x findContours signature)
def prep(img, kern_size=3):
    img = erode(img, kern_size)
    # invert colors so findContours treats the black blobs as foreground
    retval, img = cv2.threshold(img, 200.0, 255.0, cv2.THRESH_BINARY_INV)
    return cv2.findContours(img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# given img & number of desired blobs, returns contours of blobs.
def blobbify(img, num_of_labels, kern_size=3, dilation_rate=10):
    # erode the image and check the current contour count
    prep_img, contours, hierarchy = prep(img.copy(), kern_size)
    previous = (prep_img, contours, hierarchy)
    while len(contours) > num_of_labels:
        # grow the kernel to merge blobs further (kern_size must stay odd)
        kern_size += dilation_rate
        previous = (prep_img, contours, hierarchy)
        # erode the image and check the contour count again
        prep_img, contours, hierarchy = prep(img.copy(), kern_size)
    if len(contours) < num_of_labels:
        # eroded past the target count; fall back to the previous iteration
        return previous
    return (prep_img, contours, hierarchy)

# finds bounding boxes of all contours
def bounding_box(contours):
    bBox = []
    for curve in contours:
        box = cv2.boundingRect(curve)
        bBox.append(box)
    return bBox
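A hypothetical driver for the helpers above might look like this; the filename and label count are placeholders, and the image is loaded as grayscale because the erode/threshold steps expect a single channel:

# hypothetical usage of the helpers above
img = cv2.imread('blueprint.png', cv2.IMREAD_GRAYSCALE)
num_of_labels = 5  # e.g. read from your xml metadata
processed_img, contours, hierarchy = blobbify(img, num_of_labels)
for x, y, w, h in bounding_box(contours):
    cv2.rectangle(img, (x, y), (x + w, y + h), 0, 2)  # draw black boxes
cv2.imwrite('labeled.png', img)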
The resulting boxes from the above method will have space around the labels, and this may include part of the original design if the boxes are applied to the original image. To avoid this, make regions of interest via your newfound boxes and trim the white space, then save each ROI's shape as your new box.
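A minimal sketch of that trimming step might look like the following; the helper name and the white threshold of 250 are my own choices, not from the answer:

import numpy as np
import cv2

# hypothetical helper: shrink a bounding box to the non-white content inside it
def trim_box(img, box, white_thresh=250):
    x, y, w, h = box
    roi = img[y:y + h, x:x + w]
    if roi.ndim == 3:
        roi = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    ys, xs = np.nonzero(roi < white_thresh)  # pixels darker than "white"
    if len(xs) == 0:
        return box  # the roi is all white space; keep the original box
    return (x + xs.min(), y + ys.min(),
            xs.max() - xs.min() + 1, ys.max() - ys.min() + 1)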
Perhaps you have no way of knowing how many labels will be in the image. If this is the case, then I recommend playing around with erosion values until you find the best one to suit your case and get the desired blobs.
Or you could try finding contours on the remaining content, after removing the design, and combine bounding boxes into one rectangle based on their distance from each other.
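One naive way to do that merging, sketched under the assumption that "close" means the gap between boxes is below some pixel threshold (the max_gap value is a placeholder to tune):

# gap between two (x, y, w, h) boxes on each axis; negative means they overlap
def boxes_close(a, b, max_gap):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    gap_x = max(ax, bx) - min(ax + aw, bx + bw)
    gap_y = max(ay, by) - min(ay + ah, by + bh)
    return gap_x < max_gap and gap_y < max_gap

# smallest box containing both a and b
def union_box(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    x, y = min(ax, bx), min(ay, by)
    return (x, y, max(ax + aw, bx + bw) - x, max(ay + ah, by + bh) - y)

# repeatedly merge any pair of boxes closer than max_gap until none remain
def merge_close_boxes(boxes, max_gap=20):
    boxes = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if boxes_close(boxes[i], boxes[j], max_gap):
                    boxes[i] = union_box(boxes[i], boxes[j])
                    del boxes[j]
                    changed = True
                    break
            if changed:
                break
    return boxes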
Once you have found your boxes, simply apply them to the original image and you are done.
Scene Text Detection Module in OpenCV 3
As mentioned in the comments on your question, there already exists a means of scene text detection (not document text detection) in OpenCV 3. I understand you do not have the ability to switch versions, but for those with the same question who are not limited to an older OpenCV version, I decided to include this at the end. Documentation for the scene text detection module can be found with a simple Google search.
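For reference, here is a rough sketch of the detection call, adapted from the opencv_contrib text-module samples; it assumes an opencv-contrib build and the trained NM1/NM2 classifier XML files that ship with those samples:

import cv2

img = cv2.imread('scene.jpg')
channels = cv2.text.computeNMChannels(img)

# the classifier xml files come from the opencv_contrib text module samples
erc1 = cv2.text.loadClassifierNM1('trained_classifierNM1.xml')
er1 = cv2.text.createERFilterNM1(erc1, 16, 0.00015, 0.13, 0.2, True, 0.1)
erc2 = cv2.text.loadClassifierNM2('trained_classifierNM2.xml')
er2 = cv2.text.createERFilterNM2(erc2, 0.5)

for channel in channels:
    regions = cv2.text.detectRegions(channel, er1, er2)
    for points in regions:
        x, y, w, h = cv2.boundingRect(points.reshape(-1, 1, 2))
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imshow('text regions', img)
cv2.waitKey()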
The OpenCV module for text detection also comes with text recognition that implements Tesseract, a free open-source text recognition engine. The downfall of Tesseract, and therefore of OpenCV's scene text recognition module, is that it is not as refined as commercial applications and is time-consuming to use, which hurts its performance. But it is free, so it is the best we have without paying money, if you want text recognition as well.
Links:
- OpenCV Documentation
- Older Documentation
- The source code is located here, for analysis and understanding
Honestly, I lack the experience and expertise in both OpenCV and image processing to provide a detailed way of implementing their text detection module. The same goes for the SWT algorithm. I only got into this stuff in the past few months, but as I learn more I will edit this answer.
Answered by nathancy
Here's a simple image processing approach using only thresholding and contour filtering:
- Obtain binary image. Load the image, convert to grayscale, Gaussian blur, and adaptive threshold.
- Combine adjacent text. We create a rectangular structuring kernel, then dilate to form a single contour.
- Filter for text contours. We find contours and filter using contour area. From here we can draw the bounding box with cv2.rectangle.
Using this original input image (with the red lines removed)
After converting the image to grayscale and applying a Gaussian blur, we apply an adaptive threshold to obtain a binary image
Next we dilate to combine the text into a single contour
From here we find contours and filter using a minimum contour area (in case there is small noise). Here's the result
If we wanted to, we could also extract and save each ROI using Numpy slicing
Code
import cv2

# Load image, grayscale, Gaussian blur, adaptive threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (9,9), 0)
thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 30)

# Dilate to combine adjacent text contours
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9,9))
dilate = cv2.dilate(thresh, kernel, iterations=4)

# Find contours, highlight text areas, and extract ROIs
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]  # handle both findContours return signatures

ROI_number = 0
for c in cnts:
    area = cv2.contourArea(c)
    if area > 10000:
        x,y,w,h = cv2.boundingRect(c)
        cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 3)
        # ROI = image[y:y+h, x:x+w]
        # cv2.imwrite('ROI_{}.png'.format(ROI_number), ROI)
        # ROI_number += 1

cv2.imshow('thresh', thresh)
cv2.imshow('dilate', dilate)
cv2.imshow('image', image)
cv2.waitKey()