OpenCV MSER: Detecting Text Regions - Python

Question

Asked by Amit Madan

I have an invoice image, and I want to detect the text on it. I plan to do this in two steps: first identify the text areas, then use OCR to recognize the text.

I am using OpenCV 3.0 in Python for this. I can already identify the text (along with some non-text areas), but I also want to identify the text boxes in the image while excluding the non-text areas.

My input image is: [image not shown] and the output is: [image not shown]. I am using the code below for this:

import cv2

img = cv2.imread('/home/mis/Text_Recognition/bill.jpg')
mser = cv2.MSER_create()
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # Convert to grayscale
gray_img = img.copy()  # Copy of the color image to draw on

regions = mser.detectRegions(gray, None)
hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions]
cv2.polylines(gray_img, hulls, 1, (0, 0, 255), 2)
cv2.imwrite('/home/mis/Text_Recognition/amit.jpg', gray_img)  # Save the result

Now I want to identify the text boxes and remove any non-text areas detected on the invoice. I am new to OpenCV and a beginner in Python. I found a MATLAB example and a C++ example, but converting them to Python would take me a lot of time.

Is there an example in Python using OpenCV, or can anyone help me with this?

Answer 1

Answered by RAFI AFRIDI

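Below is the code. It draws the convex hull of each MSER region, then uses a mask built from those hulls to keep only the detected regions.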

# Import packages
import cv2
import numpy as np

# Create the MSER object
mser = cv2.MSER_create()

# Path to your image, i.e. the receipt
img = cv2.imread('/home/rafiullah/PycharmProjects/python-ocr-master/receipts/73.jpg')

# Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

vis = img.copy()

# Detect regions in the grayscale image
regions, _ = mser.detectRegions(gray)

# Draw the convex hull of each region on a copy of the image
hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions]
cv2.polylines(vis, hulls, 1, (0, 255, 0))
cv2.imshow('img', vis)
cv2.waitKey(0)

# Fill every hull in a single-channel mask
mask = np.zeros((img.shape[0], img.shape[1], 1), dtype=np.uint8)
for contour in hulls:
    cv2.drawContours(mask, [contour], -1, (255, 255, 255), -1)

# Keep only the pixels inside the detected regions; everything else is blacked out
text_only = cv2.bitwise_and(img, img, mask=mask)
cv2.imshow("text only", text_only)
cv2.waitKey(0)
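
MSER on its own also fires on logos, rules, and table borders, which is the "non-text areas" problem from the question. One common way to reduce them is to discard regions whose bounding box has an implausible size or shape for a character or word. The following is a minimal sketch of that idea, not part of the original answer: the function name `filter_text_like_boxes` and all threshold values are hypothetical and would need tuning for a real invoice.

```python
def filter_text_like_boxes(boxes, img_h,
                           min_h=8, max_h_frac=0.2,
                           min_aspect=0.1, max_aspect=10.0):
    """Keep (x, y, w, h) boxes whose size and shape are plausible for text.

    All thresholds are illustrative assumptions, not recommended values.
    """
    kept = []
    for x, y, w, h in boxes:
        if w <= 0 or h <= 0:
            continue  # degenerate box
        if h < min_h or h > max_h_frac * img_h:
            continue  # too small (noise) or too tall (logo, frame)
        aspect = w / float(h)
        if not (min_aspect <= aspect <= max_aspect):
            continue  # extremely thin or wide shapes, e.g. rules and borders
        kept.append((x, y, w, h))
    return kept


# Made-up boxes for a 640x480 image: one character-sized box,
# one thin horizontal rule, one speck of noise, one large block.
boxes = [(10, 10, 12, 18), (0, 0, 640, 3), (50, 40, 2, 2), (100, 100, 300, 400)]
text_boxes = filter_text_like_boxes(boxes, img_h=480)
print(text_boxes)
```

With the code above, the input boxes could be obtained by calling `cv2.boundingRect(hull)` on each entry of `hulls`, and only the hulls whose box survives the filter would then be drawn into `mask`. Geometric filtering of this kind is a heuristic; it will not remove every non-text region.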

Answer 2

Answered by Shreyash Sharma

This is an old post, but if you are trying to extract all the text from an image, here is code that collects that text into a list.

import re

import cv2
import pytesseract
from PIL import Image
from pytesseract import image_to_string

# Path to the Tesseract executable (Windows example; adjust for your system)
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

image_obj = Image.open("screenshot.png")

bgr = cv2.imread('screenshot.png')  # cv2.imread returns the image in BGR order
gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)

# Threshold the image (Otsu, inverted so that text becomes white)
_, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)

# Close with a wide, flat kernel so the characters on a horizontal text line merge
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (20, 1))
connected = cv2.morphologyEx(bw, cv2.MORPH_CLOSE, kernel)

# Find the outer contours; [-2:] works whether findContours returns 2 or 3 values
contours, hierarchy = cv2.findContours(connected.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2:]

# Run OCR on each segmented text line
array_of_texts = []
for contour in contours:
    x, y, w, h = cv2.boundingRect(contour)
    cropped_image = image_obj.crop((x - 10, y, x + w + 10, y + h))
    # Strip punctuation and underscores from the recognized text
    str_store = re.sub(r'([^\s\w]|_)+', '', image_to_string(cropped_image))
    array_of_texts.append(str_store)

print(array_of_texts)
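
Two details of the loop above may matter in practice: the 10-pixel horizontal padding can push a crop box past the image edge, and the contours are not guaranteed to come back in reading order. The sketch below, which is an addition and not part of the original answer, pads and clamps each box and then sorts the results top to bottom and left to right. The function name `padded_boxes_in_reading_order` and its parameters are hypothetical.

```python
def padded_boxes_in_reading_order(boxes, img_w, img_h, pad_x=10):
    """Turn (x, y, w, h) boxes into (left, top, right, bottom) crop boxes.

    Each box is padded horizontally by pad_x, clamped to the image bounds,
    and the result is sorted by top edge, then by left edge.
    """
    crops = []
    for x, y, w, h in boxes:
        left = max(0, x - pad_x)
        top = max(0, y)
        right = min(img_w, x + w + pad_x)
        bottom = min(img_h, y + h)
        if right > left and bottom > top:  # skip boxes that end up empty
            crops.append((left, top, right, bottom))
    # Sorting on the exact top coordinate is crude: boxes on the same
    # text line with slightly different y values may interleave.
    crops.sort(key=lambda c: (c[1], c[0]))
    return crops


# Made-up boxes for a 640x480 image
boxes = [(300, 200, 100, 20), (5, 10, 50, 20), (600, 10, 50, 20)]
crops = padded_boxes_in_reading_order(boxes, img_w=640, img_h=480)
print(crops)
```

Each returned tuple is in the (left, top, right, bottom) form that `image_obj.crop` expects, so the boxes from `cv2.boundingRect` could be collected first, passed through this helper with `image_obj.size` as the bounds, and then cropped and sent to `image_to_string` in order.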
