Getting the bounding box of recognized words using python-tesseract

Question

Asked by Abtin Rasoulian

I am using python-tesseract to extract words from an image. It is a Python wrapper for Tesseract, an OCR engine.

I am using the following code for getting the words:

import tesseract

# Set up the Tesseract engine for English, restricted to digits and lowercase letters
api = tesseract.TessBaseAPI()
api.Init(".", "eng", tesseract.OEM_DEFAULT)
api.SetVariable("tessedit_char_whitelist", "0123456789abcdefghijklmnopqrstuvwxyz")
api.SetPageSegMode(tesseract.PSM_AUTO)

# Read the image into memory and run OCR on the raw bytes
mImgFile = "test.jpg"
with open(mImgFile, "rb") as f:
    mBuffer = f.read()
result = tesseract.ProcessPagesBuffer(mBuffer, len(mBuffer), api)
print("result(ProcessPagesBuffer)=", result)

This returns only the words, not their location, size, or orientation in the image (in other words, a bounding box containing them). Is there any way to get that as well?

Answer 1

Accepted answer by stwykd

Use pytesseract.image_to_data()

import cv2
import pytesseract
from pytesseract import Output

img = cv2.imread('image.jpg')

# Run OCR and collect per-element data (one list entry per detected element)
d = pytesseract.image_to_data(img, output_type=Output.DICT)
n_boxes = len(d['level'])
for i in range(n_boxes):
    # Draw a green rectangle around each detected element
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

# Show the annotated image until a key is pressed
cv2.imshow('img', img)
cv2.waitKey(0)

Among the data returned by pytesseract.image_to_data():

left is the distance from the upper-left corner of the bounding box to the left border of the image.
top is the distance from the upper-left corner of the bounding box to the top border of the image.
width and height are the width and height of the bounding box.
conf is the model's confidence in its prediction for the word within that bounding box. If conf is -1, the corresponding bounding box contains a block of text rather than a single word.
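
As a minimal sketch of how these fields might be combined, the helper below keeps only word-level entries (non-empty text with a conf other than -1) and optionally drops low-confidence words. The name `word_boxes` and the `min_conf` threshold are illustrative assumptions, not part of pytesseract; the input is assumed to have the same keys as the dictionary returned by `image_to_data(..., output_type=Output.DICT)`, and the sample values are made up.

```python
def word_boxes(d, min_conf=0.0):
    """Return (text, left, top, width, height) tuples for word-level entries.

    `d` is assumed to be shaped like the dictionary returned by
    pytesseract.image_to_data(img, output_type=Output.DICT).
    """
    words = []
    for i, text in enumerate(d['text']):
        # conf may arrive as a string or a number, so normalize it
        conf = float(d['conf'][i])
        # Skip block/paragraph/line entries (conf == -1) and empty strings
        if conf < 0 or not text.strip():
            continue
        if conf >= min_conf:
            words.append((text, d['left'][i], d['top'][i], d['width'][i], d['height'][i]))
    return words


# Hand-written sample shaped like image_to_data output (values are made up)
sample = {
    'text':   ['', 'Hello', 'world', ' '],
    'conf':   ['-1', '96', '42', '95'],
    'left':   [0, 10, 80, 150],
    'top':    [0, 12, 12, 12],
    'width':  [200, 60, 62, 5],
    'height': [40, 20, 20, 20],
}
print(word_boxes(sample, min_conf=60))  # [('Hello', 10, 12, 60, 20)]
```

With a real image, the same function could be called as `word_boxes(pytesseract.image_to_data(img, output_type=Output.DICT), min_conf=60)`.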

The bounding boxes returned by pytesseract.image_to_boxes() enclose individual letters, so I believe pytesseract.image_to_data() is what you're looking for.

Answer 2

Answered by lennon310

The tesseract.GetBoxText() method returns the exact position of each character in an array.

Besides, there is a command-line option, tesseract test.jpg result hocr, that generates a result.html file containing each recognized word's coordinates. But I'm not sure whether it can be called from a Python script.

Answer 3

Answered by khushhall

Using the below code you can get the bounding box corresponding to each character.

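import csv
import cv2
from pytesseract import pytesseract as pt

# Write character boxes to output.box.
# Note: the boxes= keyword matches older pytesseract releases; the signature of run_tesseract may differ in newer ones.
pt.run_tesseract('bw.png', 'output', lang=None, boxes=True, config="hocr")

# Read the coordinates (one "char left bottom right top page" row per character)
boxes = []
with open('output.box', 'r', newline='') as f:
    # QUOTE_NONE keeps a recognized '"' character from being treated as a CSV quote
    reader = csv.reader(f, delimiter=' ', quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) == 6:
            boxes.append(row)

# Draw the bounding boxes; box-file y values are measured from the bottom of the image
img = cv2.imread('bw.png')
h, w, _ = img.shape
for b in boxes:
    img = cv2.rectangle(img, (int(b[1]), h - int(b[2])), (int(b[3]), h - int(b[4])), (255, 0, 0), 2)

# Show the annotated image until a key is pressed
cv2.imshow('output', img)
cv2.waitKey(0)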

Answer 4

Answered by jtbr

pytesseract can do this without writing to a file, using the image_to_boxes function:

import cv2
import pytesseract

filename = 'image.png'

# read the image and get the dimensions
img = cv2.imread(filename)
h, w, _ = img.shape # assumes color image

# run tesseract, returning the bounding boxes
boxes = pytesseract.image_to_boxes(img) # also include any config options you use

# draw the bounding boxes on the image
for b in boxes.splitlines():
    b = b.split(' ')
    img = cv2.rectangle(img, (int(b[1]), h - int(b[2])), (int(b[3]), h - int(b[4])), (0, 255, 0), 2)

# show annotated image and wait for keypress
cv2.imshow(filename, img)
cv2.waitKey(0)
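
The `h - int(b[2])` arithmetic above is needed because box-file coordinates are measured from the bottom-left corner of the image, while OpenCV draws from the top-left. A minimal sketch of that conversion, separated from the drawing code, might look like the following. The function name `parse_boxes` and the sample string are hypothetical; the input is assumed to use the `char left bottom right top page` layout that `image_to_boxes` returns.

```python
def parse_boxes(box_text, image_height):
    """Convert box-file text into (char, x1, y1, x2, y2) with a top-left origin."""
    rects = []
    for line in box_text.splitlines():
        parts = line.split(' ')
        if len(parts) != 6:
            continue  # skip malformed lines
        char = parts[0]
        left, bottom, right, top = (int(v) for v in parts[1:5])
        # Box-file y values are measured from the bottom edge, so flip them
        rects.append((char, left, image_height - top, right, image_height - bottom))
    return rects


# Made-up sample for a 100-pixel-tall image
sample = "H 10 70 30 90 0\ni 32 70 38 90 0"
print(parse_boxes(sample, image_height=100))
# [('H', 10, 10, 30, 30), ('i', 32, 10, 38, 30)]
```

Each resulting tuple can be passed to `cv2.rectangle` as `(x1, y1)` and `(x2, y2)` without further adjustment.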

Answer 5

Answered by Endyd

I would comment under lennon310's answer, but I don't have enough reputation to comment...

To run his command-line command tesseract test.jpg result hocr in a Python script:

from subprocess import check_call

tesseractParams = ['tesseract', 'test.jpg', 'result', 'hocr']
check_call(tesseractParams)
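
Once the hOCR file exists, the word coordinates still have to be extracted from it. The sketch below uses only the standard library and assumes the usual hOCR convention, in which each word is a `span` with class `ocrx_word` and a `title` attribute containing `bbox x1 y1 x2 y2`. The class name `HocrWordParser` and the sample markup are illustrative, not taken from an actual Tesseract run.

```python
import re
from html.parser import HTMLParser

BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collect (text, (x1, y1, x2, y2)) for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'span' and 'ocrx_word' in (attrs.get('class') or '').split():
            m = BBOX_RE.search(attrs.get('title') or '')
            if m:
                self._bbox = tuple(int(v) for v in m.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'span' and self._bbox is not None:
            self.words.append((''.join(self._text).strip(), self._bbox))
            self._bbox = None


# Made-up hOCR fragment for illustration
sample = (
    "<span class='ocr_line' title='bbox 36 92 200 116'>"
    "<span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 95'>Hello</span> "
    "<span class='ocrx_word' id='word_1_2' title='bbox 109 92 200 116; x_wconf 91'>world</span>"
    "</span>"
)
parser = HocrWordParser()
parser.feed(sample)
parser.close()
print(parser.words)
# [('Hello', (36, 92, 96, 116)), ('world', (109, 92, 200, 116))]
```

To use it on real output, read the generated file and pass its contents to `parser.feed()`. Depending on the Tesseract version, the file may be named result.html or result.hocr, so check which one was written.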

Answer 6

Answered by himanshu_chawla

The answers above show examples that work with pytesseract. If you use the tesserocr Python library instead, the code below finds individual words and their bounding boxes:

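from tesserocr import PyTessBaseAPI, RIL, iterate_level

imagePath = 'image.png'  # placeholder: path to the image to process

with PyTessBaseAPI(psm=6, oem=1) as api:
    level = RIL.WORD
    api.SetImageFile(imagePath)
    api.Recognize()
    ri = api.GetIterator()
    # iterate_level starts at the first word; calling ri.Next() in a
    # while condition would skip it
    for r in iterate_level(ri, level):
        word = r.GetUTF8Text(level)
        boxes = r.BoundingBox(level)  # (left, top, right, bottom)
        print(word, "word")
        print(boxes, "coords")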
