Original source: http://stackoverflow.com/questions/20831612/
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and keep it under the same license.
Getting the bounding box of the recognized words using python-tesseract
Asked by Abtin Rasoulian
I am using python-tesseract, a Python wrapper for the Tesseract OCR engine, to extract words from an image.
I am using the following code to get the words:
import tesseract
api = tesseract.TessBaseAPI()
api.Init(".","eng",tesseract.OEM_DEFAULT)
api.SetVariable("tessedit_char_whitelist", "0123456789abcdefghijklmnopqrstuvwxyz")
api.SetPageSegMode(tesseract.PSM_AUTO)
mImgFile = "test.jpg"
mBuffer=open(mImgFile,"rb").read()
result = tesseract.ProcessPagesBuffer(mBuffer,len(mBuffer),api)
print "result(ProcessPagesBuffer)=",result
This returns only the words, not their location, size, or orientation in the image (in other words, a bounding box containing them). Is there any way to get that as well?
Accepted answer by stwykd
Use pytesseract.image_to_data():
import pytesseract
from pytesseract import Output
import cv2
img = cv2.imread('image.jpg')
d = pytesseract.image_to_data(img, output_type=Output.DICT)
n_boxes = len(d['level'])
for i in range(n_boxes):
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imshow('img', img)
cv2.waitKey(0)
Among the data returned by pytesseract.image_to_data():
left is the distance from the upper-left corner of the bounding box to the left border of the image.
top is the distance from the upper-left corner of the bounding box to the top border of the image.
width and height are the width and height of the bounding box.
conf is the model's confidence in its prediction for the word within that bounding box. If conf is -1, the corresponding bounding box contains a block of text rather than a single word.
The bounding boxes returned by pytesseract.image_to_boxes() enclose individual letters, so I believe pytesseract.image_to_data() is what you're looking for.
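Building on the field descriptions above, here is a minimal sketch of keeping only word-level boxes from the dictionary that image_to_data() returns. The helper name word_boxes, the min_conf parameter, and the sample dictionary are illustrative assumptions, not part of pytesseract; the sample is hand-made so the snippet runs without Tesseract installed.

```python
def word_boxes(data, min_conf=0.0):
    """Return (text, left, top, width, height) for word-level entries.

    `data` is assumed to have the shape of
    pytesseract.image_to_data(img, output_type=Output.DICT).
    """
    words = []
    for i, text in enumerate(data['text']):
        # conf may arrive as an int, float, or string, so normalise it
        conf = float(data['conf'][i])
        # conf == -1 marks non-word entries (page, block, paragraph, line)
        if conf < 0 or conf < min_conf or not text.strip():
            continue
        words.append((text, data['left'][i], data['top'][i],
                      data['width'][i], data['height'][i]))
    return words


# Hand-made sample in the same shape (not real OCR output)
sample = {
    'level': [1, 5, 5],
    'left': [0, 10, 60],
    'top': [0, 20, 20],
    'width': [200, 40, 50],
    'height': [100, 15, 15],
    'conf': ['-1', 96, 41.5],
    'text': ['', 'hello', 'world'],
}
print(word_boxes(sample, min_conf=50))
```

With real input you would pass the dictionary d from the answer's code instead of sample, and draw only the rectangles that survive the filter.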
Answer by lennon310
The tesseract.GetBoxText() method returns the exact position of each character in an array.
Besides, there is a command-line option, tesseract test.jpg result hocr, that generates a result.html file containing the coordinates of each recognized word. I'm not sure whether it can be called from a Python script, though.
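Once the hOCR file exists, the word coordinates can be pulled out of it. The following is a rough sketch, assuming each word is written as a span with class ocrx_word whose title attribute starts with "bbox x1 y1 x2 y2" and that class appears before title in the tag; the sample markup is hand-written rather than real Tesseract output, and hocr_words is a hypothetical helper name. Note that recent Tesseract versions write result.hocr rather than result.html.

```python
import html
import re

# Matches <span class='ocrx_word' ... title='bbox x1 y1 x2 y2; ...'>...</span>
WORD_RE = re.compile(
    r"<span[^>]*class=['\"]ocrx_word['\"][^>]*"
    r"title=['\"]bbox (\d+) (\d+) (\d+) (\d+)[^'\"]*['\"][^>]*>(.*?)</span>",
    re.DOTALL,
)
TAG_RE = re.compile(r"<[^>]+>")


def hocr_words(hocr):
    """Return a list of (text, (x1, y1, x2, y2)) for each ocrx_word span."""
    words = []
    for x1, y1, x2, y2, inner in WORD_RE.findall(hocr):
        # Drop nested tags such as <strong> and decode HTML entities
        text = html.unescape(TAG_RE.sub("", inner)).strip()
        words.append((text, (int(x1), int(y1), int(x2), int(y2))))
    return words


# Hand-written sample in hOCR style (not real OCR output)
sample = (
    "<span class='ocrx_word' id='word_1_1' "
    "title='bbox 36 92 96 116; x_wconf 95'>Hello</span> "
    "<span class='ocrx_word' id='word_1_2' "
    "title='bbox 109 92 173 116; x_wconf 93'><strong>world</strong></span>"
)
print(hocr_words(sample))
```

A regular expression is enough for a quick look; for arbitrary hOCR files a real HTML parser would be the more robust choice.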
Answer by khushhall
Using the code below, you can get the bounding box corresponding to each character.
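import csv
import cv2
from pytesseract import pytesseract as pt

# Targets an older pytesseract release whose run_tesseract() accepted boxes=True
pt.run_tesseract('bw.png', 'output', lang=None, boxes=True, config="hocr")

# To read the coordinates
boxes = []
with open('output.box', 'r') as f:  # text mode, as csv.reader expects on Python 3
    reader = csv.reader(f, delimiter=' ')
    for row in reader:
        if len(row) == 6:
            boxes.append(row)

# Draw the bounding box (box-file y coordinates are measured from the bottom)
img = cv2.imread('bw.png')
h, w, _ = img.shape
for b in boxes:
    img = cv2.rectangle(img, (int(b[1]), h - int(b[2])), (int(b[3]), h - int(b[4])), (255, 0, 0), 2)

cv2.imshow('output', img)
cv2.waitKey(0)  # keep the window open until a key is pressed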
Answer by jtbr
Python tesseract (pytesseract) can do this without writing to a file, using the image_to_boxes function:
import cv2
import pytesseract
filename = 'image.png'
# read the image and get the dimensions
img = cv2.imread(filename)
h, w, _ = img.shape # assumes color image
# run tesseract, returning the bounding boxes
boxes = pytesseract.image_to_boxes(img) # also include any config options you use
# draw the bounding boxes on the image
for b in boxes.splitlines():
    b = b.split(' ')
    img = cv2.rectangle(img, (int(b[1]), h - int(b[2])), (int(b[3]), h - int(b[4])), (0, 255, 0), 2)
# show annotated image and wait for keypress
cv2.imshow(filename, img)
cv2.waitKey(0)
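The h - int(...) arithmetic in the drawing loop is there because box-format output measures y from the bottom of the image, while OpenCV measures it from the top. As a small sketch of that conversion on its own, assuming each line has the form "char left bottom right top page" (parse_boxes is a hypothetical helper, and the sample string is hand-made rather than real OCR output):

```python
def parse_boxes(boxes, image_height):
    """Convert box-format text to (char, left, top, right, bottom) tuples
    with the origin at the top-left corner, as OpenCV expects."""
    result = []
    for line in boxes.splitlines():
        parts = line.split(' ')
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        char = parts[0]
        left, bottom, right, top = (int(v) for v in parts[1:5])
        # Flip the y axis: box files count upward from the bottom edge
        result.append((char, left, image_height - top,
                       right, image_height - bottom))
    return result


# Hand-made sample for an image that is 100 pixels tall
sample = "H 10 70 30 90 0\ni 32 70 38 90 0"
print(parse_boxes(sample, image_height=100))
```

With real input, the string returned by pytesseract.image_to_boxes(img) and the height h from img.shape would take the place of the sample values.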
Answer by Endyd
I would comment under lennon310's answer, but I don't have enough reputation to comment.
To run his command-line command, tesseract test.jpg result hocr, from a Python script:
from subprocess import check_call
tesseractParams = ['tesseract', 'test.jpg', 'result', 'hocr']
check_call(tesseractParams)
Answer by himanshu_chawla
Some of the examples above can be used with pytesseract. If you are using the tesserocr Python library instead, you can use the code below to find individual words and their bounding boxes:
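from tesserocr import PyTessBaseAPI, RIL, iterate_level

imagePath = 'image.png'  # placeholder: path to your image

with PyTessBaseAPI(psm=6, oem=1) as api:
    level = RIL.WORD
    api.SetImageFile(imagePath)
    api.Recognize()
    ri = api.GetIterator()
    # iterate_level visits every word, starting with the first one
    for r in iterate_level(ri, level):
        word = r.GetUTF8Text(level)
        boxes = r.BoundingBox(level)  # (left, top, right, bottom)
        print(word, "word")
        print(boxes, "coords")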

