Python Pytesseract OCR multiple config options

Question

Asked by Niall Oswald

I am having some problems with pytesseract. I need to configure Tesseract so that it accepts single digits and also recognizes only numbers, as the number zero is often confused with an 'O'.

Like this:

target = pytesseract.image_to_string(im,config='-psm 7',config='outputbase digits')

Answer 1

Answered by thewaywewere

tesseract-4.0.0a supports the page segmentation modes (psm) listed below. If you want single-character recognition, set psm = 10. If your text consists of numbers only, you can set tessedit_char_whitelist=0123456789.

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
                        bypassing hacks that are Tesseract-specific.

Here is a sample usage of image_to_string with multiple parameters.

target = pytesseract.image_to_string(image, lang='eng', boxes=False, \
        config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')
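The key point in the call above is that every Tesseract option goes into one config string, because image_to_string takes a single config keyword. As a rough sketch of that idea, the helper below (build_config and read_digit are hypothetical names, not part of pytesseract) joins several options into one string. The boxes argument from the answer is left out here because, as far as I can tell, newer pytesseract releases no longer accept it. Note also that Tesseract 3.x spells the flag -psm while 4.x uses --psm.

```python
def build_config(psm=None, oem=None, **variables):
    """Join several Tesseract options into one config string.

    Hypothetical helper for illustration; pytesseract itself only
    expects the final string.
    """
    parts = []
    if psm is not None:
        parts.append(f"--psm {psm}")  # page segmentation mode
    if oem is not None:
        parts.append(f"--oem {oem}")  # OCR engine mode
    for name, value in variables.items():
        parts.append(f"-c {name}={value}")  # one -c flag per config variable
    return " ".join(parts)


def read_digit(image):
    """Run OCR on a single-character image, asking for digits only."""
    import pytesseract  # imported lazily so build_config works without it

    config = build_config(psm=10, oem=3, tessedit_char_whitelist="0123456789")
    return pytesseract.image_to_string(image, lang="eng", config=config).strip()
```

With these arguments build_config produces the same config string as the answer's example, so read_digit(image) should behave like the call above, assuming image is a PIL image or a file path.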

Hope this helps.

Answer 2

Answered by RALPH BURLESON

The reason you are having trouble is that character restriction (the whitelist) does not work in version 4.0. You have to force legacy mode (oem 0) to make it limit the characters it finds. This is a bug that the Tesseract team has not yet addressed.
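A minimal sketch of forcing legacy mode might look like the following. The function names are hypothetical, and keep_digits is an extra safety net that is not part of this answer: it simply drops any non-digit characters in case the whitelist is still ignored. This assumes the installed traineddata file includes the legacy model, which the fast and best LSTM-only data sets do not.

```python
DIGITS = "0123456789"


def legacy_digit_config(psm=7):
    """Config string that selects the legacy engine and whitelists digits."""
    # --oem 0 is the legacy engine, which the answer says honours the whitelist
    return f"--oem 0 --psm {psm} -c tessedit_char_whitelist={DIGITS}"


def keep_digits(text):
    """Assumed fallback: discard every character that is not 0-9."""
    return "".join(ch for ch in text if ch in DIGITS)


def read_number(image):
    """OCR a single text line and return only the digits found."""
    import pytesseract  # imported lazily; requires Tesseract to be installed

    raw = pytesseract.image_to_string(image, config=legacy_digit_config())
    return keep_digits(raw)
```

The default psm of 7 matches the single-line mode from the question; pass psm=10 for a single character.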
