Source: Stack Overflow, http://stackoverflow.com/questions/25629079/. This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.

Only digits from tesseract - OCR on VB?

Tags: vb.net, ocr, tesseract, digits, emgucv

Asked by The Resonator Vibes

I needed an app that reads numbers from my screen and then makes calculations with them. After some days of researching the best and easiest method, I found this video (https://www.youtube.com/watch?v=Kjdu8SjEtG0), which led me to OCR and EMGU-Tesseract on Visual Basic 2010 Express. I understood the video and made my own variation of the code from the video's description.

I imported:

Imports Emgu.CV
Imports Emgu.Util
Imports Emgu.CV.OCR
Imports Emgu.CV.Structure

Then I wrote this, based on the original code:

Dim OCRz As Tesseract = New Tesseract("tessdata", "eng", Tesseract.OcrEngineMode.OEM_TESSERACT_ONLY)
Dim picStc1 As Bitmap = New Bitmap(149, 28)
Dim gfxSTK1 As Graphics = Graphics.FromImage(picStc1)
Dim picNam1 As Bitmap = New Bitmap(149, 28)
Dim gfxNAM1 As Graphics = Graphics.FromImage(picNam1)


Private Sub Timer1_Tick(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Timer1.Tick

    ' Copy the screen region under the Stk1 PictureBox into picStc1.
    ' The +5/+24 offsets presumably account for the window border and title bar.
    gfxSTK1.CopyFromScreen(New Point(Me.Location.X + Stk1.Location.X + 5, Me.Location.Y + Stk1.Location.Y + 24), New Point(0, 0), picStc1.Size)
    Stk1.Image = picStc1

    ' Same capture for the Nome1 PictureBox, stored in picNam1.
    gfxNAM1.CopyFromScreen(New Point(Me.Location.X + Nome1.Location.X + 5, Me.Location.Y + Nome1.Location.Y + 24), New Point(0, 0), picNam1.Size)
    Nome1.Image = picNam1

End Sub

And when I press the button, this runs:

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click

    ' Run OCR on each captured bitmap and show the recognized text.
    OCRz.Recognize(New Image(Of Bgr, Byte)(picStc1))
    BOXSTK1.Text = OCRz.GetText

    OCRz.Recognize(New Image(Of Bgr, Byte)(picNam1))
    BoxNAME1.Text = OCRz.GetText

End Sub

I now have the text read from the captured bitmaps (picStc1) and (picNam1) through the OCR engine, and it is written to the RichTextBoxes (BoxSTK1) and (BoxNAME1) when I press the button.

The numbers in the RichTextBox (BoxSTK1) come with commas and other symbols, but I just want to grab the numbers. I found this (https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_recognize_only_digits?), but I can't implement it in my project. Any help with this?

(I'm using Emgu 2.9.0.1922; I don't know how to check the Tesseract version.)

Accepted answer by Jimmy Smith

This digit-based "whitelist" appears to be something you'd set when you initialize the object. Check out this question

So you will need to change,

Dim OCRz As Tesseract = New Tesseract("tessdata", "eng", Tesseract.OcrEngineMode.OEM_TESSERACT_ONLY)

To something like this,

Dim OCRz As Tesseract = New Tesseract()
OCRz.SetVariable("tessedit_char_whitelist", "0123456789")
OCRz.Init("tessdata", "eng", False)

Answer by The Resonator Vibes

OK people, this problem is solved, thanks to Mr. Jimmy Smith! Now we don't need to train Tesseract at all; converting the OCR value to a string and cleaning it up is enough.

First define the whitelist by using this:

OCRz.SetVariable("tessedit_char_whitelist", ",23456789")

Then convert the string like this and print it:

RichTextBox1.Text = Convert.ToString(OCRz.GetText).Replace("$", "").Replace(",", "")
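The chained Replace calls only remove the two symbols they name. As a rough sketch of a more general cleanup, the helper below keeps only the characters 0 to 9, so stray line breaks or other symbols in the OCR output are dropped as well. DigitsOnly and the sample input are hypothetical names and values, not part of the original code or of Emgu, and the OCR text is simulated with a literal string so the sketch runs without Emgu installed.

```vbnet
Imports System
Imports System.Linq

Module OcrCleanupSketch

    ' Hypothetical helper (not part of Emgu): keeps only "0".."9" from raw OCR text.
    Function DigitsOnly(ByVal raw As String) As String
        If raw Is Nothing Then Return String.Empty
        Return New String(raw.Where(Function(c) c >= "0"c AndAlso c <= "9"c).ToArray())
    End Function

    Sub Main()
        ' Simulated OCR output; in the form this would come from OCRz.GetText.
        Dim cleaned As String = DigitsOnly("$2,345" & Environment.NewLine)

        ' TryParse avoids an exception if nothing usable was recognized.
        Dim value As Long
        If Long.TryParse(cleaned, value) Then
            Console.WriteLine(value * 2) ' prints 4690
        End If
    End Sub

End Module
```

In the form, the same helper could be applied as RichTextBox1.Text = DigitsOnly(OCRz.GetText). Note that it also strips decimal points and minus signs, so it only suits whole, non-negative numbers.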

At the end we get this:

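Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click

    ' Restrict recognition to the whitelisted characters, then initialize the engine.
    OCRz.SetVariable("tessedit_char_whitelist", ",23456789")
    OCRz.Init("tessdata", "eng", False)

    ' Recognize the captured bitmap and strip the remaining symbols from the text.
    OCRz.Recognize(New Image(Of Bgr, Byte)(pic))
    RichTextBox1.Text = Convert.ToString(OCRz.GetText).Replace("$", "").Replace(",", "")

End Sub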

Thanks again to Jimmy Smith for his fast and really useful answers. Remember to upvote this guy ;)

Answer by Nguy?n Th?o

On fix and download:

Dim OCRz As Tesseract = New Tesseract("tessdata", "eng", Tesseract.OcrEngineMode.OEM_DEFAULT)
Dim pic As Bitmap = New Bitmap(270, 100)
Dim gfx As Graphics = Graphics.FromImage(pic)