Original URL: http://stackoverflow.com/questions/25629079/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Only digits from tesseract - OCR on VB?
Asked by The Resonator Vibes
I needed an app that reads numbers on my screen and then does calculations with them. After some days researching the best and easiest method, I found this video (https://www.youtube.com/watch?v=Kjdu8SjEtG0), which led me to OCR and EMGU-Tesseract in Visual Basic 2010 Express. I understood the video and made my own variation of the code from the video's description.
I imported:
Imports Emgu.CV
Imports Emgu.Util
Imports Emgu.CV.OCR
Imports Emgu.CV.Structure
Then I wrote this, based on the original code:
' Form-level fields: one OCR engine and two 149x28 capture buffers.
Dim OCRz As Tesseract = New Tesseract("tessdata", "eng", Tesseract.OcrEngineMode.OEM_TESSERACT_ONLY)
Dim picStc1 As Bitmap = New Bitmap(149, 28)
Dim gfxSTK1 As Graphics = Graphics.FromImage(picStc1)
Dim picNam1 As Bitmap = New Bitmap(149, 28)
Dim gfxNAM1 As Graphics = Graphics.FromImage(picNam1)

Private Sub Timer1_Tick(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Timer1.Tick
    ' Copy the screen area under each PictureBox into its bitmap and display it.
    ' The +5/+24 offsets are hand-tuned, presumably for the window border and title bar.
    gfxSTK1.CopyFromScreen(New Point(Me.Location.X + Stk1.Location.X + 5, Me.Location.Y + Stk1.Location.Y + 24), New Point(0, 0), picStc1.Size)
    Stk1.Image = picStc1
    gfxNAM1.CopyFromScreen(New Point(Me.Location.X + Nome1.Location.X + 5, Me.Location.Y + Nome1.Location.Y + 24), New Point(0, 0), picNam1.Size)
    Nome1.Image = picNam1
End Sub
And this runs when I press the button:
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    ' Run OCR on each captured bitmap and show the recognized text.
    OCRz.Recognize(New Image(Of Bgr, Byte)(picStc1))
    BoxSTK1.Text = OCRz.GetText
    OCRz.Recognize(New Image(Of Bgr, Byte)(picNam1))
    BoxNAME1.Text = OCRz.GetText
End Sub
After I press the button, the OCR engine now reads the text from the captured images (picStc1 and picNam1) and writes it into the RichTextBoxes BoxSTK1 and BoxNAME1.
The numbers in the RichTextBox BoxSTK1 come with commas and other symbols, but I only want the digits. I found this (https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_recognize_only_digits?), but I can't implement it in my project. Any help?
(I'm using Emgu 2.9.0.1922 and don't know how to check the Tesseract version.)
Accepted answer by Jimmy Smith
This digit-based "whitelist" appears to be something you'd set when you initialize the object. Check out this question
So you will need to change this:
Dim OCRz As Tesseract = New Tesseract("tessdata", "eng", Tesseract.OcrEngineMode.OEM_TESSERACT_ONLY)
to something like this:
Dim OCRz As Tesseract = New Tesseract()
' Restrict recognition to the characters listed here (digits only).
OCRz.SetVariable("tessedit_char_whitelist", "0123456789")
OCRz.Init("tessdata", "eng", False)
Answer by The Resonator Vibes
OK people, this problem is solved, thanks to Mr. Jimmy Smith! We don't need to train Tesseract at all: just whitelist the characters and then clean up the OCR result as a string!
First, define the whitelist:
' 0 and 1 must be in the whitelist too, otherwise Tesseract can never output them.
OCRz.SetVariable("tessedit_char_whitelist", ",0123456789")
Then clean up the resulting string and display it:
RichTextBox1.Text = Convert.ToString(OCRz.GetText).Replace("$", "").Replace(",", "")
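The chained Replace calls above only remove "$" and ","; anything else that survives OCR, such as spaces or the line breaks GetText output can end with, stays in the text. A minimal sketch of a more general cleanup, assuming the goal is a plain run of digits, is a regular expression that drops every non-digit character. OnlyDigits and the DigitFilter module are hypothetical names, not part of Emgu CV:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module DigitFilter
    ' Hypothetical helper: keeps only the ASCII digits 0-9 from OCR output.
    ' Unlike the chained Replace calls, this also drops spaces, line breaks
    ' and any other stray symbol, not just "$" and ",".
    Function OnlyDigits(ByVal ocrText As String) As String
        If ocrText Is Nothing Then Return String.Empty
        ' [^0-9] rather than \D: in .NET, \d also matches non-ASCII digits.
        Return Regex.Replace(ocrText, "[^0-9]", String.Empty)
    End Function

    Sub Main()
        ' Simulated OCR output with a currency sign, a comma and a line break.
        Console.WriteLine(OnlyDigits("$1,234" & Environment.NewLine)) ' prints 1234
    End Sub
End Module
```

In the button handler this would become RichTextBox1.Text = OnlyDigits(OCRz.GetText). Stripping commas this way is only safe when the comma is a thousands separator; where it is the decimal separator (as in many European locales), removing it changes the value.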
In the end we get this:
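Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    ' Digits plus the comma; 0 and 1 must be listed or they cannot be recognized.
    OCRz.SetVariable("tessedit_char_whitelist", ",0123456789")
    OCRz.Init("tessdata", "eng", False)
    ' pic is presumably the captured Bitmap (picStc1 in the question).
    OCRz.Recognize(New Image(Of Bgr, Byte)(pic))
    ' Strip the symbols that should not reach the text box.
    RichTextBox1.Text = Convert.ToString(OCRz.GetText).Replace("$", "").Replace(",", "")
End Sub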
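The question's goal is to make calculations with the numbers, so the cleaned string still has to be turned into a number. A sketch, assuming whole numbers that fit in a Long; TryReadNumber and the OcrNumber module are hypothetical names, and TryParse is used so that a misread (empty or non-digit text) returns False instead of throwing inside the Timer or button handler:

```vbnet
Imports System
Imports System.Globalization

Module OcrNumber
    ' Hypothetical helper: converts cleaned OCR text (digits only, possibly with
    ' surrounding spaces or line breaks) into a Long for calculations.
    ' Returns False instead of throwing when the text is empty, still contains
    ' a non-digit character, or is too large for a Long.
    Function TryReadNumber(ByVal cleanedText As String, ByRef value As Long) As Boolean
        value = 0
        If cleanedText Is Nothing Then Return False
        ' NumberStyles.None accepts digits only (no sign, separators or whitespace),
        ' so surrounding whitespace is trimmed first.
        Return Long.TryParse(cleanedText.Trim(), NumberStyles.None, CultureInfo.InvariantCulture, value)
    End Function

    Sub Main()
        Dim stack As Long
        If TryReadNumber("1234" & Environment.NewLine, stack) Then
            Console.WriteLine(stack * 2) ' prints 2468
        Else
            Console.WriteLine("OCR result is not a whole number")
        End If
    End Sub
End Module
```

Using NumberStyles.None with the invariant culture keeps the parse independent of the machine's regional settings; the trade-off is that any leftover separator makes it fail, so the text should already be cleaned as in the answer above.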
Thanks again to Jimmy Smith for his fast and really useful answers; remember to upvote this guy ;)
Answer by Nguyễn Thảo
Dim OCRz As Tesseract = New Tesseract("tessdata", "eng", Tesseract.OcrEngineMode.OEM_DEFAULT)
Dim pic As Bitmap = New Bitmap(270, 100)
Dim gfx As Graphics = Graphics.FromImage(pic)

