C# OCR with the Tesseract interface

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/30328/

Time: 2020-08-03 08:25:01  Source: igfitidea

OCR with the Tesseract interface

Asked by toh yen cheng

How do you OCR a TIFF file using Tesseract's interface in C#?
Currently I only know how to do it using the executable.

Accepted answer by chakrit

The source code seems to be geared toward building an executable, so you might need to rewire things a bit so that it builds as a DLL instead. I don't have much experience with Visual C++, but I think it shouldn't be too hard with some research. My guess is that someone may have made a library version already, so you should try Google.

Once you have the tesseract-ocr code in a DLL file, you can import the file into your C# project via Visual Studio and have it create wrapper classes and do all the marshaling for you. If you can't import it, then DllImport will let you call the functions in the DLL from C# code.
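
As a rough illustration of the DllImport route, here is a minimal sketch. It assumes a Tesseract build that exports the C API declared in capi.h (TessBaseAPICreate, TessBaseAPIInit3 and so on) together with Leptonica's pixRead and pixDestroy; that may not match the build the answer was written against. The library names "libtesseract" and "liblept", the class name TesseractNative and the helper names are placeholders chosen for this example, so adjust them to whatever your DLLs are actually called.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

static class TesseractNative
{
    // Assumption: placeholder library names; real file names depend on your build.
    const string TessLib = "libtesseract";
    const string LeptLib = "liblept";

    [DllImport(TessLib, CallingConvention = CallingConvention.Cdecl)]
    static extern IntPtr TessBaseAPICreate();

    [DllImport(TessLib, CallingConvention = CallingConvention.Cdecl)]
    static extern int TessBaseAPIInit3(IntPtr api, string dataPath, string language);

    [DllImport(TessLib, CallingConvention = CallingConvention.Cdecl)]
    static extern void TessBaseAPISetImage2(IntPtr api, IntPtr pix);

    [DllImport(TessLib, CallingConvention = CallingConvention.Cdecl)]
    static extern IntPtr TessBaseAPIGetUTF8Text(IntPtr api);

    [DllImport(TessLib, CallingConvention = CallingConvention.Cdecl)]
    static extern void TessDeleteText(IntPtr text);

    [DllImport(TessLib, CallingConvention = CallingConvention.Cdecl)]
    static extern void TessBaseAPIEnd(IntPtr api);

    [DllImport(TessLib, CallingConvention = CallingConvention.Cdecl)]
    static extern void TessBaseAPIDelete(IntPtr api);

    [DllImport(LeptLib, CallingConvention = CallingConvention.Cdecl)]
    static extern IntPtr pixRead(string filename);

    [DllImport(LeptLib, CallingConvention = CallingConvention.Cdecl)]
    static extern void pixDestroy(ref IntPtr pix);

    // Copies a NUL-terminated UTF-8 buffer into a managed string.
    public static string Utf8FromPtr(IntPtr ptr)
    {
        if (ptr == IntPtr.Zero) return null;
        int length = 0;
        while (Marshal.ReadByte(ptr, length) != 0) length++;
        byte[] bytes = new byte[length];
        Marshal.Copy(ptr, bytes, 0, length);
        return Encoding.UTF8.GetString(bytes);
    }

    // Hypothetical helper: OCRs one image file and returns the recognized text.
    public static string RecognizeFile(string imagePath, string tessdataDir, string language)
    {
        IntPtr api = TessBaseAPICreate();
        IntPtr pix = IntPtr.Zero;
        try
        {
            if (TessBaseAPIInit3(api, tessdataDir, language) != 0)
                throw new InvalidOperationException("Tesseract initialization failed.");
            pix = pixRead(imagePath);
            if (pix == IntPtr.Zero)
                throw new InvalidOperationException("Could not read image: " + imagePath);
            TessBaseAPISetImage2(api, pix);
            IntPtr text = TessBaseAPIGetUTF8Text(api);
            try { return Utf8FromPtr(text); }
            finally { if (text != IntPtr.Zero) TessDeleteText(text); }
        }
        finally
        {
            // Release native resources even when recognition fails.
            if (pix != IntPtr.Zero) pixDestroy(ref pix);
            TessBaseAPIEnd(api);
            TessBaseAPIDelete(api);
        }
    }
}
```

A call would look like TesseractNative.RecognizeFile("scan.tif", "tessdata", "eng"), where the file and directory names are again placeholders. The text buffer returned by the native side is freed with TessDeleteText rather than by the .NET marshaler, which is why the return type is IntPtr and not string.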

Then you can take a look at the original executable to find clues on what functions to call to properly OCR a tiff image.

Answer by Mauricio Scheffer

Take a look at tessnet

Answer by Lou Franco

Disclaimer: I work for Atalasoft

Our OCR module supports Tesseract, and if that proves not to be good enough, you can upgrade to a better engine by changing just one line of code (we provide a common interface to multiple OCR engines).

Answer by linquize

A C# program can launch tesseract.exe, wait for it to finish, and then read the output file that tesseract.exe writes.

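using System.Diagnostics;
using System.IO;

// "input.tif" is a placeholder image name; tesseract.exe takes the image first,
// then the output base name, and writes the recognized text to out.txt.
Process process = Process.Start("tesseract.exe", "input.tif out");
process.WaitForExit();
if (process.ExitCode == 0)
{
    string content = File.ReadAllText("out.txt");
}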
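
For paths that contain spaces, or when you do not want a console window to appear, the same idea can be wrapped in a small helper. This is only a sketch: the class name TesseractCli and its methods are made up for this example, and it assumes tesseract.exe is on the PATH and is called as "tesseract image outputbase", writing outputbase.txt.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class TesseractCli
{
    // Builds the "image outputbase" argument string, quoting both paths
    // so that spaces in file names do not split the arguments.
    public static string BuildArguments(string imagePath, string outputBase)
    {
        return "\"" + imagePath + "\" \"" + outputBase + "\"";
    }

    // Hypothetical helper: runs tesseract.exe and returns the recognized text.
    public static string Recognize(string imagePath, string outputBase)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "tesseract.exe",      // assumed to be on the PATH
            Arguments = BuildArguments(imagePath, outputBase),
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            if (process.ExitCode != 0)
                throw new InvalidOperationException(
                    "tesseract.exe exited with code " + process.ExitCode);
        }

        // tesseract appends ".txt" to the output base name.
        return File.ReadAllText(outputBase + ".txt");
    }
}
```

Throwing on a non-zero exit code, instead of silently skipping the read, makes a missing image or missing language data show up at the call site.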

Answer by b_levitt

I discovered today that EMGU now includes a Tesseract wrapper. While the number of unmanaged DLLs in the OpenCV lib might seem a little daunting, it's nothing that a quick copy to your output directory won't cure. From there, the actual OCR process is as simple as three lines:

// clip is presumably the image region to read, and optOCR the text box that shows the result.
Tesseract ocr = new Tesseract(Path.Combine(Environment.CurrentDirectory, "tessdata"), "eng", Tesseract.OcrEngineMode.OEM_TESSERACT_ONLY);
ocr.Recognize(clip);
optOCR.Text = ocr.GetText();

"robomatics" put together a very nice YouTube video that demonstrates a simple but effective solution.
