OCR with Tesseract's interface in C#

Question

Asked by toh yen cheng

How do you OCR a TIFF file using Tesseract's interface in C#?
Currently I only know how to do it using the executable.

Answer 1

Accepted answer by chakrit

The source code seems to be geared toward building an executable, so you might need to rewire things a bit to build it as a DLL instead. I don't have much experience with Visual C++, but I think it shouldn't be too hard with some research. My guess is that someone has already made a library version, so you should try Google.

Once you have the tesseract-ocr code in a DLL file, you can import the file into your C# project via Visual Studio and have it create wrapper classes and do all the marshaling for you. If you can't import it, DllImport will let you call the functions in the DLL from C# code.
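
As a rough illustration of the DllImport route, here is a minimal sketch. It assumes a native Tesseract build that exports the C API declared in capi.h (later Tesseract releases ship one) and that the library can be found under the name "libtesseract"; that file name is an assumption and will differ between builds. Only TessVersion is called, so this checks that the interop works and is not a full OCR wrapper.

```csharp
using System;
using System.Runtime.InteropServices;

static class NativeTesseract
{
    // Assumption: the native library's file name. Adjust it to match your build.
    private const string LibraryName = "libtesseract";

    // TessVersion is declared in Tesseract's C API header (capi.h). It returns a
    // pointer to a static, null-terminated string that the caller must not free.
    [DllImport(LibraryName, CallingConvention = CallingConvention.Cdecl)]
    private static extern IntPtr TessVersion();

    // Returns the native version string, or null if the library cannot be used.
    public static string TryGetVersion()
    {
        try
        {
            return Marshal.PtrToStringAnsi(TessVersion());
        }
        catch (DllNotFoundException) { return null; }        // library not on the search path
        catch (EntryPointNotFoundException) { return null; } // library found, export missing
        catch (BadImageFormatException) { return null; }     // 32-bit/64-bit mismatch
    }
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(NativeTesseract.TryGetVersion() ?? "native Tesseract library not found");
    }
}
```

If the version string prints, the same pattern (one DllImport declaration per exported function) extends to the functions that do the actual recognition.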

Then you can take a look at the original executable to find clues on what functions to call to properly OCR a tiff image.

Answer 2

Answered by Mauricio Scheffer

Take a look at tessnet

Answer 3

Answered by Lou Franco

Disclaimer: I work for Atalasoft

Our OCR module supports Tesseract, and if that proves not to be good enough, you can upgrade to a better engine by changing just one line of code (we provide a common interface to multiple OCR engines).

Answer 4

Answered by linquize

The C# program launches tesseract.exe and then reads the output file that tesseract.exe writes.

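using System.Diagnostics;
using System.IO;

// tesseract.exe expects "<input image> <output base>"; "input.tif" is a placeholder file name.
Process process = Process.Start("tesseract.exe", "input.tif out");
process.WaitForExit();
if (process.ExitCode == 0)
{
    // Tesseract appends ".txt" to the output base name.
    string content = File.ReadAllText("out.txt");
}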
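
A slightly more defensive variant of the same idea is sketched below. It assumes tesseract.exe is reachable through PATH or the working directory; TesseractCli, BuildArguments and Recognize are hypothetical helper names, and "scan.tif" is a placeholder input file. Both paths are quoted so that file names containing spaces survive, and no console window is shown.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class TesseractCli
{
    // Builds "<image> <output base>" with both paths quoted so spaces survive.
    public static string BuildArguments(string imagePath, string outputBase)
    {
        return "\"" + imagePath + "\" \"" + outputBase + "\"";
    }

    // Runs tesseract.exe and returns the recognized text, or null if it exits with
    // an error code. Process.Start throws if the executable cannot be found.
    public static string Recognize(string imagePath, string outputBase)
    {
        var startInfo = new ProcessStartInfo("tesseract.exe", BuildArguments(imagePath, outputBase))
        {
            UseShellExecute = false, // start the process directly, without the shell
            CreateNoWindow = true    // do not flash a console window
        };

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            if (process.ExitCode != 0)
                return null;
        }

        // Tesseract appends ".txt" to the output base name.
        string outputFile = outputBase + ".txt";
        return File.Exists(outputFile) ? File.ReadAllText(outputFile) : null;
    }
}

static class Program
{
    static void Main(string[] args)
    {
        // "scan.tif" is a placeholder input file name.
        string text = TesseractCli.Recognize(args.Length > 0 ? args[0] : "scan.tif", "out");
        Console.WriteLine(text ?? "OCR failed");
    }
}
```

Keeping the argument building in its own method makes the quoting easy to check without running Tesseract at all.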

Answer 5

Answered by b_levitt

I discovered today that EMGU now includes a Tesseract wrapper. While the number of unmanaged DLLs in the OpenCV library might seem a little daunting, it's nothing that a quick copy to your output directory won't cure. From there, the actual OCR process is as simple as three lines:

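// Load the English language data from the "tessdata" folder under the current working directory.
Tesseract ocr = new Tesseract(Path.Combine(Environment.CurrentDirectory, "tessdata"), "eng", Tesseract.OcrEngineMode.OEM_TESSERACT_ONLY);
// clip is the image to recognize.
ocr.Recognize(clip);
// optOCR is the control that displays the recognized text.
optOCR.Text = ocr.GetText();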

"robomatics" put together a very nice YouTube video that demonstrates a simple but effective solution.
