C# Tesseract 3 (OCR) - .NET Wrapper
Original question: http://stackoverflow.com/questions/10067002/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by Jpin
http://code.google.com/p/tesseractdotnet/
I am having a problem getting Tesseract to work in my Visual Studio 2010 projects. I have tried both console and WinForms applications, and both have the same outcome. I have come across a DLL by someone else who claims to have it working in VS2010:
http://code.google.com/p/tesseractdotnet/issues/detail?id=1
I am adding a reference to the DLL, which can be found attached to post 64 at the link above. Every time I build and run my project, I get an AccessViolationException saying that an attempt was made to read or write protected memory.
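using System.Drawing;
using tesseract; // tesseractdotnet wrapper namespace (assumed)

public void StartOCR(string fileName)
{
    const string language = "eng";
    // Folder containing the .traineddata files; note the trailing backslash
    const string TessractData = @"C:\Users\Joe\Desktop\tessdata\";

    using (TesseractProcessor processor = new TesseractProcessor())
    {
        using (Bitmap bmp = new Bitmap(fileName))
        {
            // The AccessViolationException is thrown from this Init call
            if (processor.Init(TessractData, language, (int)eOcrEngineMode.OEM_DEFAULT))
            {
                // Recognized text of the whole image
                string text = processor.Recognize(bmp);
            }
        }
    }
}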
The AccessViolationException always points to the line if (processor.Init(TessractData, language, (int)eOcrEngineMode.OEM_DEFAULT)). I've seen a few suggestions, such as making sure the solution platform is set to x86 in the Configuration Manager and that the tessdata folder path ends with a trailing slash, but none of them helped. Any ideas?
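Both suggestions mentioned in the question can be applied defensively before Init is ever reached. The sketch below is a hypothetical helper (the TessdataPreflight class and its method names are illustrative, not part of the tesseractdotnet wrapper) that appends a trailing backslash when it is missing, reports whether the process is 32-bit, and raises a managed exception for a missing folder instead of leaving the problem to native code.

```csharp
using System;
using System.IO;

// Hypothetical pre-flight checks (not part of the tesseractdotnet wrapper) that apply
// the suggestions from the question before the native Init call is reached.
public static class TessdataPreflight
{
    // Returns the path unchanged if it already ends with '\' or '/', otherwise appends '\'.
    public static string EnsureTrailingSeparator(string path)
    {
        if (string.IsNullOrEmpty(path))
            throw new ArgumentException("tessdata path is empty", "path");

        char last = path[path.Length - 1];
        return (last == '\\' || last == '/') ? path : path + "\\";
    }

    // True when the process is 32-bit, which is what the x86 platform target produces.
    public static bool IsX86Process()
    {
        return !Environment.Is64BitProcess;
    }

    // Normalizes the path and reports a missing folder as a managed exception.
    public static string Prepare(string tessdataPath)
    {
        string normalized = EnsureTrailingSeparator(tessdataPath);
        if (!Directory.Exists(normalized))
            throw new DirectoryNotFoundException("tessdata folder not found: " + normalized);
        return normalized;
    }
}
```

With these helpers, the call in the question would become processor.Init(TessdataPreflight.Prepare(TessractData), language, (int)eOcrEngineMode.OEM_DEFAULT), optionally guarded by IsX86Process() if the native DLL is 32-bit only. Checking up front matters because, from .NET Framework 4 onward, an AccessViolationException raised in native code is treated as a corrupted state exception and is not delivered to ordinary catch blocks, so wrapping Init in try/catch usually does not help.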
Accepted answer by Jpin
The problem turned out to be the contents of the tessdata folder. I downloaded the tessdata folder from the first link, and everything now works.
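Because the accepted answer traces the crash to the contents of the tessdata folder, it can help to confirm that the language file Init is expected to load is actually present. The helper below is hypothetical and assumes the Tesseract 3 naming convention <language>.traineddata (for example eng.traineddata). It only checks presence and size; it cannot tell whether a file matches the engine version, so replacing the folder with a known-good copy, as the accepted answer did, remains the actual fix.

```csharp
using System;
using System.IO;

// Hypothetical check (not part of any Tesseract wrapper) for "<language>.traineddata"
// files in a tessdata folder.
public static class TessdataContents
{
    // True only if the file exists and is not empty.
    public static bool HasTrainedData(string tessdataDir, string language)
    {
        string file = Path.Combine(tessdataDir, language + ".traineddata");
        var info = new FileInfo(file);
        return info.Exists && info.Length > 0;
    }

    // Lists, in ordinal order, the languages for which a .traineddata file is present.
    public static string[] AvailableLanguages(string tessdataDir)
    {
        if (!Directory.Exists(tessdataDir))
            return new string[0];

        string[] files = Directory.GetFiles(tessdataDir, "*.traineddata");
        string[] languages = new string[files.Length];
        for (int i = 0; i < files.Length; i++)
            languages[i] = Path.GetFileNameWithoutExtension(files[i]);
        Array.Sort(languages, StringComparer.Ordinal);
        return languages;
    }
}
```

For example, HasTrainedData(@"C:\Users\Joe\Desktop\tessdata\", "eng") returns false if eng.traineddata is missing or empty, and AvailableLanguages shows which languages the folder actually provides.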
Answer by Umar Hassan
I have just completed a project with Tesseract engine 3. I think there is a bug in the engine that needs to be fixed. What I did to get rid of the AccessViolationException was to append "\tessdata" to the real tessdata directory string. I don't know why, but the engine seems to truncate the innermost directory of the tessdata path.
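The workaround in this answer amounts to passing the real tessdata folder with one extra tessdata segment appended, so that if the engine really does drop the innermost directory (the answerer's observation, not something verified here) it still ends up at the real folder. A hypothetical helper that builds that string; the trailing backslash is an added assumption, borrowed from the trailing-slash advice in the question:

```csharp
using System;

// Hypothetical helper for the workaround described in this answer: append an extra
// "tessdata" segment to the real tessdata folder path before passing it to Init.
public static class TessdataWorkaround
{
    public static string AppendInnerTessdata(string realTessdataDir)
    {
        if (string.IsNullOrEmpty(realTessdataDir))
            throw new ArgumentException("path is empty", "realTessdataDir");

        // Strip any trailing separators so exactly one backslash precedes the new segment.
        string trimmed = realTessdataDir.TrimEnd('\\', '/');
        return trimmed + "\\tessdata\\";
    }
}
```

For the path in the question, AppendInnerTessdata(@"C:\Users\Joe\Desktop\tessdata\") returns C:\Users\Joe\Desktop\tessdata\tessdata\, which is the string that would then be passed to Init.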
I have also put together a full OCR package (DLLs plus English tessdata) that works with .NET Framework 4.
Answer by G. Goncharov
If you have the same problem and the trailing-slash advice doesn't work, try... TWO trailing slashes! Seriously. It works for me.
// Verbatim path ending in two backslashes
if (processor.Init(@".\tessdata\\", "eng", (int)eOcrEngineMode.OEM_DEFAULT))
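When trying the one-slash and two-slash variants it is easy to miscount, because C# treats backslashes differently in verbatim (@"...") and regular string literals. This small illustrative helper (the TrailingSlashes name is made up for the example) counts how many backslashes a path string actually ends with:

```csharp
using System;

// Illustrative helper: number of consecutive backslashes at the end of a string.
public static class TrailingSlashes
{
    public static int Count(string path)
    {
        if (path == null)
            throw new ArgumentNullException("path");

        int n = 0;
        for (int i = path.Length - 1; i >= 0 && path[i] == '\\'; i--)
            n++;
        return n;
    }
}
```

Count(@".\tessdata\\") and Count(".\\tessdata\\\\") both return 2, while Count(@".\tessdata\") and Count(".\\tessdata\\") both return 1: in a verbatim literal every backslash is kept as written, whereas in a regular literal "\\" is the escape sequence for a single backslash.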
Answer by Nikita
It seems your problem is related to the stability issue mentioned here. The official site recommends using the previous stable release, 2.4.1, which you can install from nuget.org with the Package Manager command:
Install-Package Tesseract -Version 2.4.1

