Java: Tess4j doesn't use its tessdata folder

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original: http://stackoverflow.com/questions/18095708/

Time: 2020-08-11 22:20:36  Source: igfitidea

Tess4j doesn't use its tessdata folder

java tesseract

Asked by Kiwi Bird

I am using tess4j, the java wrapper of Tesseract. I also have the normal Tesseract installed. I am not exactly sure how tess4j is meant to work, but since it comes with a tessdata folder, I can assume that you would put the language data files there. However, tess4j is only working if the language data files are in the "real" tessdata folder (the one that comes with tesseract, not tess4j). If I remove that folder, I get this error message:

Error opening data file C:\Program Files\Tesseract-OCR\tessdata/jpn.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'jpn'
Tesseract couldn't load any languages!
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x631259dc, pid=5108, tid=10148
#
# JRE version: 7.0_06-b24
# Java VM: Java HotSpot(TM) Client VM (23.2-b09 mixed mode, sharing windows-x86 )
# Problematic frame:
# C  [libtesseract302.dll+0x59dc]  STRING::strdup+0x467c
#
# Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
#
# An error report file with more information is saved as:
# D:\School\Programs\OCRTest\v1.0.0\hs_err_pid5108.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

Does this mean I need to have Tesseract installed to use tess4j? Why? Or maybe my tess4j tessdata folder is in the wrong place (it is currently alongside my .java files; the tess4j jars are in a lib folder that I have put on the classpath).

Accepted answer by sschrass

Let your TESSDATA_PREFIX environment variable point to the tessdata folder of your Tess4j.

Usually you set up this variable during installation on the system, but you may find a solution here: How do I set environment variables from Java?

You have to do it on the system that runs your app, because the Tesseract .dlls depend on this environment variable.
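
Before changing the variable, it can help to see which file the native library would look for. The following is a minimal diagnostic sketch, not part of Tess4j: `TessdataCheck` and `expectedDataFile` are hypothetical names, and the path layout assumes the convention quoted in the error message from the question, where TESSDATA_PREFIX names the parent directory of tessdata.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Tess4j or Tesseract.
final class TessdataCheck {

    // Builds the path assumed to be opened for a language:
    // <prefix>/tessdata/<lang>.traineddata
    static Path expectedDataFile(String prefix, String lang) {
        return Paths.get(prefix, "tessdata", lang + ".traineddata");
    }

    public static void main(String[] args) {
        String prefix = System.getenv("TESSDATA_PREFIX");
        if (prefix == null) {
            System.out.println("TESSDATA_PREFIX is not set");
            return;
        }
        Path file = expectedDataFile(prefix, "jpn");
        String state = Files.isRegularFile(file) ? " exists" : " is missing";
        System.out.println(file + state);
    }
}
```

Running it on the machine that hosts the app shows whether the variable is visible to the JVM and whether the language file is where that layout expects it.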

Answer by nguyenq

The TESSDATA_PREFIX environment variable, if defined, overrides everything, including the path set by init or setDatapath; but that may change in the near future, when an application will be able to specify where its tessdata folder is.

http://code.google.com/p/tesseract-ocr/issues/detail?id=938
https://groups.google.com/forum/#!topic/tesseract-ocr/bkJwI8WmxSw
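
The precedence described in this answer can be modeled in a few lines. This is only an illustration of the rule as stated here, not Tesseract's actual lookup code; `DatapathPrecedence` and `effectiveDatapath` are hypothetical names, and "./tessdata" is a placeholder path.

```java
// Hypothetical model of the rule in this answer; not Tesseract's real code.
final class DatapathPrecedence {

    // Returns the location that wins: the environment value if it is defined,
    // otherwise the path the application configured (e.g. via setDatapath).
    static String effectiveDatapath(String envPrefix, String configuredPath) {
        if (envPrefix != null) {
            return envPrefix;
        }
        return configuredPath;
    }

    public static void main(String[] args) {
        String env = System.getenv("TESSDATA_PREFIX");
        System.out.println("Data location in effect: " + effectiveDatapath(env, "./tessdata"));
    }
}
```

Under this rule, a leftover TESSDATA_PREFIX from a system-wide Tesseract install would explain why a path passed to setDatapath appears to be ignored.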

Answer by José Mercado

Maybe you don't have the tessdata folder in your main project folder. This folder holds the data for all languages Tesseract supports (it contains files with the .traineddata, .bigrams, .fold, .lm, .nn, .params, .size and .word-freq extensions). If you don't have it, follow these steps:

  1. Download the tessdata-master folder from github.com/tesseract-ocr/tessdata (using the Download ZIP button)
  2. Unzip the contents of the tessdata-master.zip file into your main project folder
  3. Rename tessdata-master to tessdata
  4. Run your Java project and test whether it works. At least this works for me.
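
To confirm the steps above left the folder where it is expected, a small check like the following may help. It is a sketch using only the Java standard library; `TessdataFolderCheck` and `installedLanguages` are hypothetical names, and it assumes the program is started from the main project folder.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for verifying the steps above; not part of Tess4j.
final class TessdataFolderCheck {

    private static final String SUFFIX = ".traineddata";

    // Lists, in sorted order, the language codes that have a
    // .traineddata file directly inside the given folder.
    static List<String> installedLanguages(Path tessdata) throws IOException {
        List<String> langs = new ArrayList<>();
        if (!Files.isDirectory(tessdata)) {
            return langs; // folder missing: nothing installed
        }
        try (DirectoryStream<Path> files = Files.newDirectoryStream(tessdata, "*" + SUFFIX)) {
            for (Path f : files) {
                String name = f.getFileName().toString();
                langs.add(name.substring(0, name.length() - SUFFIX.length()));
            }
        }
        Collections.sort(langs);
        return langs;
    }

    public static void main(String[] args) throws IOException {
        // Assumes the working directory is the main project folder.
        System.out.println(installedLanguages(Paths.get("tessdata")));
    }
}
```

An empty list means either the folder is missing from the working directory or it contains no .traineddata files.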

Answer by Chrisdreams13

For those that use maven and don't like to use global variables, this works for me:

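import java.io.File;

import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import net.sourceforge.tess4j.util.LoadLibs;

// Backslashes must be escaped in Java string literals ("\r" would be a carriage return)
File imageFile = new File("C:\\random.png");
// getInstance() is deprecated in Tess4J 3.x; the default constructor replaces it
Tesseract instance = new Tesseract();

// In case you don't have your own tessdata, let it also be extracted for you
File tessDataFolder = LoadLibs.extractTessResources("tessdata");

// Set the tessdata path
instance.setDatapath(tessDataFolder.getAbsolutePath());

try {
    String result = instance.doOCR(imageFile);
    System.out.println(result);
} catch (TesseractException e) {
    System.err.println(e.getMessage());
}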

Found here; tested with the Maven artifact net.sourceforge.tess4j:tess4j:3.4.1, although the linked example uses the 1.4.1 jar.
