Java: API to read text from an image file using OCR

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/22531656/

Time: 2020-08-13 16:18:03  Source: igfitidea

API to read text from Image file using OCR

java ocr

Asked by

I am looking for example code, or the name of an API, for OCR (optical character recognition) in Java that I can use to extract all the text present in an image file, without having to compare it against reference images, which is what I am doing with the code below.

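import java.io.File;

// Assumption: OCR, Capture and ImageBinaryGrey come from the axet "lookup" library;
// adjust these imports if your classes live elsewhere.
import com.github.axet.lookup.Capture;
import com.github.axet.lookup.OCR;
import com.github.axet.lookup.common.ImageBinaryGrey;

public class OCRTest {

    public static void main(String[] args) {
        OCR l = new OCR(0.70f);
        // Load every font family under "fonts", then the single family "font_1"
        l.loadFontsDirectory(OCRTest.class, new File("fonts"));
        l.loadFont(OCRTest.class, new File("fonts", "font_1"));
        ImageBinaryGrey i = new ImageBinaryGrey(Capture.load(OCRTest.class, "full.png"));
        // Recognize only the given region of the image, using the "font_1" family
        String str = l.recognize(i, 1285, 654, 1343, 677, "font_1");
        System.out.println(str);
    }
}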

Answered by zenbeni

You can try javaocr on sourceforge: http://javaocr.sourceforge.net/

There is also a great example with an applet which uses Encog: http://www.heatonresearch.com/articles/42/page1.html

That said, OCR is computationally expensive, so if you expect heavy use, you should look at OCR libraries written in C and integrate them with Java.

OCR is hard, so be sure to pin down your requirements before venturing into it.

Tesseract and OpenCV (with JavaCV for the Java integration, for instance) are common choices. There are also commercial solutions such as ABBYY FineReader Engine and ABBYY Cloud OCR SDK.

Answered by Jinu Jawad

An open-source OCR engine is available from Google. It can be run from the command line (CMD), and you can easily invoke that command from Java, for example in a web application.
Please visit https://www.youtube.com/watch?v=Mjg4yyuqr5E for step-by-step details on running OCR from CMD.
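
As a rough illustration of calling a command-line OCR tool from Java, here is a minimal sketch. It assumes the engine in question is Tesseract, that a `tesseract` executable is on the PATH, and that English language data is installed; none of that is stated in the answer. The class name `CommandLineOcr` and its helper methods are hypothetical names chosen for this example, and `full.png` is just the file name from the question used as a placeholder.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class CommandLineOcr {

    // Builds the argument list "tesseract <image> stdout -l <language>".
    // Using "stdout" as the output base asks the CLI to print the recognized
    // text instead of writing it to a file.
    public static List<String> buildCommand(String imagePath, String language) {
        return List.of("tesseract", imagePath, "stdout", "-l", language);
    }

    // Runs the command and returns everything it wrote to standard output.
    public static String run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT) // let CLI diagnostics reach the console
                .start();
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("OCR command exited with code " + exitCode);
        }
        return output;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder default; pass the real image path as the first argument.
        String imagePath = args.length > 0 ? args[0] : "full.png";
        System.out.println(run(buildCommand(imagePath, "eng")));
    }
}
```

Passing the arguments as a list, rather than as one concatenated string, avoids quoting problems when the image path contains spaces. In a web application you would likely also want a timeout on the process and validation of any user-supplied path.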

Answered by nav3916872

You can try Tess4J or JavaCPP Presets for Tesseract. I prefer the latter, as it is easier than the former. Add the dependency to your pom:

        <dependency>
            <groupId>org.bytedeco.javacpp-presets</groupId>
            <artifactId>tesseract-platform</artifactId>
            <version>3.04.01-1.3</version>
        </dependency>

And it is simple to code:

import org.bytedeco.javacpp.*;
import static org.bytedeco.javacpp.lept.*;
import static org.bytedeco.javacpp.tesseract.*;

public class BasicExample {
    public static void main(String[] args) {
        BytePointer outText;

        TessBaseAPI api = new TessBaseAPI();
        // Initialize tesseract-ocr with English, without specifying tessdata path
        if (api.Init(null, "eng") != 0) {
            System.err.println("Could not initialize tesseract.");
            System.exit(1);
        }

        // Open input image with leptonica library
        PIX image = pixRead(args.length > 0 ? args[0] : "/usr/src/tesseract/testing/phototest.tif");
        api.SetImage(image);
        // Get OCR result
        outText = api.GetUTF8Text();
        System.out.println("OCR output:\n" + outText.getString());

        // Destroy used object and release memory
        api.End();
        outText.deallocate();
        pixDestroy(image);
    }
}

Tess4J is a little more complex, as it requires a specific VC++ redistributable package to be installed.
