Java: Is there a viable handwriting recognition library / program?

Disclaimer: this page is based on a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/10249501/

Time: 2020-10-31 00:13:38  Source: igfitidea

Is there a viable handwriting recognition library / program?

Tags: java, r, machine-learning, ocr

Asked by screechOwl

I'm looking to process a bunch of scanned response postcards that have handwritten contact information on them (i.e. Name, Address, Phone, Email, etc.).

I'm curious whether there is a viable open-source library or piece of software to do this (ideally Java or R). Looking around, a lot of the information is from 2009 or earlier and isn't very encouraging.

The language is English.


Any suggestions?


EDIT: I've looked at the OCRopus page, but the latest version is from May 2009. Does anyone have any experience with it, or is there a more recent version?

Accepted answer by Nikolay

To begin with, as far as I know there are no native open-source Java OCR SDKs. There are Java APIs that wrap calls to native interfaces, such as tesjeract (http://code.google.com/p/tesjeract/) or Tess4J (http://tess4j.sf.net/).

Next, you need to specify whether you are looking for handwritten or handprinted text. If you need handwritten text recognition, I don't believe you'll be able to solve your task, for the reasons stated in the other answers.

However, if you need ICR (intelligent character recognition) for handprinted text (the fairly clear letters used in surveys, forms, etc.), there could be a solution. While I believe that tesseract (despite being considered the best among open-source engines) won't do the job for you here, you can look for more accurate SDKs.

Maybe this question would help: Handwritten scanned Doc to .txt File?


Answer by James Black

You may want to look at http://code.google.com/p/ocropus/, which is an open-source OCR system.


However, it appears to be written in C++ and Python.

*UPDATE:*

Since one of the research projects it is based on is a handwriting recognizer, I expect it may help. From the project page:

The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-90's and deployed by the US Census bureau, and novel high-performance layout analysis methods.


And if you look at http://code.google.com/p/ocropus/source/browse/ the source files have been updated since 10/2011 (one of the three was from 3/2012), so it appears to still be under development.

Answer by Tomato

I am not aware of any working open-source handwriting recognition library, even though I have been in the OCR space for a while. Handwriting is typically more difficult than OCR, and I would say there is not even a decent commercial solution. All the solutions that exist have their own issues and only work in very narrow applications, such as when the dictionary is limited, the text is well written, etc. If you are still interested, I would recommend checking the technology from the French company I2IA.
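To illustrate why a limited dictionary helps so much, here is a minimal sketch of snapping a noisy recognizer output to the closest entry in a small word list using edit distance. This is not how any particular product works; the class `DictionarySnap`, the method names, and the `maxDistance` threshold are all hypothetical and exist only for this example.

```java
import java.util.List;

/** Hypothetical helper: snaps a noisy recognition result to a small dictionary. */
class DictionarySnap {

    /** Levenshtein distance: insertions, deletions and substitutions each cost 1. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        return prev[b.length()];
    }

    /**
     * Returns the dictionary entry closest to the recognized text (case-insensitive),
     * or null if even the best match is more than maxDistance edits away.
     */
    static String closest(String recognized, List<String> dictionary, int maxDistance) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        String needle = recognized.toLowerCase();
        for (String entry : dictionary) {
            int d = editDistance(needle, entry.toLowerCase());
            if (d < bestDistance) { bestDistance = d; best = entry; }
        }
        return bestDistance <= maxDistance ? best : null;
    }

    public static void main(String[] args) {
        // Assumed example dictionary; a real one might hold city or state names.
        List<String> states = List.of("Ohio", "Iowa", "Utah", "Texas");
        System.out.println(closest("0hio", states, 1));  // one substitution away from "Ohio"
        System.out.println(closest("Tcxas", states, 1)); // one substitution away from "Texas"
        System.out.println(closest("Zzzzz", states, 1)); // no entry within 1 edit, so null
    }
}
```

With a closed vocabulary (state names, postal codes, a customer list), a single misread character can usually be repaired this way. For free-form fields such as names or street addresses there is nothing to snap to, which is part of why unconstrained handwriting remains hard.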