Tensorflow model for OCR (Python)

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/43610527/

Time: 2020-08-19 23:13:10  Source: igfitidea

python tensorflow deep-learning mnist

Asked by thug_

I am new to Tensorflow and I am trying to build a model that can perform OCR on my images. I have to read 9 characters (a fixed count in all images), made up of numbers and letters. My model would be similar to this:

https://matthewearl.github.io/2016/05/06/cnn-anpr/

My question is: should I first train my model on each character separately and then combine the characters to get the full label, or should I train on the full label directly?

I know that I need to pass the model images plus the labels for the corresponding images. What is the format of those labels? Is it a text file? I am a bit confused about that part, so any explanation of the format of the labels passed to the model would be helpful. Thanks.

Accepted answer by Alexander Gorban

I'd recommend training an end-to-end OCR model with attention. You can try the Attention OCR model, which we used to transcribe street names: https://github.com/tensorflow/models/tree/master/research/attention_ocr

My guess is that it should work pretty well for your case. Refer to the answer https://stackoverflow.com/a/44461910 for instructions on how to prepare the data for it.

Answer by Xochipilli

There are a couple of ways to deal with this (the following list is not exhaustive).

1) The first one is word classification directly from your image. If the set of possible 9-character words is limited, you can train a word-specific classifier. You can then convolve this classifier with your image and select the word with the highest probability.

2) The second option is to train a character classifier, find all characters in your image, and find the most likely line that contains the 9 characters you are looking for.

3) The third option is to train a text detector and find all possible text boxes. Then read all text boxes with a sequence-based model, and select the most likely solution that satisfies your constraints. A simple sequence-based model is introduced in the following paper: http://ai.stanford.edu/~ang/papers/ICPR12-TextRecognitionConvNeuralNets.pdf. Other sequence-based models could be based on HMMs, Connectionist Temporal Classification (CTC), attention-based models, etc.

4) The fourth option is an attention-based model that works end-to-end, first finding the text and then outputting the characters one by one.
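To make the CTC mention in option 3 a bit more concrete, here is a minimal sketch of greedy CTC decoding in plain Python. It is only an illustration, not code from any of the linked projects: the alphabet (digits plus uppercase letters), the choice of index 0 as the blank symbol, and the function name `ctc_greedy_decode` are all assumptions. The idea is to take the best class per frame, merge consecutive repeats, and drop blanks.

```python
# Assumed character set: digits and uppercase letters (not stated in the question).
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# Assumed convention: class 0 is the CTC blank, class i + 1 is ALPHABET[i].
BLANK = 0


def ctc_greedy_decode(frame_scores):
    """Greedy CTC decoding (hypothetical helper).

    frame_scores is a list of per-frame score lists, each of length
    len(ALPHABET) + 1. Returns the decoded string.
    """
    # Pick the highest-scoring class for every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]

    chars = []
    previous = BLANK
    for index in best:
        # Emit a character only when it is not blank and differs from the
        # previous frame's class (consecutive repeats are merged).
        if index != BLANK and index != previous:
            chars.append(ALPHABET[index - 1])
        previous = index
    return "".join(chars)
```

A blank between two identical classes is what lets a repeated character (such as "AA") survive the merge step. For a fixed 9-character label you could additionally reject any decoded string whose length is not 9.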

Note that this list is not exhaustive; there can be many different ways to solve this problem. You could even use third-party solutions such as Abbyy or Tesseract to help solve your problem.
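On the label-format part of the question, one common arrangement for a fixed-length label is to keep the text itself anywhere convenient (for example a text or CSV file mapping image names to strings) and convert each string to numbers before training. The sketch below shows one possible encoding: one class index and one one-hot row per character position. The alphabet, the names `encode_label` and `decode_scores`, and the one-hot layout are assumptions made for illustration, not a format required by Tensorflow or by any model mentioned above.

```python
# Assumed character set: digits and uppercase letters (not stated in the question).
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# The question says every image contains exactly 9 characters.
LABEL_LENGTH = 9


def encode_label(text):
    """Turn a 9-character string into class indices and one-hot rows."""
    if len(text) != LABEL_LENGTH:
        raise ValueError("expected %d characters, got %d" % (LABEL_LENGTH, len(text)))
    # str.index raises ValueError for characters outside the assumed alphabet.
    indices = [ALPHABET.index(ch) for ch in text]
    one_hot = [
        [1.0 if j == i else 0.0 for j in range(len(ALPHABET))]
        for i in indices
    ]
    return indices, one_hot


def decode_scores(scores):
    """Turn per-position score rows (9 x 36) back into a string via argmax."""
    return "".join(
        ALPHABET[max(range(len(row)), key=row.__getitem__)] for row in scores
    )
```

With this layout a model that predicts the full label at once would output 9 x 36 scores per image, while a per-character classifier would be trained on single rows; in both cases `decode_scores` recovers the text from the predictions.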
