TensorFlow model for OCR in Python

Question

Asked by thug_

I am new to TensorFlow and I am trying to build a model that can perform OCR on my images. I have to read 9 characters (a fixed count in all images), numbers and letters. My model would be similar to this:

https://matthewearl.github.io/2016/05/06/cnn-anpr/

My question is: should I first train my model on each character separately and then combine the characters to get the full label, or should I train on the full label directly?

I know that I need to pass the model the images plus the label for each image. What is the format of those labels? Is it a text file? I am a bit confused about that part, so any explanation of the label format passed to the model would be helpful. Thanks, I appreciate it.

Answer 1

Accepted answer by Alexander Gorban

I'd recommend training an end-to-end OCR model with attention. You can try Attention OCR, which we used to transcribe street names: https://github.com/tensorflow/models/tree/master/research/attention_ocr

My guess is that it should work pretty well for your case. Refer to the answer https://stackoverflow.com/a/44461910 for instructions on how to prepare the data for it.

Answer 2

Answer by Xochipilli

There are a couple of ways to deal with this (the following list is not exhaustive).

1) The first is word classification directly from your image. If your vocabulary of 9-character words is limited, you can train a word-specific classifier. You can then convolve this classifier with your image and select the word with the highest probability.

2) The second option is to train a character classifier, find all characters in your image, and find the most likely line that has the 9 characters you are looking for.

3) The third option is to train a text detector and find all possible text boxes. Then read all text boxes with a sequence-based model, and select the most likely solution that follows your constraints. A simple sequence-based model is introduced in the following paper: http://ai.stanford.edu/~ang/papers/ICPR12-TextRecognitionConvNeuralNets.pdf. Other sequence-based models could be based on HMMs, Connectionist Temporal Classification (CTC), attention-based models, etc.

4) The fourth option is an attention-based model that works end-to-end, first finding the text and then outputting the characters one by one.
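On the label-format part of the question: when a model outputs characters one by one, a fixed-length label is commonly stored as one class index per character position rather than as raw text. The sketch below is only an illustration of that idea, not the format required by any particular model. The alphabet (digits plus uppercase letters) and the helper names `encode_label`, `one_hot`, and `decode_scores` are assumptions made for this example.

```python
import string

# Assumed alphabet: the question only says "numbers and letters".
ALPHABET = string.digits + string.ascii_uppercase
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
LABEL_LENGTH = 9  # fixed number of characters per image, from the question


def encode_label(text):
    """Turn a 9-character string into a list of 9 class indices."""
    if len(text) != LABEL_LENGTH:
        raise ValueError(f"expected {LABEL_LENGTH} characters, got {len(text)}")
    return [CHAR_TO_INDEX[ch] for ch in text]


def one_hot(indices):
    """Expand class indices into a 9 x len(ALPHABET) matrix of 0.0/1.0 values."""
    return [
        [1.0 if i == idx else 0.0 for i in range(len(ALPHABET))]
        for idx in indices
    ]


def decode_scores(scores):
    """Pick the highest-scoring character at each position."""
    return "".join(
        ALPHABET[max(range(len(row)), key=row.__getitem__)] for row in scores
    )
```

In this sketch, `encode_label("AB12CD345")` gives the per-position targets for one image, and `decode_scores` maps a 9 x 36 matrix of model scores back to a string. How the labels are actually stored on disk (a text file, CSV, or TFRecord) depends on the training pipeline you choose.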

Note that this list is not exhaustive; there are many different ways to solve this problem. Other options include third-party solutions like ABBYY or Tesseract.
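Whichever recognizer produces the text, including a third-party engine, the known constraint from the question (exactly 9 letters and digits) can be applied as a post-filter over the candidate readings. The following is a minimal sketch of that selection step only. The `(text, score)` candidate format, the uppercase-only pattern, and the function name `best_valid_candidate` are assumptions for illustration.

```python
import re

# Assumed constraint: exactly 9 characters, each a digit or an uppercase letter.
PATTERN = re.compile(r"[0-9A-Z]{9}")


def best_valid_candidate(candidates):
    """Return the highest-scoring text that satisfies the constraint, or None.

    `candidates` is a list of (text, score) pairs, e.g. one per detected text box.
    """
    valid = [(text, score) for text, score in candidates if PATTERN.fullmatch(text)]
    if not valid:
        return None
    return max(valid, key=lambda pair: pair[1])[0]


# Hypothetical recognizer output for one image.
candidates = [("SPEED LIMIT", 0.99), ("AB12CD345", 0.80), ("ZZ99YY888", 0.65)]
result = best_valid_candidate(candidates)
```

Here the highest-scoring reading is rejected because it does not match the pattern, so the best remaining valid reading is returned.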
