
Warning: this Q&A is reproduced under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/19101391/


iOS: Real Time OCR on top of live camera feed (similar to iTunes Redeem Gift Card)

Tags: ios, ocr

Asked by boliva

Is there a way to accomplish something similar to what the iTunes and App Store Apps do when you redeem a Gift Card using the device camera, recognizing a short string of characters in real time on top of the live camera feed?


iTunes App Redeem Gift Card UI


I know that in iOS 7 there is now the AVMetadataMachineReadableCodeObject class which, AFAIK, only represents barcodes. I'm more interested in detecting and reading the contents of a short string. Is this possible using publicly available API methods, or some other third-party SDK that you might know of?


There is also a video of the process in action:


https://www.youtube.com/watch?v=c7swRRLlYEo


Best,


Answered by Donovan

I'm working on a project that does something similar to the Apple app store redeem with camera as you mentioned.


A great starting place for processing live video is a project I found on GitHub. It uses the AVFoundation framework; you implement the AVCaptureVideoDataOutputSampleBufferDelegate methods to receive sample buffers from the camera.


Once you have the image stream (video), you can use OpenCV to process it. You need to determine the area of the image you want to OCR before running it through Tesseract. You have to play with the filtering, but the broad steps with OpenCV are:


  • Convert the images to B&W using cv::cvtColor(inputMat, outputMat, CV_RGBA2GRAY);
  • Threshold the images to eliminate unnecessary elements. You specify the threshold value to eliminate, and then set everything else to black (or white).
  • Determine the lines that form the boundary of the box (or whatever you are processing). You can either create a "bounding box" if you have eliminated everything but the desired area, or use the HoughLines algorithm (or the probabilistic version, HoughLinesP). Using this, you can determine line intersection to find corners, and use the corners to warp the desired area to straighten it into a proper rectangle (if this step is necessary in your application) prior to OCR.
  • Process the portion of the image with Tesseract OCR library to get the resulting text. It is possible to create training files for letters in OpenCV so you can read the text without Tesseract. This could be faster but also could be a lot more work. In the App Store case, they are doing something similar to display the text that was read overlaid on top of the original image. This adds to the cool factor, so it just depends on what you need.

Some other hints:


  • I used the book "Instant OpenCV" to get started quickly with this. It was pretty helpful.
  • Download OpenCV for iOS from OpenCV.org/downloads.html
  • I have found adaptive thresholding to be very useful; you can read all about it by searching for "OpenCV adaptiveThreshold". Also, if you have an image with very little in between the light and dark elements, you can use Otsu's Binarization. This automatically determines the threshold value based on the histogram of the grayscale image.

Answered by Francis Li

This Q&A thread seems to consistently be one of the top search hits for the topic of OCR on iOS, but it is fairly out of date, so I thought I'd post some additional resources I've found useful as of the time of writing:


Vision Framework
https://developer.apple.com/documentation/vision
As of iOS 11, you can now use the included CoreML-based Vision framework for things like rectangle or text detection. With these capabilities included in the OS, I've found that I no longer need OpenCV. However, note that text detection is not the same as text recognition or OCR, so you will still need another library like Tesseract (or possibly your own CoreML model) to translate the detected parts of the image into actual text.


SwiftOCR
https://github.com/garnele007/SwiftOCR
If you're just interested in recognizing alphanumeric codes, this OCR library claims significant speed, memory consumption, and accuracy improvements over Tesseract (I have not tried it myself).


ML Kit
https://firebase.google.com/products/ml-kit/
Google has released ML Kit as part of its Firebase suite of developer tools, in beta at the time of writing this post. Similar to Apple's CoreML, it is a machine learning framework that can use your own trained models, but it also has pre-trained models for common image processing tasks, like the Vision Framework. Unlike the Vision Framework, it also includes a model for on-device text recognition of Latin characters. Currently, use of this library is free for on-device functionality, with charges for using Google's cloud/SaaS API offerings. I have opted to use this in my project, as the speed and accuracy of recognition seem quite good, and I will also be creating an Android app with the same functionality, so having a single cross-platform solution is ideal for me.


ABBYY Real-Time Recognition SDK
https://rtrsdk.com/
This commercial SDK for iOS and Android is free to download for evaluation and limited commercial use (up to 5000 units as of time of writing this post). Further commercial use requires an Extended License. I did not evaluate this offering due to its opaque pricing.


Answered by zzzel

There's a similar project on GitHub: https://github.com/Devxhkl/RealtimeOCR


Answered by Wain

'Real time' is just a set of images. You don't even need to process all of them, just enough to broadly represent the motion of the device (or the change in camera position). There is nothing built into the iOS SDK to do what you want, but you can use a third-party OCR library (like Tesseract) to process the images you grab from the camera.


Answered by nbvikingsidiot001

I would look into Tesseract. It's an open source OCR library that takes image data and processes it. You can add different regular expressions and only look for specific characters as well. It isn't perfect, but from my experience it works pretty well. Also it can be installed as a CocoaPod if you're into that sort of thing.


If you wanted to capture that in real time, you might be able to use GPUImage to catch images in the live feed and do processing on the incoming images to speed up Tesseract, by applying different filters or reducing the size or quality of the incoming images.
