iOS: Converting a Vision VNTextObservation to a String
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/44533148/
Converting a Vision VNTextObservation to a String
Asked by Adrian
I'm looking through Apple's Vision API documentation and I see a couple of classes that relate to text detection in UIImages:
1) class VNDetectTextRectanglesRequest
2) class VNTextObservation
It looks like they can detect characters, but I don't see a means to do anything with the characters. Once you've got characters detected, how would you go about turning them into something that can be interpreted by NSLinguisticTagger?
Here's a post that gives a brief overview of Vision.
Thank you for reading.
Accepted answer by Adrian
Apple finally updated Vision to do OCR. Open a playground and dump a couple of test images in the Resources folder. In my case, I called them "demoDocument.jpg" and "demoLicensePlate.jpg".
The new class is called VNRecognizeTextRequest. Dump this in a playground and give it a whirl:
import Vision

enum DemoImage: String {
    case document = "demoDocument"
    case licensePlate = "demoLicensePlate"
}

class OCRReader {
    func performOCR(on url: URL?, recognitionLevel: VNRequestTextRecognitionLevel) {
        guard let url = url else { return }
        let requestHandler = VNImageRequestHandler(url: url, options: [:])

        let request = VNRecognizeTextRequest { (request, error) in
            if let error = error {
                print(error)
                return
            }

            guard let observations = request.results as? [VNRecognizedTextObservation] else { return }

            for currentObservation in observations {
                let topCandidate = currentObservation.topCandidates(1)
                if let recognizedText = topCandidate.first {
                    print(recognizedText.string)
                }
            }
        }
        request.recognitionLevel = recognitionLevel

        try? requestHandler.perform([request])
    }
}

func url(for image: DemoImage) -> URL? {
    return Bundle.main.url(forResource: image.rawValue, withExtension: "jpg")
}

let ocrReader = OCRReader()
ocrReader.performOCR(on: url(for: .document), recognitionLevel: .fast)
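To tie this back to the question: the strings coming out of topCandidates(1) are plain Swift strings, so they can be handed straight to NSLinguisticTagger. A minimal sketch (the sample sentence below is just a stand-in for whatever the OCR returned):

import Foundation

// Tag each recognized word with its part of speech.
let recognizedText = "The quick brown fox jumps over the lazy dog"
let tagger = NSLinguisticTagger(tagSchemes: [.lexicalClass], options: 0)
tagger.string = recognizedText
let fullRange = NSRange(location: 0, length: recognizedText.utf16.count)
tagger.enumerateTags(in: fullRange, unit: .word, scheme: .lexicalClass,
                     options: [.omitWhitespace, .omitPunctuation]) { tag, tokenRange, _ in
    if let tag = tag {
        let word = (recognizedText as NSString).substring(with: tokenRange)
        print("\(word): \(tag.rawValue)")
    }
}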
There's an in-depth discussion of this from WWDC19.
Answered by brian.clear
SwiftOCR
I just got SwiftOCR to work with small sets of text.
https://github.com/garnele007/SwiftOCR
uses
https://github.com/Swift-AI/Swift-AI
which uses the NeuralNet-MNIST model for text recognition.
TODO : VNTextObservation > SwiftOCR
Will post an example of it using VNTextObservation once I have one connected to the other.
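In the meantime, here is a rough sketch of how the two could be bridged, assuming SwiftOCR's recognize(_:_:) completion-handler API; the cropping helper below is an illustration, not code from either library:

import UIKit
import Vision
import SwiftOCR

// Crop a single detected text region out of the source image and hand it to SwiftOCR.
func recognize(region: VNTextObservation, in image: UIImage,
               completion: @escaping (String) -> Void) {
    guard let cgImage = image.cgImage else { return }
    let width = CGFloat(cgImage.width)
    let height = CGFloat(cgImage.height)

    // Vision boxes are normalized (0...1) with a bottom-left origin; flip y for CGImage space.
    let box = region.boundingBox
    let rect = CGRect(x: box.minX * width,
                      y: (1 - box.maxY) * height,
                      width: box.width * width,
                      height: box.height * height)
    guard let crop = cgImage.cropping(to: rect) else { return }

    let swiftOCR = SwiftOCR()
    swiftOCR.recognize(UIImage(cgImage: crop)) { recognizedString in
        completion(recognizedString)
    }
}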
OpenCV + Tesseract OCR
I tried to use OpenCV + Tesseract but got compile errors, then found SwiftOCR.
SEE ALSO : Google Vision iOS
Note: Google Vision Text Recognition - the Android SDK has text detection, but there is also an iOS CocoaPod. Keep an eye on it, as text recognition should eventually be added to the iOS version too.
https://developers.google.com/vision/text-overview
// Correction: just tried it, but only the Android version of the SDK supports text detection.
If you subscribe to releases: https://libraries.io/cocoapods/GoogleMobileVision
Click SUBSCRIBE TO RELEASES so you can see when TextDetection is added to the iOS part of the CocoaPod.
Answered by DrNeurosurg
This is how to do it ...
//
//  ViewController.swift
//

import UIKit
import Vision
import CoreML

class ViewController: UIViewController {

    //HOLDS OUR INPUT
    var inputImage:CIImage?

    //RESULT FROM OVERALL RECOGNITION
    var recognizedWords:[String] = [String]()

    //RESULT FROM RECOGNITION
    var recognizedRegion:String = String()

    //OCR-REQUEST
    lazy var ocrRequest: VNCoreMLRequest = {
        do {
            //THIS MODEL IS TRAINED BY ME FOR FONT "Inconsolata" (Numbers 0...9 and UpperCase Characters A..Z)
            let model = try VNCoreMLModel(for:OCR().model)
            return VNCoreMLRequest(model: model, completionHandler: self.handleClassification)
        } catch {
            fatalError("cannot load model")
        }
    }()

    //OCR-HANDLER
    func handleClassification(request: VNRequest, error: Error?)
    {
        guard let observations = request.results as? [VNClassificationObservation]
            else {fatalError("unexpected result") }
        guard let best = observations.first
            else { fatalError("cant get best result")}

        self.recognizedRegion = self.recognizedRegion.appending(best.identifier)
    }

    //TEXT-DETECTION-REQUEST
    lazy var textDetectionRequest: VNDetectTextRectanglesRequest = {
        return VNDetectTextRectanglesRequest(completionHandler: self.handleDetection)
    }()

    //TEXT-DETECTION-HANDLER
    func handleDetection(request:VNRequest, error: Error?)
    {
        guard let observations = request.results as? [VNTextObservation]
            else {fatalError("unexpected result") }

        // EMPTY THE RESULTS
        self.recognizedWords = [String]()

        //NEEDED BECAUSE OF DIFFERENT SCALES
        let transform = CGAffineTransform.identity.scaledBy(x: (self.inputImage?.extent.size.width)!, y: (self.inputImage?.extent.size.height)!)

        //A REGION IS LIKE A "WORD"
        for region:VNTextObservation in observations
        {
            guard let boxesIn = region.characterBoxes else {
                continue
            }

            //EMPTY THE RESULT FOR REGION
            self.recognizedRegion = ""

            //A "BOX" IS THE POSITION IN THE ORIGINAL IMAGE (SCALED FROM 0... 1.0)
            for box in boxesIn
            {
                //SCALE THE BOUNDING BOX TO PIXELS
                let realBoundingBox = box.boundingBox.applying(transform)

                //TO BE SURE
                guard (inputImage?.extent.contains(realBoundingBox))!
                    else { print("invalid detected rectangle"); return}

                //SCALE THE POINTS TO PIXELS
                let topleft = box.topLeft.applying(transform)
                let topright = box.topRight.applying(transform)
                let bottomleft = box.bottomLeft.applying(transform)
                let bottomright = box.bottomRight.applying(transform)

                //LET'S CROP AND RECTIFY
                let charImage = inputImage?
                    .cropped(to: realBoundingBox)
                    .applyingFilter("CIPerspectiveCorrection", parameters: [
                        "inputTopLeft" : CIVector(cgPoint: topleft),
                        "inputTopRight" : CIVector(cgPoint: topright),
                        "inputBottomLeft" : CIVector(cgPoint: bottomleft),
                        "inputBottomRight" : CIVector(cgPoint: bottomright)
                    ])

                //PREPARE THE HANDLER
                let handler = VNImageRequestHandler(ciImage: charImage!, options: [:])

                //SOME OPTIONS (TO PLAY WITH..)
                self.ocrRequest.imageCropAndScaleOption = VNImageCropAndScaleOption.scaleFill

                //FEED THE CHAR-IMAGE TO OUR OCR-REQUEST - NO NEED TO SCALE IT - VISION WILL DO IT FOR US !!
                do {
                    try handler.perform([self.ocrRequest])
                } catch { print("Error")}
            }

            //APPEND RECOGNIZED CHARS FOR THAT REGION
            self.recognizedWords.append(recognizedRegion)
        }

        //THATS WHAT WE WANT - PRINT WORDS TO CONSOLE
        DispatchQueue.main.async {
            self.PrintWords(words: self.recognizedWords)
        }
    }

    func PrintWords(words:[String])
    {
        // VOILA'
        print(recognizedWords)
    }

    func doOCR(ciImage:CIImage)
    {
        //PREPARE THE HANDLER
        let handler = VNImageRequestHandler(ciImage: ciImage, options:[:])

        //WE NEED A BOX FOR EACH DETECTED CHARACTER
        self.textDetectionRequest.reportCharacterBoxes = true
        self.textDetectionRequest.preferBackgroundProcessing = false

        //FEED IT TO THE QUEUE FOR TEXT-DETECTION
        DispatchQueue.global(qos: .userInteractive).async {
            do {
                try handler.perform([self.textDetectionRequest])
            } catch {
                print ("Error")
            }
        }
    }

    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view, typically from a nib.

        //LETS LOAD AN IMAGE FROM RESOURCE
        let loadedImage:UIImage = UIImage(named: "Sample1.png")! //TRY Sample2, Sample3 too

        //WE NEED A CIIMAGE - NOT NEEDED TO SCALE
        inputImage = CIImage(image:loadedImage)!

        //LET'S DO IT
        self.doOCR(ciImage: inputImage!)
    }

    override func didReceiveMemoryWarning() {
        super.didReceiveMemoryWarning()
        // Dispose of any resources that can be recreated.
    }
}
You'll find the complete project here; the trained model is included!
Answered by Dimillian
Adding my own progress on this, in case anyone has a better solution:
I've successfully drawn the region boxes and character boxes on screen. The Vision API of Apple is actually very performant. You have to transform each frame of your video into an image and feed it to the recognizer. It's much more accurate than feeding the pixel buffer from the camera directly.
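For context, the sampleBuffer used below comes from the camera's video-data output callback; a minimal sketch, assuming the view controller is the AVCaptureVideoDataOutput sample-buffer delegate (detectText(in:) is a hypothetical wrapper around the snippet that follows):

import AVFoundation

func captureOutput(_ output: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
    // Each camera frame lands here; hand it to the Vision text-detection code below.
    detectText(in: sampleBuffer)   // hypothetical helper wrapping the snippet below
}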
if #available(iOS 11.0, *) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {return}

    var requestOptions:[VNImageOption : Any] = [:]

    if let camData = CMGetAttachment(sampleBuffer, kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, nil) {
        requestOptions = [.cameraIntrinsics:camData]
    }

    let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                                    orientation: 6,
                                                    options: requestOptions)

    let request = VNDetectTextRectanglesRequest(completionHandler: { (request, _) in
        guard let observations = request.results else {print("no result"); return}
        let result = observations.map({$0 as? VNTextObservation})
        DispatchQueue.main.async {
            self.previewLayer.sublayers?.removeSubrange(1...)
            for region in result {
                guard let rg = region else {continue}
                self.drawRegionBox(box: rg)
                if let boxes = region?.characterBoxes {
                    for characterBox in boxes {
                        self.drawTextBox(box: characterBox)
                    }
                }
            }
        }
    })
    request.reportCharacterBoxes = true
    try? imageRequestHandler.perform([request])
}
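The drawRegionBox(box:) and drawTextBox(box:) helpers aren't shown in the snippet; a minimal sketch of what drawing a region outline could look like, assuming previewLayer is the camera preview layer (an illustration, not the answer author's code):

// Convert a normalized Vision bounding box into preview-layer coordinates and outline it.
func drawRegionBox(box: VNTextObservation) {
    let size = previewLayer.bounds.size
    // Vision uses a bottom-left origin with coordinates in 0...1, so flip the y-axis.
    let frame = CGRect(x: box.boundingBox.minX * size.width,
                       y: (1 - box.boundingBox.maxY) * size.height,
                       width: box.boundingBox.width * size.width,
                       height: box.boundingBox.height * size.height)
    let outline = CALayer()
    outline.frame = frame
    outline.borderWidth = 2.0
    outline.borderColor = UIColor.red.cgColor
    previewLayer.addSublayer(outline)
}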
Now I'm trying to actually recognize the text. Apple doesn't provide any built-in OCR model. And I want to use CoreML to do that, so I'm trying to convert a Tesseract-trained data model to CoreML.
You can find Tesseract models here: https://github.com/tesseract-ocr/tessdata and I think the next step is to write a coremltools converter that supports those types of input and outputs a .coreML file.
Or, you can link to TesseractiOS directly and try to feed it the region boxes and character boxes you get from the Vision API.
Answered by nathan
Thanks to a GitHub user, you can test an example: https://gist.github.com/Koze/e59fa3098388265e578dee6b3ce89dd8
- (void)detectWithImageURL:(NSURL *)URL
{
    VNImageRequestHandler *handler = [[VNImageRequestHandler alloc] initWithURL:URL options:@{}];
    VNDetectTextRectanglesRequest *request = [[VNDetectTextRectanglesRequest alloc] initWithCompletionHandler:^(VNRequest * _Nonnull request, NSError * _Nullable error) {
        if (error) {
            NSLog(@"%@", error);
        }
        else {
            for (VNTextObservation *textObservation in request.results) {
//                NSLog(@"%@", textObservation);
//                NSLog(@"%@", textObservation.characterBoxes);
                NSLog(@"%@", NSStringFromCGRect(textObservation.boundingBox));
                for (VNRectangleObservation *rectangleObservation in textObservation.characterBoxes) {
                    NSLog(@" |-%@", NSStringFromCGRect(rectangleObservation.boundingBox));
                }
            }
        }
    }];
    request.reportCharacterBoxes = YES;
    NSError *error;
    [handler performRequests:@[request] error:&error];
    if (error) {
        NSLog(@"%@", error);
    }
}
The thing is, the result is an array of bounding boxes for each detected character. From what I gathered from Vision's session, I think you are supposed to use CoreML to detect the actual chars.
Recommended WWDC 2017 talk: Vision Framework: Building on Core ML (haven't finished watching it either); have a look at 25:50 for a similar example called MNISTVision.
Here's another nifty app demonstrating the use of Keras (Tensorflow) for the training of a MNIST model for handwriting recognition using CoreML: Github
Answered by Foti Dim
Firebase ML Kit does it for iOS (and Android) with its on-device Vision API, and it outperforms Tesseract and SwiftOCR.
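A rough sketch of what the on-device call looked like at the time, assuming the FirebaseMLVision and FirebaseMLVisionTextModel pods (the image name is just a stand-in for any test image in the bundle):

import FirebaseMLVision
import UIKit

// Run ML Kit's on-device text recognizer on a UIImage and print the result.
let image = UIImage(named: "demoDocument")!
let textRecognizer = Vision.vision().onDeviceTextRecognizer()
let visionImage = VisionImage(image: image)
textRecognizer.process(visionImage) { result, error in
    guard error == nil, let result = result else { return }
    print(result.text)                       // the complete recognized string
    for block in result.blocks {
        print(block.text, block.frame)       // per-block text and bounding frame
    }
}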
Answered by Andre Guerra
I'm using Google's Tesseract OCR engine to convert the images into actual strings. You'll have to add it to your Xcode project using CocoaPods. Although Tesseract will perform OCR even if you simply feed it the image containing text, the way to make it perform better/faster is to use the detected text rectangles to feed it pieces of the image that actually contain text, which is where Apple's Vision Framework comes in handy. Here's a link to the engine: Tesseract OCR. And here's a link to the current stage of my project, which has text detection + OCR already implemented: Out Loud - Camera to Speech. Hope these can be of some use. Good luck!
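A minimal sketch of that idea, assuming the TesseractOCRiOS pod and its G8Tesseract class, applied to an image slice that Vision has already identified as containing text (cropping works the same way as in the SwiftOCR sketch above):

import TesseractOCR
import UIKit

// OCR a pre-cropped image slice with Tesseract's English trained data.
func recognizeText(in slice: UIImage) -> String? {
    guard let tesseract = G8Tesseract(language: "eng") else { return nil }
    tesseract.image = slice
    tesseract.recognize()
    return tesseract.recognizedText
}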
Answered by Roberto Ferraz
For those still looking for a solution, I wrote a quick library to do this. It uses both the Vision API and Tesseract and can be used to achieve the task the question describes with one single method:
func sliceaAndOCR(image: UIImage, charWhitelist: String, charBlackList: String = "", completion: @escaping ((_: String, _: UIImage) -> Void))

This method will look for text in your image, return the string found and a slice of the original image showing where the text was found.