Java: Image processing and extraction of characters
Original question: http://stackoverflow.com/questions/20427759/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by jkushner
I'm trying to figure out what technologies I would need to process images for characters.
Specifically, in this example, I need to extract the hashtag that is circled. You can see it here:
Any implementations would be of great assistance.
Accepted answer by karlphillip
It is possible to solve this problem with OpenCV + Tesseract, though I think there might be easier ways. OpenCV is an open source library used to build computer vision applications, and Tesseract is an open source OCR engine.
Before we start, let me clarify something: that is not a circle, it's a rounded rectangle.
I'm sharing the source code of the application that I wrote to demonstrate how the problem can be solved, as well as some tips on what's going on. This answer is not meant to teach digital image processing, and the reader is expected to have a minimal understanding of the field.
I will describe very briefly what the larger sections of the code do. Most of the next chunk of code came from squares.cpp, a sample application shipped with OpenCV that detects squares in images.
#include <cmath>
#include <iostream>
#include <vector>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
// angle: helper function.
// Returns the cosine of the angle between the vectors pt0->pt1 and pt0->pt2.
double angle( cv::Point pt1, cv::Point pt2, cv::Point pt0 )
{
    double dx1 = pt1.x - pt0.x;
    double dy1 = pt1.y - pt0.y;
    double dx2 = pt2.x - pt0.x;
    double dy2 = pt2.y - pt0.y;
    return (dx1*dx2 + dy1*dy2)/std::sqrt((dx1*dx1 + dy1*dy1)*(dx2*dx2 + dy2*dy2) + 1e-10);
}
// findSquares: appends the quadrangles detected in the image to `squares`.
void findSquares(const cv::Mat& image, std::vector<std::vector<cv::Point> >& squares)
{
    cv::Mat pyr, timg;
    // Down-scale and up-scale the image to filter out small noises
    cv::pyrDown(image, pyr, cv::Size(image.cols/2, image.rows/2));
    cv::pyrUp(pyr, timg, image.size());
    // Apply Canny with thresholds 0 and 50 and a Sobel aperture of 5
    cv::Canny(timg, timg, 0, 50, 5);
    // Dilate Canny output to remove potential holes between edge segments
    cv::dilate(timg, timg, cv::Mat(), cv::Point(-1,-1));
    // Find contours and store them all as a list
    // (cv:: constants, the same ones main() uses)
    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(timg, contours, cv::RETR_LIST, cv::CHAIN_APPROX_SIMPLE);
    for( size_t i = 0; i < contours.size(); i++ ) // Test each contour
    {
        // Approximate contour with accuracy proportional to the contour perimeter
        std::vector<cv::Point> approx;
        cv::approxPolyDP(cv::Mat(contours[i]), approx, cv::arcLength(cv::Mat(contours[i]), true)*0.02, true);
        // Square contours should have 4 vertices after approximation,
        // a relatively large area (to filter out noisy contours)
        // and be convex.
        // Note: the absolute value of the area is used because
        // the area may be positive or negative, in accordance with the
        // contour orientation
        if( approx.size() == 4 &&
            std::fabs(cv::contourArea(cv::Mat(approx))) > 1000 &&
            cv::isContourConvex(cv::Mat(approx)) )
        {
            double maxCosine = 0;
            for (int j = 2; j < 5; j++)
            {
                // Find the maximum cosine of the angle between joint edges
                double cosine = std::fabs(angle(approx[j%4], approx[j-2], approx[j-1]));
                maxCosine = MAX(maxCosine, cosine);
            }
            // If the cosines of all angles are small
            // (all angles are ~90 degrees) then write the quadrangle
            // vertices to the resultant sequence
            if( maxCosine < 0.3 )
                squares.push_back(approx);
        }
    }
}
// drawSquares: draws all the squares found in the image
void drawSquares( cv::Mat& image, const std::vector<std::vector<cv::Point> >& squares )
{
    for( size_t i = 0; i < squares.size(); i++ )
    {
        const cv::Point* p = &squares[i][0];
        int n = (int)squares[i].size();
        // cv::LINE_AA is the OpenCV 3+ name of the legacy CV_AA constant
        cv::polylines(image, &p, &n, 1, true, cv::Scalar(0,255,0), 2, cv::LINE_AA);
    }
    cv::imshow("drawSquares", image);
}
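The corner test inside findSquares can be tried in isolation. The following is a minimal sketch, not part of the original answer: Vec2, cornerCosine, maxCornerCosine and looksRectangular are hypothetical names, and Vec2 stands in for cv::Point so the snippet builds without OpenCV.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for cv::Point so the sketch builds without OpenCV.
struct Vec2 { double x, y; };

// Cosine of the angle at p0 between the vectors p0->p1 and p0->p2.
double cornerCosine(Vec2 p1, Vec2 p2, Vec2 p0)
{
    const double dx1 = p1.x - p0.x, dy1 = p1.y - p0.y;
    const double dx2 = p2.x - p0.x, dy2 = p2.y - p0.y;
    return (dx1 * dx2 + dy1 * dy2) /
           std::sqrt((dx1 * dx1 + dy1 * dy1) * (dx2 * dx2 + dy2 * dy2) + 1e-10);
}

// Largest |cosine| over the three corners visited by the j = 2..4 loop
// (the corners at indices 1, 2 and 3 of the quadrangle).
double maxCornerCosine(const std::vector<Vec2>& quad)
{
    assert(quad.size() == 4);
    double maxCosine = 0.0;
    for (int j = 2; j < 5; j++)
    {
        const double c = std::fabs(cornerCosine(quad[j % 4], quad[j - 2], quad[j - 1]));
        maxCosine = std::max(maxCosine, c);
    }
    return maxCosine;
}

// Same 0.3 cut-off as the answer: each tested corner must lie
// roughly between 73 and 107 degrees.
bool looksRectangular(const std::vector<Vec2>& quad)
{
    return maxCornerCosine(quad) < 0.3;
}
```

An axis-aligned rectangle passes this test because every corner cosine is essentially zero, while a parallelogram with 45-degree corners is rejected.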
Ok, so our program begins at:
int main(int argc, char* argv[])
{
    // The image path is expected as the first command-line argument
    if (argc < 2)
    {
        std::cout << "Usage: " << argv[0] << " <image>" << std::endl;
        return -1;
    }
    // Load input image (colored, 3-channel)
    cv::Mat input = cv::imread(argv[1]);
    if (input.empty())
    {
        std::cout << "!!! failed imread()" << std::endl;
        return -1;
    }
    // Convert input image to grayscale (1-channel)
    cv::Mat grayscale;
    cv::cvtColor(input, grayscale, cv::COLOR_BGR2GRAY);
    //cv::imwrite("gray.png", grayscale);
What grayscale looks like:
    // Threshold to binarize the image and get rid of the shoe
    cv::Mat binary;
    cv::threshold(grayscale, binary, 225, 255, cv::THRESH_BINARY_INV);
    cv::imshow("Binary image", binary);
    //cv::imwrite("binary.png", binary);
What binary looks like:
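To make the effect of the inverse threshold concrete, here is a minimal sketch on a plain array instead of a cv::Mat. thresholdBinaryInv is a hypothetical helper that mirrors the documented THRESH_BINARY_INV rule (pixels above the threshold become 0, all others become maxval); it is not OpenCV code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Inverse binary threshold on 8-bit grayscale values:
// pixels brighter than `thresh` become 0, all others become `maxval`.
std::vector<std::uint8_t> thresholdBinaryInv(const std::vector<std::uint8_t>& gray,
                                             std::uint8_t thresh, std::uint8_t maxval)
{
    std::vector<std::uint8_t> out(gray.size());
    for (std::size_t i = 0; i < gray.size(); i++)
        out[i] = (gray[i] > thresh) ? 0 : maxval;
    return out;
}
```

With the values used in the answer (threshold 225, maxval 255), a near-white background turns black and everything darker, including a pixel that is exactly 225, turns white.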
    // Find the contours in the thresholded image
    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(binary, contours, cv::RETR_LIST, cv::CHAIN_APPROX_SIMPLE);
    // Fill the areas of the contours with BLUE (hoping to erase everything inside a rectangular shape)
    cv::Mat blue = input.clone();
    for (size_t i = 0; i < contours.size(); i++)
    {
        //std::cout << "* Area: " << cv::contourArea(cv::Mat(contours[i])) << std::endl;
        // cv::FILLED is the OpenCV 3+ name of the legacy CV_FILLED constant
        cv::drawContours(blue, contours, static_cast<int>(i), cv::Scalar(255, 0, 0),
                         cv::FILLED, 8, std::vector<cv::Vec4i>(), 0, cv::Point());
    }
    cv::imshow("Contours Filled", blue);
    //cv::imwrite("contours.png", blue);
What blue looks like:
    // Convert the blue colored image to binary (again), and we will have a good rectangular shape to detect
    cv::Mat gray;
    cv::cvtColor(blue, gray, cv::COLOR_BGR2GRAY);
    cv::threshold(gray, binary, 225, 255, cv::THRESH_BINARY_INV);
    cv::imshow("binary2", binary);
    //cv::imwrite("binary2.png", binary);
What binary looks like at this point:
    // Erode & Dilate to isolate segments connected to nearby areas
    int erosion_type = cv::MORPH_RECT;
    int erosion_size = 5;
    cv::Mat element = cv::getStructuringElement(erosion_type,
                                                cv::Size(2 * erosion_size + 1, 2 * erosion_size + 1),
                                                cv::Point(erosion_size, erosion_size));
    cv::erode(binary, binary, element);
    cv::dilate(binary, binary, element);
    cv::imshow("Morphologic Op", binary);
    //cv::imwrite("morpho.png", binary);
What binary looks like at this point:
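Eroding and then dilating with the same element is a morphological opening: thin connections and specks vanish, while shapes larger than the element keep their size. Below is a minimal sketch of that idea on a small 0/1 grid. Grid, morph and openGrid are hypothetical names, the radius r plays the role of erosion_size, and pixels outside the image are simply ignored, which is an assumption made for this sketch rather than a statement about OpenCV's internals.

```cpp
#include <algorithm>
#include <vector>

using Grid = std::vector<std::vector<int>>;  // 1 = foreground, 0 = background

// Minimum (erode) or maximum (dilate) over a (2r+1)x(2r+1) square window.
// Assumption: pixels that fall outside the image are ignored.
Grid morph(const Grid& src, int r, bool erode)
{
    const int rows = static_cast<int>(src.size());
    const int cols = rows > 0 ? static_cast<int>(src[0].size()) : 0;
    Grid dst(rows, std::vector<int>(cols, 0));
    for (int y = 0; y < rows; y++)
    {
        for (int x = 0; x < cols; x++)
        {
            int v = erode ? 1 : 0;
            for (int dy = -r; dy <= r; dy++)
            {
                for (int dx = -r; dx <= r; dx++)
                {
                    const int yy = y + dy, xx = x + dx;
                    if (yy < 0 || yy >= rows || xx < 0 || xx >= cols)
                        continue;
                    v = erode ? std::min(v, src[yy][xx]) : std::max(v, src[yy][xx]);
                }
            }
            dst[y][x] = v;
        }
    }
    return dst;
}

// Erode, then dilate with the same square element (a morphological opening).
Grid openGrid(const Grid& src, int r)
{
    return morph(morph(src, r, true), r, false);
}
```

On a grid holding a 3x3 block with a one-pixel-wide tail attached, openGrid with r = 1 removes the tail and returns the block unchanged, which is the behaviour the answer relies on to separate the rounded rectangle from nearby areas.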
    // Ok, let's go ahead and try to detect all rectangular shapes
    std::vector<std::vector<cv::Point> > squares;
    findSquares(binary, squares);
    std::cout << "* Rectangular shapes found: " << squares.size() << std::endl;
    // Draw all rectangular shapes found
    cv::Mat output = input.clone();
    drawSquares(output, squares);
    //cv::imwrite("output.png", output);
What output looks like:
Alright! We solved the first part of the problem which was finding the rounded rectangle. You can see in the image above that the rectangular shape was detected and green lines were drawn over the original image for educational purposes.
The second part is much easier. It begins by creating a ROI (Region of Interest) in the original image so we can crop the image to the area inside the rounded rectangle. Once this is done, the cropped image is saved to disk as a TIFF file, which is then fed to Tesseract to do its magic:
    // Crop the rectangular shape
    if (squares.size() == 1)
    {
        cv::Rect box = cv::boundingRect(cv::Mat(squares[0]));
        std::cout << "* The location of the box is x:" << box.x << " y:" << box.y << " " << box.width << "x" << box.height << std::endl;
        // Crop the original image to the defined ROI
        cv::Mat crop = input(box);
        cv::imshow("crop", crop);
        // Save the crop so that Tesseract can read it from disk
        cv::imwrite("cropped.tiff", crop);
    }
    else
    {
        std::cout << "* Abort! Expected exactly one rectangle, found " << squares.size() << "." << std::endl;
    }
    // Wait until user presses key
    cv::waitKey(0);
    return 0;
}
What crop looks like:
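The cropping step boils down to taking the smallest upright box around the four detected corners and copying out the pixels inside it. Here is a minimal sketch of that idea without OpenCV. Pixel, Box, boundingBox and cropGrid are hypothetical names, and counting both extreme coordinates in the width and height is an assumption of this sketch about how integer points should be enclosed.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Pixel { int x, y; };            // hypothetical stand-in for cv::Point
struct Box { int x, y, width, height; };  // hypothetical stand-in for cv::Rect

// Smallest upright box containing every point (both extremes included).
Box boundingBox(const std::vector<Pixel>& pts)
{
    assert(!pts.empty());
    int minX = pts[0].x, maxX = pts[0].x;
    int minY = pts[0].y, maxY = pts[0].y;
    for (const Pixel& p : pts)
    {
        minX = std::min(minX, p.x);
        maxX = std::max(maxX, p.x);
        minY = std::min(minY, p.y);
        maxY = std::max(maxY, p.y);
    }
    return {minX, minY, maxX - minX + 1, maxY - minY + 1};
}

// Copy the pixels inside `box` out of a row-major image.
// The box is assumed to lie entirely inside the image.
std::vector<std::vector<int>> cropGrid(const std::vector<std::vector<int>>& img, const Box& box)
{
    std::vector<std::vector<int>> out;
    for (int y = box.y; y < box.y + box.height; y++)
        out.emplace_back(img[y].begin() + box.x, img[y].begin() + box.x + box.width);
    return out;
}
```

Note one difference from the answer's code: input(box) on a cv::Mat returns a view that shares pixel data with the original image, whereas cropGrid makes a copy.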
When this application finishes its job, it creates a file named cropped.tiff on the disk. Go to the command line and invoke Tesseract to detect the text present in the cropped image:
tesseract cropped.tiff out
This command creates a file named out.txt containing the detected text.
Tesseract has an API that you can use to add the OCR feature into your application.
This solution is not robust, and you will probably have to make some changes here and there to get it to work for other test cases.
Answer by Lajos Veres
There are a few alternatives: Java OCR implementation
They mention the following tools:
- Java OCR http://sourceforge.net/projects/javaocr/
- Asprise http://asprise.com/home/
- Java Object Oriented Neural Engine http://www.jooneworld.com/
- Ron Cemer Java OCR http://www.roncemer.com/software-development/java-ocr
And a few others.
This list of links can also be useful: http://www.javawhat.com/showCategory.do?id=2138003
Generally, this kind of task requires lots of trial and testing. The best tool probably depends much more on the profile of your input data than on anything else.
Answer by Ophir Yoktan
OCR works well with scanned documents. What you are referring to is text detection in general images, which requires other techniques (sometimes OCR is used as part of the flow).
I'm not aware of any "production ready" implementations.
For general information, try Google Scholar with: "text detection in images"
A specific method that worked well for me is the 'stroke width transform' (SWT). It's not hard to implement, and I believe that there are also some implementations available online.
Answer by Dabo
You can check this article: http://www.codeproject.com/Articles/196168/Contour-Analysis-for-Image-Recognition-in-C
It comes with the math theory and an implementation in C# + OpenCV (unfortunately not Java, but there is not that much to rewrite if you decide to implement it in Java). So you will have to use Visual Studio and rebuild it against your OpenCV version if you would like to test it, but it is worth it.