C++ Image Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/10168686/
Question by Charles Menguy
One of the most interesting projects I've worked on in the past couple of years was an image processing project. The goal was to develop a system able to recognize Coca-Cola 'cans' (note that I'm stressing the word 'cans'; you'll see why in a minute). You can see a sample below, with the can recognized in the green rectangle, including its scale and rotation.
Some constraints on the project:
- The background could be very noisy.
- The can could have any scale or rotation or even orientation (within reasonable limits).
- The image could have some degree of fuzziness (contours might not be entirely straight).
- There could be Coca-Cola bottles in the image, and the algorithm should only detect the can!
- The brightness of the image could vary a lot (so you can't rely "too much" on color detection).
- The can could be partly hidden on the sides or in the middle, and possibly partly hidden behind a bottle.
- There could be no can at all in the image, in which case you had to find nothing and write a message saying so.
So you could end up with tricky things like this (which in this case made my algorithm fail completely):
I did this project a while ago, had a lot of fun doing it, and ended up with a decent implementation. Here are some details about my implementation:
Language: Done in C++ using the OpenCV library.
Pre-processing: For the image pre-processing, i.e. transforming the image into a more raw form to give to the algorithm, I used 2 methods, followed by an edge detection step:
- Changing the color domain from RGB to HSV and filtering on "red" hue, on saturation above a certain threshold to avoid orange-like colors, and on value to drop dark tones. The end result was a binary black-and-white image in which every white pixel passed these thresholds. Obviously there is still a lot of crap in the image, but this reduces the number of dimensions you have to work with.
- Noise filtering using median filtering (taking the median pixel value of all neighbors and replacing the pixel with this value) to reduce noise.
- Using the Canny edge detection filter to get the contours of all items after the 2 preceding steps.
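The first two pre-processing steps can be sketched in plain C++ without any library, which makes it easier to see what each one does. This is only an illustrative sketch: the threshold constants and the names `isCanRed`, `thresholdRed` and `medianFilter3x3` are assumptions, not values or functions from the original project. With OpenCV the same steps would normally go through `cv::cvtColor`, `cv::inRange` and `cv::medianBlur`, and the edge detection step (`cv::Canny`) is left out here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

// Hypothetical thresholds; the question does not give the values it used.
const double kHueTolDeg = 15.0;  // accepted distance from pure red (0 degrees)
const double kMinSat = 0.5;      // rejects washed-out colors
const double kMinVal = 0.3;      // rejects dark tones

// True if the pixel counts as "can red" after an RGB -> HSV conversion.
bool isCanRed(Rgb p) {
    const double r = p.r / 255.0, g = p.g / 255.0, b = p.b / 255.0;
    const double maxc = std::max(r, std::max(g, b));
    const double minc = std::min(r, std::min(g, b));
    const double delta = maxc - minc;
    if (delta <= 0.0) return false;  // gray pixel, hue undefined
    double hue;
    if (maxc == r)      hue = 60.0 * std::fmod((g - b) / delta, 6.0);
    else if (maxc == g) hue = 60.0 * ((b - r) / delta + 2.0);
    else                hue = 60.0 * ((r - g) / delta + 4.0);
    if (hue < 0.0) hue += 360.0;
    const double sat = delta / maxc;  // maxc > 0 because delta > 0
    const bool redHue = hue <= kHueTolDeg || hue >= 360.0 - kHueTolDeg;
    return redHue && sat >= kMinSat && maxc >= kMinVal;
}

// Step 1: binary mask, 255 where the pixel passes the red thresholds.
std::vector<std::uint8_t> thresholdRed(const std::vector<Rgb>& image) {
    std::vector<std::uint8_t> mask(image.size());
    for (std::size_t i = 0; i < image.size(); ++i)
        mask[i] = isCanRed(image[i]) ? 255 : 0;
    return mask;
}

// Step 2: 3x3 median filter; border pixels reuse the nearest valid neighbor.
std::vector<std::uint8_t> medianFilter3x3(const std::vector<std::uint8_t>& img,
                                          int width, int height) {
    assert(width > 0 && height > 0);
    assert(img.size() == static_cast<std::size_t>(width) * height);
    std::vector<std::uint8_t> out(img.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            std::uint8_t window[9];
            int n = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int yy = std::min(std::max(y + dy, 0), height - 1);
                    const int xx = std::min(std::max(x + dx, 0), width - 1);
                    window[n++] = img[yy * width + xx];
                }
            }
            std::nth_element(window, window + 4, window + 9);
            out[y * width + x] = window[4];  // the median of the 9 values
        }
    }
    return out;
}
```

On a binary mask the median filter acts as a majority vote over the 3x3 neighborhood, which is why isolated white specks disappear while solid red regions survive.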
Algorithm: The algorithm I chose for this task is called the Generalized Hough Transform (pretty different from the regular Hough Transform) and was taken from this awesome book on feature extraction. It basically says a few things:
- You can describe an object in space without knowing its analytical equation (which is the case here).
- It is resistant to image deformations such as scaling and rotation, as it will basically test your image for every combination of scale factor and rotation factor.
- It uses a base model (a template) that the algorithm will "learn".
- Each pixel remaining in the contour image will vote for another pixel which will supposedly be the center (in terms of gravity) of your object, based on what it learned from the model.
In the end, you end up with a heat map of the votes. For example, here all the pixels of the contour of the can will vote for its gravitational center, so you'll have a lot of votes in the same pixel corresponding to the center, and will see a peak in the heat map as below:
Once you have that, a simple threshold-based heuristic can give you the location of the center pixel, from which you can derive the scale and rotation and then plot your little rectangle around it (final scale and rotation factor will obviously be relative to your original template). In theory at least...
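To make the voting idea concrete, here is a deliberately simplified sketch of the learn / vote / threshold loop. It assumes a fixed scale and rotation and lets every template offset vote from every edge pixel, whereas a full Generalized Hough Transform indexes offsets by gradient direction and repeats the vote for each scale and rotation. The names `learnOffsets`, `vote` and `findPeak` are hypothetical and are not taken from the original implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { int x, y; };

// "Learning": remember the displacement from each template contour pixel
// to the template's center of gravity.
std::vector<Point> learnOffsets(const std::vector<Point>& templateContour) {
    assert(!templateContour.empty());
    long sumX = 0, sumY = 0;
    for (const Point& p : templateContour) { sumX += p.x; sumY += p.y; }
    const long n = static_cast<long>(templateContour.size());
    const Point center = {static_cast<int>(sumX / n), static_cast<int>(sumY / n)};
    std::vector<Point> offsets;
    for (const Point& p : templateContour)
        offsets.push_back({center.x - p.x, center.y - p.y});
    return offsets;
}

// Voting: every edge pixel votes for each center position it could imply.
std::vector<int> vote(const std::vector<Point>& edges,
                      const std::vector<Point>& offsets, int width, int height) {
    assert(width > 0 && height > 0);
    std::vector<int> heatMap(static_cast<std::size_t>(width) * height, 0);
    for (const Point& e : edges) {
        for (const Point& o : offsets) {
            const int cx = e.x + o.x, cy = e.y + o.y;
            if (cx >= 0 && cx < width && cy >= 0 && cy < height)
                ++heatMap[static_cast<std::size_t>(cy) * width + cx];
        }
    }
    return heatMap;
}

// Threshold heuristic: report the strongest cell only if it has enough votes,
// so an image without the object yields "nothing found".
bool findPeak(const std::vector<int>& heatMap, int width, int minVotes,
              Point& center) {
    assert(width > 0 && minVotes > 0);
    int best = 0;
    std::size_t bestIdx = 0;
    for (std::size_t i = 0; i < heatMap.size(); ++i)
        if (heatMap[i] > best) { best = heatMap[i]; bestIdx = i; }
    if (best < minVotes) return false;
    center = {static_cast<int>(bestIdx % width), static_cast<int>(bestIdx / width)};
    return true;
}
```

The cost is roughly (edge pixels) x (template offsets) for a single scale and rotation, and that product is multiplied again by every scale and rotation combination tested. This is consistent with the very long run times described in the results.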
Results: Now, while this approach worked in the basic cases, it was severely lacking in some areas:
- It is extremely slow! I can't stress this enough. Almost a full day was needed to process the 30 test images, obviously because I had a very high scaling factor for rotation and translation, since some of the cans were very small.
- It was completely lost when bottles were in the image, and for some reason almost always found the bottle instead of the can (perhaps because bottles were bigger, thus had more pixels, thus more votes).
- Fuzzy images were also no good, since the votes ended up in pixels at random locations around the center, thus ending with a very noisy heat map.
- Invariance in translation and rotation was achieved, but not in orientation, meaning that a can that was not directly facing the camera objective wasn't recognized.
Can you help me improve my specific algorithm, using exclusively OpenCV features, to resolve the four specific issues mentioned?
I hope some people will also learn something out of it; after all, I think not only people who ask questions should learn. :)
Accepted answer by stacker
An alternative approach would be to extract features (keypoints) using the scale-invariant feature transform (SIFT) or Speeded Up Robust Features (SURF).
Both are implemented in OpenCV 2.3.1.
You can find a nice code example using features in Features2D + Homography to find a known object.
Both algorithms are invariant to scaling and rotation. Since they work with features, you can also handle occlusion (as long as enough keypoints are visible).
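Once keypoint descriptors have been extracted, deciding whether "enough keypoints are visible" comes down to counting reliable descriptor matches. The sketch below is one common way to do that, using brute-force nearest-neighbor search with a ratio test. It is an assumption on my part and not something this answer prescribes: the 0.75 ratio and the names `matchDescriptors` and `squaredDistance` are illustrative, and the descriptors are plain float vectors standing in for real SIFT/SURF output.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Descriptor = std::vector<float>;

struct Match { std::size_t objectIdx, sceneIdx; };

float squaredDistance(const Descriptor& a, const Descriptor& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Keep an object keypoint only if its best scene match is clearly better
// than the second best; ambiguous matches are dropped.
std::vector<Match> matchDescriptors(const std::vector<Descriptor>& object,
                                    const std::vector<Descriptor>& scene,
                                    float ratio = 0.75f) {
    std::vector<Match> matches;
    if (scene.size() < 2) return matches;  // the test needs two candidates
    for (std::size_t i = 0; i < object.size(); ++i) {
        float best = std::numeric_limits<float>::max();
        float second = std::numeric_limits<float>::max();
        std::size_t bestIdx = 0;
        for (std::size_t j = 0; j < scene.size(); ++j) {
            const float d = squaredDistance(object[i], scene[j]);
            if (d < best) { second = best; best = d; bestIdx = j; }
            else if (d < second) { second = d; }
        }
        // Distances are squared, so the ratio is squared as well.
        if (best < ratio * ratio * second) matches.push_back({i, bestIdx});
    }
    return matches;
}
```

A partly hidden can simply produces fewer matches, so a detection rule of the form "at least N surviving matches" degrades gracefully with occlusion; N is something to tune on your own images.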
Image source: tutorial example
The processing takes a few hundred ms for SIFT; SURF is a bit faster, but still not suitable for real-time applications. ORB uses FAST, which is weaker regarding rotation invariance.
The original papers
Answer by kmote
To speed things up, I would take advantage of the fact that you are not asked to find an arbitrary image/object, but specifically one with the Coca-Cola logo. This is significant because this logo is very distinctive, and it should have a characteristic, scale-invariant signature in the frequency domain, particularly in the red channel of RGB. That is to say, the alternating pattern of red-to-white-to-red encountered by a horizontal scan line (trained on a horizontally aligned logo) will have a distinctive "rhythm" as it passes through the central axis of the logo. That rhythm will "speed up" or "slow down" at different scales and orientations, but will remain proportionally equivalent. You could identify/define a few dozen such scan lines, both horizontally and vertically through the logo and several more diagonally, in a starburst pattern. Call these the "signature scan lines."
Searching for this signature in the target image is a simple matter of scanning the image in horizontal strips. Look for a high frequency in the red channel (indicating a move from a red region to a white one), and once found, see if it is followed by one of the frequency rhythms identified in the training session. Once a match is found, you will instantly know the scan line's orientation and location in the logo (if you keep track of those things during training), so identifying the boundaries of the logo from there is trivial.
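One way to read the "rhythm" idea is as a sequence of run lengths along the scan line, compared as proportions so that the comparison does not depend on scale. The sketch below works on a single row of a binary red / not-red mask rather than on the raw red channel. That simplification, the tolerance parameter and the names `runLengths` and `findSignature` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Run { bool red; int length; };

// Collapse a mask row (nonzero = red) into consecutive runs.
std::vector<Run> runLengths(const std::vector<std::uint8_t>& row) {
    std::vector<Run> runs;
    for (std::uint8_t px : row) {
        const bool red = px != 0;
        if (!runs.empty() && runs.back().red == red) ++runs.back().length;
        else runs.push_back({red, 1});
    }
    return runs;
}

// Look for a window of runs, starting on a red run, whose relative lengths
// match the trained signature. Returns the pixel offset where the window
// starts, or -1 if the row contains no match.
int findSignature(const std::vector<std::uint8_t>& row,
                  const std::vector<double>& trained, double tol) {
    assert(!trained.empty());
    const std::vector<Run> runs = runLengths(row);
    int offset = 0;
    for (std::size_t start = 0; start + trained.size() <= runs.size(); ++start) {
        if (runs[start].red) {
            double total = 0.0;
            for (std::size_t i = 0; i < trained.size(); ++i)
                total += runs[start + i].length;
            bool ok = true;
            for (std::size_t i = 0; i < trained.size(); ++i) {
                const double share = runs[start + i].length / total;
                if (std::fabs(share - trained[i]) > tol) ok = false;
            }
            if (ok) return offset;
        }
        offset += runs[start].length;
    }
    return -1;
}
```

Because only the shares of each run are compared, a logo twice as large produces the same signature. Each row is visited once, which fits the answer's expectation of a roughly linear-time scan.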
I would be surprised if this weren't a linearly-efficient algorithm, or nearly so. It obviously doesn't address your can-bottle discrimination, but at least you'll have your logos.
(Update: for bottle recognition I would look for coke (the brown liquid) adjacent to the logo -- that is, inside the bottle. Or, in the case of an empty bottle, I would look for a cap, which will always have the same basic shape, size, and distance from the logo and will typically be all white or red. Search for a solid-color elliptical shape where a cap should be, relative to the logo. Not foolproof of course, but your goal here should be to find the easy ones fast.)
(It's been a few years since my image processing days, so I kept this suggestion high-level and conceptual. I think it might slightly approximate how a human eye might operate -- or at least how my brain does!)
Answer by Darren Cook
Fun problem: when I glanced at your bottle image I thought it was a can too. But, as a human, what I did to tell the difference is that I then noticed it was also a bottle...
So, to tell cans and bottles apart, how about simply scanning for bottles first? If you find one, mask out the label before looking for cans.
Not too hard to implement if you're already doing cans. The real downside is that it doubles your processing time. (But thinking ahead to real-world applications, you're going to end up wanting to do bottles anyway ;-)
Answer by Abid Rahman K
Isn't it difficult even for humans to distinguish between a bottle and a can in the second image (provided the transparent region of the bottle is hidden)?
They are almost the same except for a very small region (that is, the width at the top of the can is a little smaller, while the wrapper of the bottle is the same width throughout, but that is a minor change, right?)
The first thing that came to my mind was to check for the red top of the bottle. But that is still a problem if the bottle has no top, or if it is partially hidden (as mentioned above).
The second thing I thought about was the transparency of the bottle. OpenCV has some work on finding transparent objects in an image. Check the links below.
In particular, look at this to see how accurately they detect glass:
See their implementation result:
They say it is the implementation of the paper "A Geodesic Active Contour Framework for Finding Glass" by K. McHenry and J. Ponce, CVPR 2006.
It might help a little in your case, but the problem arises again if the bottle is filled.
So I think here, you can search for the transparent body of the bottles first, or for a red region connected laterally to two transparent objects, which is obviously the bottle. (When working ideally, the result is an image as follows.)
Now you can remove the yellow region, that is, the label of the bottle, and run your algorithm to find the can.
Anyway, like the other solutions, this one has its own problems:
- It works only if your bottle is empty. If it is full, you will have to search for the red region between the two black regions (if the Coca-Cola liquid is black).
- It is a problem again if the transparent part is covered.
But anyway, if none of the above problems appear in the pictures, this seems to be a better way.
Answer by MrGomez
I really like Darren Cook's and stacker's answers to this problem. I was in the midst of throwing my thoughts into a comment on those, but I believe my approach is too answer-shaped to not leave here.
In short summary, you've identified an algorithm to determine that a Coca-Cola logo is present at a particular location in space. You're now trying to determine, for arbitrary orientations and arbitrary scaling factors, a heuristic suitable for distinguishing Coca-Cola cans from other objects, inclusive of: bottles, billboards, advertisements, and Coca-Cola paraphernalia all associated with this iconic logo. You didn't call out many of these additional cases in your problem statement, but I feel they're vital to the success of your algorithm.
The secret here is determining what visual features a can contains or, through the negative space, what features are present for other Coke products that are not present for cans. To that end, the current top answer sketches out a basic approach for selecting "can" if and only if "bottle" is not identified, either by the presence of a bottle cap, liquid, or other similar visual heuristics.
The problem is that this breaks down. A bottle could, for example, be empty and lack a cap, leading to a false positive. Or, it could be a partial bottle with additional features mangled, leading again to false detection. Needless to say, this isn't elegant, nor is it effective for our purposes.
To this end, the most correct selection criteria for cans appear to be the following:
- Is the shape of the object silhouette, as you sketched out in your question, correct? If so, +1.
- If we assume the presence of natural or artificial light, do we detect a chrome outline to the bottle that signifies whether this is made of aluminum? If so, +1.
- Do we determine that the specular properties of the object are correct, relative to our light sources (illustrative video link on light source detection)? If so, +1.
- Can we determine any other properties about the object that identify it as a can, including, but not limited to, the topological image skew of the logo, the orientation of the object, the juxtaposition of the object (for example, on a planar surface like a table or in the context of other cans), and the presence of a pull tab? If so, for each, +1.
Your classification might then look like the following:
- For each candidate match, if the presence of a Coca-Cola logo was detected, draw a gray border.
- For each match over +2, draw a red border.
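The scoring scheme above can be written down as a small tally. This is a minimal sketch that assumes each cue has already been evaluated by some detector of your own; the struct fields and the names `canScore` and `chooseBorder` are hypothetical.

```cpp
#include <cassert>

// One flag per criterion from the list above; how each flag is computed
// is outside the scope of this sketch.
struct CanCues {
    bool logoDetected;        // candidate match produced by the logo detector
    bool silhouetteMatches;   // +1
    bool metallicOutline;     // +1
    bool specularityMatches;  // +1
    int otherCanProperties;   // +1 for each additional property found
};

int canScore(const CanCues& c) {
    assert(c.otherCanProperties >= 0);
    return (c.silhouetteMatches ? 1 : 0) + (c.metallicOutline ? 1 : 0) +
           (c.specularityMatches ? 1 : 0) + c.otherCanProperties;
}

enum class Border { None, Gray, Red };

// Gray for any logo match, red once the score is over +2.
Border chooseBorder(const CanCues& c) {
    if (!c.logoDetected) return Border::None;
    return canScore(c) > 2 ? Border::Red : Border::Gray;
}
```

Keeping the score separate from the border decision makes it easy to move the +2 cutoff or weight the cues differently later.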
This visually highlights to the user what was detected, emphasizing weak positives that may, correctly, be detected as mangled cans.
The detection of each property carries a very different time and space complexity, and for each approach, a quick pass through http://dsp.stackexchange.com is more than reasonable for determining the most correct and most efficient algorithm for your purposes. My intent here is, purely and simply, to emphasize that detecting if something is a can by invalidating a small portion of the candidate detection space isn't the most robust or effective solution to this problem, and ideally, you should take the appropriate actions accordingly.
And hey, congrats on the Hacker News posting! On the whole, this is a pretty terrific question worthy of the publicity it received. :)
Answer by tskuzzy
Looking at shape
Take a gander at the shape of the red portion of the can/bottle. Notice how the can tapers off slightly at the very top whereas the bottle label is straight. You can distinguish between these two by comparing the width of the red portion across its length.
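As a rough illustration of the width comparison, the sketch below counts red pixels per row of a binary mask and checks whether the topmost row is noticeably narrower than the widest row. It assumes the object is upright and already isolated in the mask. The 0.9 ratio and the names `rowWidths` and `tapersAtTop` are assumptions chosen for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of red (nonzero) pixels in each non-empty row, top to bottom.
std::vector<int> rowWidths(const std::vector<std::uint8_t>& mask,
                           int width, int height) {
    assert(width > 0 && height > 0);
    assert(mask.size() == static_cast<std::size_t>(width) * height);
    std::vector<int> widths;
    for (int y = 0; y < height; ++y) {
        int count = 0;
        for (int x = 0; x < width; ++x)
            if (mask[static_cast<std::size_t>(y) * width + x] != 0) ++count;
        if (count > 0) widths.push_back(count);
    }
    return widths;
}

// A can narrows at the very top; a bottle label keeps the same width.
bool tapersAtTop(const std::vector<int>& widths, double ratio = 0.9) {
    if (widths.size() < 2) return false;
    const int body = *std::max_element(widths.begin(), widths.end());
    return widths.front() < ratio * body;
}
```

On real images you would probably average several rows at the top and in the middle instead of trusting a single row, since noise and partial occlusion both change individual row counts.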
Looking at highlights
One way to distinguish between bottles and cans is the material. A bottle is made of plastic whereas a can is made of aluminum. In sufficiently well-lit situations, looking at the specularity would be one way of telling a bottle label from a can label.
As far as I can tell, that is how a human would tell the difference between the two types of labels. If the lighting conditions are poor, there is bound to be some uncertainty in distinguishing the two anyway. In that case, you would have to be able to detect the presence of the transparent/translucent bottle itself.
Answer by tskuzzy
Please take a look at Zdenek Kalal's Predator tracker. It requires some training, but it can actively learn how the tracked object looks at different orientations and scales, and it does so in real time!
The source code is available on his site. It's in MATLAB, but perhaps there is a Java implementation already done by a community member. I have successfully re-implemented the tracker part of TLD in C#. If I remember correctly, TLD uses Ferns as the keypoint detector. I use either SURF or SIFT instead (already suggested by @stacker) to reacquire the object if it was lost by the tracker. The tracker's feedback makes it easy to build, over time, a dynamic list of SIFT/SURF templates that enables reacquiring the object with very high precision.
If you're interested in my C# implementation of the tracker, feel free to ask.
Answer by Fantastic Mr Fox
If you are not limited to just a camera (which wasn't one of your constraints), perhaps you can move to using a range sensor like the Xbox Kinect. With this you can perform depth- and colour-based matched segmentation of the image, which allows for faster separation of objects in the image. You can then use ICP matching or similar techniques to match the shape of the can rather than just its outline or colour. Given that it is cylindrical, this may be a valid option for any orientation if you have a previous 3D scan of the target. These techniques are often quite quick, especially when used for such a specific purpose, which should solve your speed problem.
I could also suggest, not necessarily for accuracy or speed but for fun, using a trained neural network on your hue-segmented image to identify the shape of the can. These are very fast and can often be up to 80/90% accurate. Training would be a bit of a long process, though, as you would have to manually identify the can in each image.
Answer by Alex L
I would detect red rectangles: RGB -> HSV, filter red -> binary image, close (dilate then erode, known as imclose in MATLAB).
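For readers who have not met the closing operation, here is a minimal library-free sketch with a 3x3 square structuring element: a dilation (neighborhood maximum) followed by an erosion (neighborhood minimum). The names `morph3x3` and `closeMask` are illustrative. In OpenCV the equivalent call would normally be `cv::morphologyEx` with `cv::MORPH_CLOSE`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 3x3 dilation (max) or erosion (min); neighbors outside the image are
// ignored, so closing never eats into regions that touch the border.
std::vector<std::uint8_t> morph3x3(const std::vector<std::uint8_t>& img,
                                   int width, int height, bool dilate) {
    assert(width > 0 && height > 0);
    assert(img.size() == static_cast<std::size_t>(width) * height);
    std::vector<std::uint8_t> out(img.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            std::uint8_t v = img[y * width + x];
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int yy = y + dy, xx = x + dx;
                    if (yy < 0 || yy >= height || xx < 0 || xx >= width) continue;
                    const std::uint8_t n = img[yy * width + xx];
                    v = dilate ? std::max(v, n) : std::min(v, n);
                }
            }
            out[y * width + x] = v;
        }
    }
    return out;
}

// Closing = dilate, then erode: fills small holes and gaps in the mask
// without growing the overall region.
std::vector<std::uint8_t> closeMask(const std::vector<std::uint8_t>& img,
                                    int width, int height) {
    return morph3x3(morph3x3(img, width, height, true), width, height, false);
}
```

For the red mask this means the white lettering of the logo, which leaves holes inside the red region, tends to be filled in, so the label comes out as one solid blob that is easier to test for a rectangular shape.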
Then look through the rectangles from largest to smallest. Rectangles that have smaller rectangles in a known position/scale can both be removed (assuming bottle proportions are constant, the smaller rectangle would be a bottle cap).
This would leave you with red rectangles; then you'll need to somehow detect the logos to tell whether each is just a red rectangle or a coke can. Like OCR, but with a known logo?
Answer by Sharad
This may be a very naive idea (or may not work at all), but the dimensions of all coke cans are fixed. So if the same image contains both a can and a bottle, you may be able to tell them apart by size (bottles are going to be larger). Now, because of the missing depth (i.e. the mapping from 3D to 2D), it's possible that a bottle appears shrunk so that there is no size difference. You may recover some depth information using stereo imaging and then recover the original size.
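The size argument can be made concrete with the standard pinhole and rectified-stereo relations: depth = focal length x baseline / disparity, and real height = pixel height x depth / focal length. The sketch below assumes a calibrated, rectified stereo pair. The can height constant, the tolerance and the function names are assumptions to adjust for your own setup.

```cpp
#include <cassert>
#include <cmath>

// Depth (meters) of a point from its disparity between the two views.
double depthFromDisparity(double focalPx, double baselineM, double disparityPx) {
    assert(disparityPx > 0.0);
    return focalPx * baselineM / disparityPx;
}

// Real-world height (meters) of an object that spans heightPx pixels.
double realHeight(double heightPx, double depthM, double focalPx) {
    assert(focalPx > 0.0);
    return heightPx * depthM / focalPx;
}

// Assumed can height in meters; change it to match the cans in your images.
const double kCanHeightM = 0.122;

bool hasCanHeight(double heightM, double tolM = 0.015) {
    return std::fabs(heightM - kCanHeightM) <= tolM;
}
```

For example, with a focal length of 800 px, a 0.1 m baseline and a disparity of 80 px, the object is about 1 m away, so a 98 px tall detection corresponds to roughly 0.12 m and passes the check, while a 200 px tall one (0.25 m) does not.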