Video Stabilization with OpenCV in C++

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse it, you must likewise follow CC BY-SA, link to the original, and attribute it to the original authors (not me): StackOverflow.

Original question: http://stackoverflow.com/questions/3431434/
Video Stabilization with OpenCV
Asked by Eldila
I have a video feed which is taken with a moving camera and contains moving objects. I would like to stabilize the video, so that all stationary objects will remain stationary in the video feed. How can I do this with OpenCV?
i.e. For example, if I have two images prev_frame and next_frame, how do I transform next_frame so the video camera appears stationary?
Answer by zerm
I can suggest one of the following solutions:
- Using local high-level features: OpenCV includes SURF, so: for each frame, extract SURF features. Then build a feature Kd-Tree (also in OpenCV), then match each pair of consecutive frames to find pairs of corresponding features. Feed those pairs into cvFindHomography to compute the homography between those frames. Warp frames according to the (combined) homographies to stabilize. This is, to my knowledge, a very robust and sophisticated approach; however, SURF extraction and matching can be quite slow.
- You can try to do the above with "less robust" features if you expect only minor movement between two frames, e.g. use Harris corner detection and build pairs of corners closest to each other in both frames, then feed them to cvFindHomography as above. Probably faster but less robust.
- If you restrict movement to translation, you might be able to replace cvFindHomography with something simpler, to just get the translation between feature pairs (e.g. the average).
- Use phase correlation (ref. http://en.wikipedia.org/wiki/Phase_correlation) if you expect only translation between two frames. OpenCV includes DFT/FFT and IFFT; see the linked Wikipedia article for formulas and explanation.
EDIT: Three remarks I should mention explicitly, just in case:
- The homography-based approach is likely very exact, so stationary objects will remain stationary. However, homographies include perspective distortion and zoom as well, so the result might look a bit uncommon (or even distorted for some fast movements). Although exact, this might be less visually pleasing; use it rather for further processing or, say, forensics. But you should try it out; it could be super-pleasing for some scenes/movements as well.
- To my knowledge, at least several free video-stabilization tools use phase correlation. If you just want to "un-shake" the camera, this might be preferable.
- There is quite some research going on in this field. You'll find a lot more sophisticated approaches in some papers (although they likely require more than just OpenCV).
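For the phase-correlation route, the whole estimator fits in a few lines of NumPy (OpenCV also ships cv2.phaseCorrelate). A minimal sketch, assuming integer-pixel translation and periodic boundaries:

```python
import numpy as np

def phase_correlation_shift(prev_frame, next_frame):
    """Estimate the (dy, dx) translation taking prev_frame to next_frame."""
    F1 = np.fft.fft2(prev_frame)
    F2 = np.fft.fft2(next_frame)
    # normalized cross-power spectrum; its inverse FFT peaks at the shift
    cross_power = F2 * np.conj(F1)
    cross_power /= np.abs(cross_power) + 1e-12
    corr = np.fft.ifft2(cross_power).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # peak indices past the midpoint wrap around to negative shifts
    if dy > corr.shape[0] // 2:
        dy -= corr.shape[0]
    if dx > corr.shape[1] // 2:
        dx -= corr.shape[1]
    return int(dy), int(dx)
```

To stabilize, you would then shift next_frame by the negated estimate.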
Answer by Octopus
OpenCV has the functions estimateRigidTransform() and warpAffine() which handle this sort of problem really well.
It's pretty much as simple as this:
Mat M = estimateRigidTransform(frame1, frame2, 0);
warpAffine(frame2, output, M, Size(640, 480), INTER_NEAREST | WARP_INVERSE_MAP);
Now output contains the contents of frame2 that is best aligned to fit to frame1.
For large shifts, M will be a zero matrix, or it might not be a matrix at all, depending on the version of OpenCV, so you'd have to filter those out and not apply them. I'm not sure how large "large" is; maybe half the frame width, maybe more.
The third parameter to estimateRigidTransform is a boolean that tells it whether to also apply an arbitrary affine matrix or restrict it to translation/rotation/scaling. For the purposes of stabilizing an image from a camera you probably just want the latter. In fact, for camera image stabilization you might also want to remove any scaling from the returned matrix by normalizing it for only rotation and translation.
Also, for a moving camera, you'd probably want to sample M through time and calculate a mean.
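Both ideas above (stripping scale from the returned matrix, and averaging the motion parameters over time) operate on the 2x3 matrix that estimateRigidTransform returns. A hedged NumPy sketch of what that bookkeeping might look like, assuming M is a similarity transform:

```python
import numpy as np

def strip_scale(M):
    """Keep only rotation + translation from a 2x3 similarity matrix."""
    angle = np.arctan2(M[1, 0], M[0, 0])   # rotation angle; the scale cancels out
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, M[0, 2]],
                     [s,  c, M[1, 2]]])

def mean_motion(transforms):
    """Average (dx, dy, dangle) over a window of per-frame transforms."""
    params = np.array([[M[0, 2], M[1, 2], np.arctan2(M[1, 0], M[0, 0])]
                       for M in transforms])
    return params.mean(axis=0)
```

The averaged parameters can then be rebuilt into a correction matrix, which smooths the camera path instead of freezing every jitter.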
Here are links to more info on estimateRigidTransform() and warpAffine().
Answer by RawMean
OpenCV now has a video stabilization class: http://docs.opencv.org/trunk/d5/d50/group__videostab.html
Answer by u3547485
I pasted my answer from this one: How to stabilize Webcam video?
Yesterday I did some work (in Python) on this subject; the main steps are:
- use cv2.goodFeaturesToTrack to find good corners.
- use cv2.calcOpticalFlowPyrLK to track the corners.
- use cv2.findHomography to compute the homography matrix.
- use cv2.warpPerspective to transform the video frames.
But the result is not that ideal yet; maybe I should choose SIFT keypoints rather than goodFeatures.
Source:
Stabilize the car:
Answer by Eunchul Jeon
There is already a good answer here, but it uses a somewhat old algorithm, and I developed a program to solve a similar problem, so I am adding an additional answer.
- At first, you should extract features from the image using a feature extractor like the SIFT or SURF algorithm. In my case, the FAST+ORB algorithm worked best. If you want more information, see this paper.
- After you get the features in the images, you should find matching features between images. There are several matchers, but the brute-force matcher is not bad. If brute force is slow on your system, you should use an algorithm like a KD-Tree.
- Last, you should get a geometric transformation matrix which minimizes the error of the transformed points. You can use the RANSAC algorithm in this process. You can develop this whole process using OpenCV, and I have already developed it on mobile devices. See this repository.
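The last step can be illustrated with a toy RANSAC loop. This is a simplified, translation-only sketch written for clarity; in practice you would let OpenCV's findHomography or estimateAffinePartial2D run RANSAC for you:

```python
import numpy as np

def ransac_translation(src, dst, iters=200, tol=2.0, seed=0):
    """Estimate a 2-D translation src -> dst, robust to outlier matches."""
    rng = np.random.default_rng(seed)
    best_t, best_inliers = np.zeros(2), 0
    for _ in range(iters):
        i = rng.integers(len(src))            # one pair proposes a translation
        t = dst[i] - src[i]
        inliers = np.linalg.norm(src + t - dst, axis=1) < tol
        if inliers.sum() > best_inliers:
            best_inliers = inliers.sum()
            # refit on the consensus set to minimize the error
            best_t = (dst[inliers] - src[inliers]).mean(axis=0)
    return best_t
```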
Answer by Ian Wetherbee
This is a tricky problem, but I can suggest a somewhat simple solution off the top of my head.
- Shift/rotate next_frame by an arbitrary amount.
- Use background subtraction threshold(abs(prev_frame - next_frame_rotated)) to find the static elements. You'll have to play around with what threshold value to use.
- Find min(template_match(prev_frame_background, next_frame_rotated_background)).
- Record the shift/rotation of the closest match and apply it to next_frame.
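The search described above can be illustrated with a translation-only toy version: exhaustively try shifts and keep the one where the frames disagree the least. This is a sketch of the idea only; real code would add rotation and the background mask:

```python
import numpy as np

def best_shift(prev_frame, next_frame, max_shift=8):
    """Exhaustively search integer shifts, scoring each by mean absolute difference."""
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(next_frame, (dy, dx), axis=(0, 1))
            err = np.abs(prev_frame.astype(float) - shifted.astype(float)).mean()
            if err < best_err:
                best_err, best = err, (dy, dx)
    return best   # the shift to apply to next_frame to align it with prev_frame
```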
This won't work well for multiple frames over time, so you'll want to look into using a background accumulator so the background the algorithm looks for is similar over time.
Answer by Rui Marques
I should add the following remarks to complete zerm's answer. It will simplify your problem if one stationary object is chosen and you then work with zerm's approach (1) on that single object. If you find a stationary object and apply the correction to it, I think it is safe to assume the other stationary objects will also look stable.
Although it is certainly valid for your tough problem, you will have the following problems with this approach:
Detection and homography estimation will sometimes fail for various reasons: occlusions, sudden moves, motion blur, severe lighting differences. You will have to search for ways to handle them.
Your target object(s) might have occlusions, meaning detection will fail on that frame, and you will have to handle occlusions, which is itself a whole research topic.
Depending on your hardware and the complexity of your solution, you might have some trouble achieving real-time results using SURF. You might try OpenCV's GPU implementation or other faster feature detectors like ORB, BRIEF or FREAK.
Answer by Yash_6795
Background: I was working on a research project where I was trying to calculate how long it would take for a person standing in the queue to reach the counter. The first thing I needed was footage, so I went to the campus and recorded some tourists moving in the queue to get tickets. Up to this point I had no idea how I was going to calculate queuing time or what precautions I should take while recording the footage. At the end of the day I found that all the footage I had recorded was shot with a shaky camera. So at this point I first needed to stabilize the video, and only then develop the rest of the solution to calculate queuing time.
Video Stabilization using Template Matching
- Find static objects such as a pole, a door, or something else that you know is not supposed to move.
- Use template matching to calculate the offset of the change in location of the static object (relative to the frame boundaries) in each consecutive frame.
- Translate each frame by the offset values, say tx and ty.
Result Footage:
Gif showing the result of this technique.
As you can see in the gif, the selected static object remains static w.r.t. the frame boundaries, while the motion can be seen as black filling in from the edges of the frame.