C++ 如何确定与视频中物体的距离?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/2135116/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How can I determine distance from an object in a video?
提问by Ryan R.
I have a video file recorded from the front of a moving vehicle. I am going to use OpenCV for object detection and recognition, but I'm stuck on one aspect: how can I determine the distance from a recognized object?
我有一个从移动车辆前方录制的视频文件。我打算使用 OpenCV 进行物体检测和识别,但我在一个方面卡住了:如何确定与已识别物体的距离?
I can know my current speed and real-world GPS position but that is all. I can't make any assumptions about the object I'm tracking. I am planning to use this to track and follow objects without colliding with them. Ideally I would like to use this data to derive the object's real-world position, which I could do if I could determine the distance from the camera to the object.
我可以知道自己当前的速度和真实世界的 GPS 位置,但仅此而已。我无法对正在跟踪的物体做出任何假设。我计划用它来跟踪并跟随物体而不与它们碰撞。理想情况下,我希望利用这些数据推导出物体在真实世界中的位置;只要能确定相机到物体的距离,我就能做到这一点。
采纳答案by Robert Cartaino
When you have moving video, you can use temporal parallax to determine the relative distance of objects. Parallax: (definition).
当您有移动视频时,您可以使用时间视差来确定对象的相对距离。视差:(定义)。
The effect would be the same as what we get with our eyes, which gain depth perception by looking at the same object from slightly different angles. Since you are moving, you can use two successive video frames to get your slightly different angle.
效果与我们用双眼获得的效果相同:双眼通过从略微不同的角度观察同一物体来获得深度感知。由于您在移动,您可以利用两个连续的视频帧来获得这个略微不同的角度。
Using parallax calculations, you can determine the relative size and distance of objects (relative to one another). But, if you want the absolute size and distance, you will need a known point of reference.
使用视差计算,您可以确定对象的相对大小和距离(相对于彼此)。但是,如果您想要绝对大小和距离,则需要一个已知的参考点。
You will also need to know the speed and direction being traveled (as well as the video frame rate) in order to do the calculations. You might be able to derive the speed of the vehicle using the visual data but that adds another dimension of complexity.
您还需要知道行进的速度和方向(以及视频帧速率)才能进行计算。您或许能够使用视觉数据推导出车辆的速度,但这又增加了另一个维度的复杂性。
The technology already exists. Satellites determine topographic prominence (height) by comparing multiple images taken over a short period of time. We use parallax to determine the distance of stars by taking photos of the night sky at different points in Earth's orbit around the sun. I was able to create 3-D images out of an airplane window by taking two photographs in short succession.
该技术已经存在。卫星通过比较短时间内拍摄的多幅图像来确定地形突起度(高度)。我们通过在地球绕太阳轨道上的不同位置拍摄夜空照片,利用视差来确定恒星的距离。透过飞机舷窗短时间内连续拍摄两张照片,我就能制作出 3D 图像。
The exact technology and calculations (even if I knew them off the top of my head) are way outside the scope of discussing here. If I can find a decent reference, I will post it here.
确切的技术和计算方法(即使我能脱口而出)也远远超出了这里的讨论范围。如果我能找到合适的参考资料,我会发布在这里。
回答by Jacob
Your problem's quite standard in the field.
您的问题在该领域非常标准。
Firstly,
首先,
you need to calibrate your camera. This can be done offline (makes life much simpler) or online through self-calibration.
你需要校准你的相机。这可以离线完成(让生活更简单)或通过自校准在线完成。
Calibrate it offline- please.
离线校准- 请。
Secondly,
其次,
Once you have the calibration matrix of the camera K, determine the projection matrix of the camera in a successive scene (you need to use parallax as mentioned by others). This is described well in this OpenCV tutorial.
获得相机K的校准矩阵后,确定相机在连续场景中的投影矩阵(您需要使用其他人提到的视差)。这在OpenCV 教程中有很好的描述。
You'll have to use the GPS information to find the relative orientation between the cameras in the successive scenes (that might be problematic due to noise inherent in most GPS units), i.e. the R and t mentioned in the tutorial, or the rotation and translation between the two cameras.
您必须使用 GPS 信息来找到连续场景中相机之间的相对方位(由于大多数 GPS 单元固有的噪声,这可能会有问题),即教程中提到的 R 和 t,也就是两个相机之间的旋转和平移。
Once you've resolved all that, you'll have two projection matrices --- representations of the cameras at those successive scenes. Using one of these so-called camera matrices, you can "project" a 3D point M in the scene onto the camera's 2D image at pixel coordinate m (as in the tutorial).
一旦解决了所有这些问题,您就会得到两个投影矩阵——它们是这些连续场景中相机的表示。使用其中一个所谓的相机矩阵,您可以把场景中的 3D 点 M "投影"到相机 2D 图像上的像素坐标 m(如教程中所示)。
We will use this to triangulate the real 3D point from 2D points found in your video.
我们将使用它从视频中找到的 2D 点对真实的 3D 点进行三角测量。
Thirdly,
第三,
use an interest point detector to track the same point in your video which lies on the object of interest. There are several detectors available; I recommend SURF since you have OpenCV, which also has several other detectors like Shi-Tomasi corners, Harris, etc.
使用兴趣点检测器跟踪视频中位于感兴趣物体上的同一个点。有多种检测器可用,我推荐 SURF,因为你用的 OpenCV 还内置了其他几种检测器,例如 Shi-Tomasi 角点、Harris 角点等。
Fourthly,
第四,
Once you've tracked points of your object across the sequence and obtained the corresponding 2D pixel coordinates you must triangulate for the best fitting 3D point given your projection matrix and 2D points.
在整个序列中跟踪对象的点并获得相应的 2D 像素坐标后,您必须根据投影矩阵和 2D 点对最适合的 3D 点进行三角测量。
The above image nicely captures the uncertainty and how a best fitting 3D point is computed. Of course in your case, the cameras are probably in front of each other!
上图很好地捕捉了不确定性以及如何计算最佳拟合 3D 点。当然,在您的情况下,相机可能在彼此的前面!
Finally,
最后,
Once you've obtained the 3D points on the object, you can easily compute the Euclidean distance between the camera center (which is the origin in most cases) and the point.
获得对象上的 3D 点后,您可以轻松计算相机中心(大多数情况下为原点)与该点之间的欧几里得距离。
Note
笔记
This is obviously not easy stuff but it's not that hard either. I recommend Hartley and Zisserman's excellent book Multiple View Geometry which has described everything above in explicit detail with MATLAB code to boot.
这显然不是一件容易的事情,但也不是那么难。我推荐 Hartley 和 Zisserman 的优秀书籍多视图几何,它用 MATLAB 代码详细描述了上述所有内容。
Have fun and keep asking questions!
玩得开心,继续提问!
回答by ravenspoint
You need to identify the same points in the same object on two different frames taken a known distance apart. Since you know the location of the camera in each frame, you have a baseline (the vector between the two camera positions). Construct a triangle from the known baseline and the angles to the identified points. Trigonometry gives you the lengths of the unknown sides of the triangles, given the known length of the baseline and the known angles between the baseline and the unknown sides.
您需要在相距已知距离拍摄的两个不同帧上识别同一物体上的相同点。由于您知道相机在每一帧中的位置,因此您有一条基线(两个相机位置之间的向量)。由已知的基线和到已识别点的角度构造一个三角形。在基线长度和基线与未知边之间的夹角已知的情况下,三角学可以给出三角形未知边的长度。
You can use two cameras, or one camera taking successive shots. So, if your vehicle is moving at 1 m/s and you take frames every second, then successive frames will give you a 1 m baseline, which should be good for measuring the distance of objects up to, say, 5 m away. If you need to range objects further away, then the frames used need to be further apart - however, more distant objects will be in view for longer.
您可以使用两台相机,或一台相机连续拍摄。因此,如果您的车辆以 1 m/s 的速度移动,并且每秒拍摄一帧,那么连续帧将为您提供 1 m 的基线,这应该足以测量例如 5 m 以内物体的距离。如果需要测距更远的物体,那么所用的帧需要相隔更远,不过更远的物体会在视野中停留更长时间。
Observer at F1 sees target at T with angle a1 to velocity vector. Observer moves distance b to F2. Sees target at T with angle a2.
F1 处的观察者看到 T 处的目标,与速度矢量的夹角为 a1。观察者将距离 b 移动到 F2。在 T 处看到目标,角度为 a2。
Required to find r1, range from target at F1
需要查找 r1,范围从 F1 处的目标
The trigonometric identity for cosine gives
余弦的三角恒等式给出
Cos( 90 – a1 ) = x / r1 = c1
Cos( 90 - a2 ) = x / r2 = c2
Cos( a1 ) = (b + z) / r1 = c3
Cos( a2 ) = z / r2 = c4
x is distance to target orthogonal to observer's velocity vector
x 是与观察者的速度向量正交的到目标的距离
z is distance from F2 to intersection with x
z 是从 F2 到与 x 的交点的距离
Solving for r1
求解 r1
r1 = b / ( c3 – c1 . c4 / c2 )
回答by Steven Sudit
Two cameras so you can detect parallax. It's what humans do.
两个摄像头,因此您可以检测视差。这就是人类所做的。
edit
编辑
Please see ravenspoint's answer for more detail. Also, keep in mind that a single camera with a splitter would probably suffice.
有关更多详细信息,请参阅 ravenspoint 的回答。另外,请记住,带有分离器的单个相机可能就足够了。
回答by Egon
Use stereo disparity maps. Lots of implementations are available; here are some links: http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/OWENS/LECT11/node4.html
使用立体视差图。有许多现成的实现,这里有一些链接:http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/OWENS/LECT11/node4.html
http://www.ece.ucsb.edu/~manj/ece181bS04/L14(morestereo).pdf
http://www.ece.ucsb.edu/~manj/ece181bS04/L14(morestereo).pdf
In your case you don't have a stereo camera, but depth can be evaluated using video http://www.springerlink.com/content/g0n11713444148l2/
在您的情况下没有立体相机,但可以利用视频来评估深度 http://www.springerlink.com/content/g0n11713444148l2/
I think the above will be what might help you the most.
我认为以上将是最能帮助你的。
Research has progressed so far that depth can be evaluated (though not to a satisfactory extent) from a single monocular image http://www.cs.cornell.edu/~asaxena/learningdepth/
研究已经进展到可以从单张单目图像评估深度的程度(尽管效果尚不令人满意) http://www.cs.cornell.edu/~asaxena/learningdepth/
回答by Pontiac6000fan
Someone please correct me if I'm wrong, but it seems to me that if you're going to simply use a single camera and rely on a software solution, any processing you might do would be prone to false positives. I highly doubt that there is any processing that could tell the difference between objects that really are at the perceived distance and those which only appear to be at that distance (like the "forced perspective" in movies).
如果我错了,请有人纠正我,但在我看来,如果您只想使用单个相机并仅依靠软件解决方案,那么您可能进行的任何处理都容易出现误报。我非常怀疑是否有任何处理可以分辨出真正处于感知距离的物体与电影中仅出现在该距离(如“强制视角”)的物体之间的区别。
Any chance you could add an ultrasonic sensor?
你有机会添加一个超声波传感器吗?
回答by harounbest
First you should calibrate your camera so you can get the relation between the objects' positions in the camera plane and their positions in the real-world plane. If you are using one camera, you can use the "optical flow" technique; if you are using two cameras, you can use simple triangulation to find the real position (it will be easy to find the distance of the objects). But the problem with this second method is matching: how can you find the position of an object 'x' in camera 2 if you already know its position in camera 1? Here you can use the SIFT algorithm. I just gave you some keywords; I hope they help you.
首先你应该校准相机,这样就能得到物体在相机平面中的位置与其在真实世界平面中的位置之间的关系。如果使用一台相机,可以使用"光流"技术;如果使用两台相机,只需用简单的三角测量即可找到真实位置(很容易得到物体的距离)。但第二种方法的问题在于匹配:如果已经知道物体 'x' 在相机 1 中的位置,如何找到它在相机 2 中的位置?这里可以使用 SIFT 算法。我只是给了你一些关键词,希望能帮到你。
回答by Kelly S. French
Put an object of known size in the camera's field of view. That way you can have a more objective metric to measure angular distances. Without a second viewpoint/camera you'll be limited to estimating size/distance, but at least it won't be a complete guess.
将一个已知尺寸的物体放入相机视野中。这样您就有一个更客观的度量来测量角距离。没有第二个视点/相机,您将只能估计尺寸/距离,但至少不会是完全的瞎猜。