C++ OpenCV: tracking moving people on the street

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow.

Original question: http://stackoverflow.com/questions/18108264/
Asked by ákos Maróy
I'm trying to get tracking of moving people working with OpenCV in C++, with a camera looking at a street and people moving about it. For a sample video I shot and am using, see here: http://akos.maroy.hu/~akos/eszesp/MVI_0778.MOV
I read up on this topic and tried a number of things, including:
- background detection and creating contours
- trying to detect blobs (keypoints for blobs)
- using a people detector on each frame with a HOGDescriptor
But none of these provide a good result. For my sample code, see below. For the output of the code on the above video, see: http://akos.maroy.hu/~akos/eszesp/ize.avi. The contours detected against the background are in red, the bounding rectangles of the contours are in green, and the HOG people-detector results are in blue.
The specific issues I have are:
Background detection followed by finding contours seems to work fine, although there are some false positives. The main drawback is that a single person is often 'cut up' into multiple contours. Is there a simple way to 'join' these together, maybe by an assumed 'ideal' person size, or some other means?
As for the HOG people detector, in my case it very seldom identifies the real people in the image. What could I be doing wrong there?
All pointers and ideas welcome!
And thus, the code I'm using so far, which is a cut-and-paste glory of various samples I found here and there:
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>
int main(int argc, char *argv[])
{
if (argc < 3) {
std::cerr << "Usage: " << argv[0] << " in.file out.file" << std::endl;
return -1;
}
cv::Mat frame;
cv::Mat back;
cv::Mat fore;
std::cerr << "opening " << argv[1] << std::endl;
cv::VideoCapture cap(argv[1]);
cv::BackgroundSubtractorMOG2 bg;
//bg.nmixtures = 3;
//bg.bShadowDetection = false;
cv::VideoWriter output;
//int ex = static_cast<int>(cap.get(CV_CAP_PROP_FOURCC));
int ex = CV_FOURCC('P','I','M','1');
cv::Size size = cv::Size((int) cap.get(CV_CAP_PROP_FRAME_WIDTH),
(int) cap.get(CV_CAP_PROP_FRAME_HEIGHT));
std::cerr << "saving to " << argv[2] << std::endl;
output.open(argv[2], ex, cap.get(CV_CAP_PROP_FPS), size, true);
std::vector<std::vector<cv::Point> > contours;
cv::namedWindow("Frame");
cv::namedWindow("Fore");
cv::namedWindow("Background");
cv::SimpleBlobDetector::Params params;
params.minThreshold = 40;
params.maxThreshold = 60;
params.thresholdStep = 5;
params.minArea = 100;
params.minConvexity = 0.3;
params.minInertiaRatio = 0.01;
params.maxArea = 8000;
params.maxConvexity = 10;
params.filterByColor = false;
params.filterByCircularity = false;
cv::SimpleBlobDetector blobDtor(params);
std::vector<std::vector<cv::Point> > blobContours;
std::vector<cv::KeyPoint> keyPoints;
cv::Mat out;
cv::HOGDescriptor hog;
hog.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());
for(;;)
{
cap >> frame;
if (frame.empty()) break; // stop at end of video instead of crashing
bg.operator ()(frame, fore);
bg.getBackgroundImage(back);
cv::erode(fore, fore, cv::Mat());
cv::dilate(fore, fore, cv::Mat());
blobDtor.detect(fore, keyPoints, cv::Mat());
//cv::imshow("Fore", fore);
cv::findContours(fore, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_NONE);
cv::drawContours(frame, contours, -1, cv::Scalar(0,0,255), 2);
std::vector<std::vector<cv::Point> >::const_iterator it = contours.begin();
std::vector<std::vector<cv::Point> >::const_iterator end = contours.end();
while (it != end) {
cv::Rect bounds = cv::boundingRect(*it);
cv::rectangle(frame, bounds, cv::Scalar(0,255,0), 2);
++it;
}
cv::drawKeypoints(fore, keyPoints, out, CV_RGB(0,255,0), cv::DrawMatchesFlags::DEFAULT);
cv::imshow("Fore", out);
std::vector<cv::Rect> found, found_filtered;
hog.detectMultiScale(frame, found, 0, cv::Size(8,8), cv::Size(32,32), 1.05, 2);
for (size_t i = 0; i < found.size(); ++i) {
cv::Rect r = found[i];
size_t j = 0;
// keep r only if it is not fully contained in another detection
for (; j < found.size(); ++j) {
if (j != i && (r & found[j]) == r) {
break;
}
}
if (j == found.size()) {
found_filtered.push_back(r);
}
}
for (size_t i = 0; i < found_filtered.size(); ++i) {
cv::Rect r = found_filtered[i];
cv::rectangle(frame, r.tl(), r.br(), cv::Scalar(255,0,0), 3);
}
output << frame;
cv::resize(frame, frame, cv::Size(1280, 720));
cv::imshow("Frame", frame);
cv::resize(back, back, cv::Size(1280, 720));
cv::imshow("Background", back);
if(cv::waitKey(30) >= 0) break;
}
return 0;
}
Answered by nkint
Actually, this is a very broad topic. There are plenty of scientific papers that try to attack this problem. You should do some reading first.
Briefly: background detection and contours is the easiest technique. OpenCV has very nice implementations, also optimized for the GPU. To refine the foreground/background blobs you can use some morphological operations; try to close holes in the blobs and you'll get better results. But do not expect perfect results. Background subtraction is a difficult operation: you can spend hours fine-tuning parameters for a given dataset, then try your code in the real world and... nothing works. Lights, shadows, background changes from uninteresting objects, just to mention some of the problems.
So, no: there is no simple, standard technique for handling the so-called "blob fragmentation" or "split-merge" problem (sometimes one person is split into several blobs, sometimes several people are merged into a single blob). Again, the literature is full of scientific papers on this subject. But there are techniques for handling tracking under incomplete or cluttered observations. One of the easiest is to try to infer the real state of the system from incomplete observations with a Kalman filter. OpenCV has a nice implementation of it. Again, a search for "Kalman filter tracking" or "GNN data association" will turn up a lot.
If you want to use some geometrical information, like estimating the height of a person, you can do it, but you need the calibration parameters of the camera. That implies having them available (the Microsoft Kinect or a standard iPhone camera has its parameters available) or calculating them through a camera calibration process. That means downloading a chessboard image, printing it on paper, and taking some pictures of it; OpenCV has all the methods needed for the calibration. After that, you need to estimate the ground plane, then use some simple project/unproject methods to go back and forth between 2D and 3D coordinates, and estimate the 2D bounding box of a 3D standard-sized person.
Modern approaches to "pedestrian tracking" extract observations with a detector. Background subtraction can give a map of where to run detection so you don't have to search the whole image, but blob detection is useless in this case. In OpenCV, the most used implementations for this are the Haar AdaBoost detector and the HOG detector. The HOG detector seems to give better results in some cases. Classifiers already shipped with OpenCV include a face detector for Haar and a people detector for HOG. You'll find examples in both the cpp and python samples in the OpenCV repository.
If the standard detections fail (your video is at a different size, or you have to detect objects other than pedestrians), you have to train your own detector. That means collecting some images of the object you want to detect (positive samples) and some images of something else (negative samples), and training your own classifier with machine-learning techniques such as an SVM. Again, google is your friend :)
Good luck!
Answered by David Elliman
Answered by LovaBill
I would create a human tracker like this:
First, we must initialize the objects. How? Object detection. Use HOG or the cascade classifier with the proper model (i.e. haarcascade_fullbody.xml), or use them all together.
Then, we must TRACK the pixels found inside those bounding boxes. How? Match past templates! Idea: accumulate more than one into a vector<cv::Mat> and use the mean template for correlation.
More ideas:
- Combine the results: use the detector as the most reliable observation model, and if it fails switch to template matching.
- Use background modeling to filter false positives (FPs correlate excellently with the background).
Also, try blobtrack_sample.cpp, found in the OpenCV samples folder, if you want contour-based tracking.
Answered by Bhanu Challa
You are missing the "motion model" component of tracking. A Kalman or particle filter should help. I prefer Kalman.