How to use hardware acceleration with ffmpeg (C++)

Disclaimer: this page is a translation of a popular StackOverflow question and answer, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/23289157/

Date: 2020-08-28 00:24:04  Source: igfitidea

How to use hardware acceleration with ffmpeg

c++ c ffmpeg hardware-acceleration

Asked by ixSci

I need to have ffmpeg decode my video (e.g. h264) using hardware acceleration. I'm using the usual way of decoding frames: read packet -> decode frame. And I'd like to have ffmpeg speed up decoding. So I've built it with --enable-vaapi and --enable-hwaccel=h264. But I don't really know what I should do next. I've tried to use avcodec_find_decoder_by_name("h264_vaapi") but it returns nullptr. Anyway, I might want to use other APIs and not just VA API. How is one supposed to speed up ffmpeg decoding?

P.S. I didn't find any examples on the Internet which use ffmpeg with hwaccel.

Accepted answer by ixSci

After some investigation I was able to implement the necessary HW accelerated decoding on OS X (VDA) and Linux (VDPAU). I will update the answer when I get my hands on a Windows implementation as well. So let's start with the easiest:

Mac OS X

To get HW acceleration working on Mac OS you should just use the following: avcodec_find_decoder_by_name("h264_vda"); Note, however, that with FFmpeg on Mac OS you can only accelerate h264 videos.

Linux VDPAU

On Linux things are much more complicated (who is surprised?). FFmpeg has two HW accelerators on Linux: VDPAU (Nvidia) and VAAPI (Intel), and only one HW decoder: for VDPAU. It may seem perfectly reasonable to use the vdpau decoder like in the Mac OS example above: avcodec_find_decoder_by_name("h264_vdpau");

You might be surprised to find out that it doesn't change anything and you have no acceleration at all. That's because it is only the beginning: you have to write much more code to get the acceleration working. Happily, you don't have to come up with a solution on your own: there are at least two good examples of how to achieve that: libavg and FFmpeg itself. libavg has a VDPAUDecoder class which is perfectly clear and on which I've based my implementation. You can also consult ffmpeg_vdpau.c for another implementation to compare. In my opinion, the libavg implementation is easier to grasp, though.

The only thing both aforementioned examples lack is proper copying of the decoded frame to the main memory. Both examples use VdpVideoSurfaceGetBitsYCbCr, which killed all the performance I gained on my machine. That's why you might want to use the following procedure to extract the data from the GPU:

bool VdpauDecoder::fillFrameWithData(AVCodecContext* context,
    AVFrame* frame)
{
    VdpauDecoder* vdpauDecoder = static_cast<VdpauDecoder*>(context->opaque);
    // Render the decoded video surface into a temporary RGBA output surface
    // on the GPU, then read that surface back, instead of calling the much
    // slower VdpVideoSurfaceGetBitsYCbCr.
    VdpOutputSurface surface;
    if(vdp_output_surface_create(vdpauDecoder->m_VdpDevice, VDP_RGBA_FORMAT_B8G8R8A8,
           frame->width, frame->height, &surface) != VDP_STATUS_OK)
    {
        return false;
    }
    auto renderState = reinterpret_cast<vdpau_render_state*>(frame->data[0]);
    VdpVideoSurface videoSurface = renderState->surface;

    bool result = false;
    auto status = vdp_video_mixer_render(vdpauDecoder->m_VdpMixer,
        VDP_INVALID_HANDLE,
        nullptr,
        VDP_VIDEO_MIXER_PICTURE_STRUCTURE_FRAME,
        0, nullptr,
        videoSurface,
        0, nullptr,
        nullptr,
        surface,
        nullptr, nullptr, 0, nullptr);
    if(status == VDP_STATUS_OK)
    {
        auto tmframe = av_frame_alloc();
        tmframe->format = AV_PIX_FMT_BGRA;
        tmframe->width = frame->width;
        tmframe->height = frame->height;
        if(av_frame_get_buffer(tmframe, 32) >= 0)
        {
            status = vdp_output_surface_get_bits_native(surface, nullptr,
                reinterpret_cast<void * const *>(tmframe->data),
                reinterpret_cast<const uint32_t *>(tmframe->linesize));
            if(status == VDP_STATUS_OK && av_frame_copy_props(tmframe, frame) == 0)
            {
                // Hand the freshly filled BGRA frame back to the caller.
                av_frame_unref(frame);
                av_frame_move_ref(frame, tmframe);
                result = true;
            }
        }
        // Empty after the move on success; frees the buffers on failure.
        av_frame_free(&tmframe);
    }
    vdp_output_surface_destroy(surface);
    return result;
}

While it has some "external" objects used inside, you should be able to understand it once you have implemented the "get buffer" part (for which the aforementioned examples are of great help). Also, I've used the BGRA format, which was more suitable for my needs; maybe you will choose another.

The problem with all of it is that you can't just get it working from FFmpeg: you need to understand at least the basics of the VDPAU API. And I hope that my answer will aid someone in implementing HW acceleration on Linux. I spent much time on it myself before I realized that there is no simple, one-line way of implementing HW accelerated decoding on Linux.

Linux VA-API

Since my original question was regarding VA-API, I can't leave it unanswered. First of all, there is no decoder for VA-API in FFmpeg, so avcodec_find_decoder_by_name("h264_vaapi") doesn't make any sense: it returns nullptr. I don't know how much harder (or maybe simpler?) it is to implement decoding via VA-API, since all the examples I've seen were quite intimidating. So I chose not to use VA-API at all, and I had to implement the acceleration for an Intel card. Fortunately for me, there is a VDPAU library (driver?) which works over VA-API. So you can use VDPAU on Intel cards!

I've used the following link to set it up on my Ubuntu.

Also, you might want to check the comments on the original question, where @Timothy_G also mentioned some links regarding VA-API.
