Problem decoding H264 video over RTP with ffmpeg (libavcodec)

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original address and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/3493742/

Time: 2020-08-28 12:58:16  Source: igfitidea

c++ h.264 rtp libavcodec

Asked by bben

I set profile_idc, level_idc, extradata and extradata_size of the AVCodecContext from the profile-level-id and sprop-parameter-sets of the SDP.

I handle the decoding of Coded Slice, SPS, PPS and NAL_IDR_SLICE packets separately:

Init:

uint8_t start_sequence[]= {0, 0, 1};
int size= recv(id_de_la_socket,(char*) rtpReceive,65535,0);

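Coded Slice:

// Prepend the 00 00 01 start code to the payload that follows the header.
// NOTE: 16 is the header size assumed here; a plain RTP header is usually 12 bytes.
char *z = new char[size-16+sizeof(start_sequence)];
memcpy(z,start_sequence,sizeof(start_sequence));
memcpy(z+sizeof(start_sequence),rtpReceive+16,size-16);
ConsumedBytes = avcodec_decode_video(codecContext,pFrame,&GotPicture,(uint8_t*)z,size-16+sizeof(start_sequence));
delete[] z; // allocated with new[], so it must be released with delete[]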

Result: ConsumedBytes > 0 and GotPicture > 0 (often)

SPS and PPS:

Identical code. Result: ConsumedBytes > 0 and GotPicture = 0

That is normal, I think.

When I find a new SPS/PPS pair, I update extradata and extradata_size with the payloads of these packets and their size.

NAL_IDR_SLICE:

The NAL unit type is 28, so the IDR frames are fragmented. I therefore tried two methods to decode them:

1) I prefix the first fragment (without the RTP header) with the sequence 0x000001 and send it to avcodec_decode_video. Then I send the rest of the fragments to this function.

2) I prefix the first fragment (without the RTP header) with the sequence 0x000001 and concatenate the rest of the fragments to it. I send this buffer to the decoder.

In both cases I get no error (ConsumedBytes > 0) but no frame is detected (GotPicture = 0)...

What is the problem?

Answered by Cipi

In RTP, H264 I-frames (IDRs) are usually fragmented. When you receive an RTP packet you must first skip the header (usually the first 12 bytes; more if CSRC entries or a header extension are present) and then look at the NAL unit header (the first payload byte). If its NAL type, the low 5 bits, is 28 (0x1C), the payload that follows is one fragment of an H264 IDR (I-frame), and you need to collect all the fragments to reconstruct the IDR.
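To make "skip the header" concrete, here is a minimal sketch of locating the first payload byte. It assumes a standard RTP version 2 header (12 fixed bytes, then 4 bytes per CSRC entry, then an optional extension); the helper name rtp_payload_offset is made up for illustration and is not part of ffmpeg. Trailing padding (the P bit) is not handled.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns the offset of the first payload byte,
// or -1 if the packet is too short or not RTP version 2.
long rtp_payload_offset(const uint8_t* pkt, size_t len) {
    if (len < 12) return -1;                     // fixed header is 12 bytes
    if ((pkt[0] >> 6) != 2) return -1;           // version field must be 2
    size_t csrc_count = pkt[0] & 0x0F;           // CC field: number of CSRC entries
    bool has_extension = (pkt[0] & 0x10) != 0;   // X bit
    size_t offset = 12 + 4 * csrc_count;
    if (has_extension) {
        if (len < offset + 4) return -1;
        // The extension header carries its length in 32-bit words (big-endian).
        size_t words = (static_cast<size_t>(pkt[offset + 2]) << 8) | pkt[offset + 3];
        offset += 4 + 4 * words;
    }
    if (offset >= len) return -1;                // no payload left
    return static_cast<long>(offset);
}
```

With the offset in hand, pkt[offset] is the byte whose low 5 bits give the NAL type discussed above.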

Fragmentation occurs because the MTU is limited and an IDR is much larger than it. One fragment can look like this:

Fragment that has START BIT = 1:

First byte:  [ 3 NAL UNIT BITS | 5 FRAGMENT TYPE BITS] 
Second byte: [ START BIT | END BIT | RESERVED BIT | 5 NAL UNIT BITS] 
Other bytes: [... IDR FRAGMENT DATA...]

Other fragments (START BIT = 0):

First byte:  [ 3 NAL UNIT BITS | 5 FRAGMENT TYPE BITS]
Second byte: [ START BIT = 0 | END BIT | RESERVED BIT | 5 NAL UNIT BITS]
Other bytes: [... IDR FRAGMENT DATA...]

To reconstruct the IDR you must collect this information:

int fragment_type = Data[0] & 0x1F;
int nal_type = Data[1] & 0x1F;
int start_bit = Data[1] & 0x80;
int end_bit = Data[1] & 0x40;

If fragment_type == 28, then the payload following it is one fragment of the IDR. Next, check whether start_bit is set; if it is, that fragment is the first one in the sequence. Use it to reconstruct the IDR's NAL byte by taking the first 3 bits of the first payload byte (3 NAL UNIT BITS) and combining them with the last 5 bits of the second payload byte (5 NAL UNIT BITS), which gives a byte like this: [3 NAL UNIT BITS | 5 NAL UNIT BITS]. Then write that NAL byte into an empty buffer, followed by all the remaining bytes of that fragment. Remember to skip the first two payload bytes, since they are not part of the IDR and only describe the fragment.

If start_bit and end_bit are both 0, just write the payload (skipping the two payload bytes that describe the fragment) to the buffer.

If start_bit is 0 and end_bit is 1, it is the last fragment: write its payload (again skipping the two bytes that describe the fragment) to the buffer, and now you have your IDR reconstructed.
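The steps above can be sketched roughly as follows. This is only an illustration, not the code I use: the FuaAssembler type is hypothetical, it expects the RTP payload with the RTP header already removed, it writes a 00 00 01 start code in front of the rebuilt NAL, and it does not check RTP sequence numbers, so a lost fragment goes unnoticed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper that rebuilds one NAL unit from its type-28 fragments.
struct FuaAssembler {
    std::vector<uint8_t> nal;   // start code + rebuilt NAL unit
    bool in_progress = false;   // true once a start fragment has been seen

    // Feed one RTP payload. Returns true when the end fragment completes the NAL.
    bool push(const uint8_t* p, size_t n) {
        if (n < 3 || (p[0] & 0x1F) != 28) return false;  // not a usable fragment
        bool start_bit = (p[1] & 0x80) != 0;
        bool end_bit = (p[1] & 0x40) != 0;
        if (start_bit) {
            static const uint8_t kStartCode[] = {0, 0, 1};
            nal.assign(kStartCode, kStartCode + 3);
            // NAL byte = 3 bits of the first byte + 5 bits of the second byte.
            nal.push_back(static_cast<uint8_t>((p[0] & 0xE0) | (p[1] & 0x1F)));
            in_progress = true;
        } else if (!in_progress) {
            return false;                                // missed the first fragment
        }
        nal.insert(nal.end(), p + 2, p + n);             // skip the two describing bytes
        if (end_bit) {
            in_progress = false;
            return true;
        }
        return false;
    }
};
```

When push returns true, nal holds the start code followed by the complete NAL unit and can be handed to the decoder in one call.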

If you need some code, just ask in a comment and I'll post it, but I think it is pretty clear how to do this... =)

CONCERNING THE DECODING

It crossed my mind today why you might get an error when decoding the IDR (I presume you have reconstructed it correctly). How are you building your AVC Decoder Configuration Record? Does the lib you use do that automatically? If not, and you haven't heard of it, continue reading...

The AVCDCR is specified to let decoders quickly parse all the data they need to decode an H264 (AVC) video stream. That data is the following:

  • ProfileIDC
  • ProfileIOP
  • LevelIDC
  • SPS (Sequence Parameter Sets)
  • PPS (Picture Parameter Sets)

All this data is sent in the RTSP session, in the SDP, under the fields profile-level-id and sprop-parameter-sets.

DECODING PROFILE-LEVEL-ID

The profile-level-id string is divided into 3 substrings, each 2 characters long:

[PROFILE IDC][PROFILE IOP][LEVEL IDC]

Each substring represents one byte in base16! So, if Profile IDC is 28, it is actually 40 in base10. Later you will use the base10 values to construct the AVC Decoder Configuration Record.
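As a small sketch of that split, assuming the value has already been extracted from the SDP as a 6-character string (the struct and function names here are hypothetical):

```cpp
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical container for the three bytes of profile-level-id.
struct ProfileLevelId {
    uint8_t profile_idc;
    uint8_t profile_iop;
    uint8_t level_idc;
};

// Parses a string such as "42E01E"; returns false if it is not 6 hex digits.
bool parse_profile_level_id(const std::string& text, ProfileLevelId& out) {
    if (text.size() != 6) return false;
    for (char c : text) {
        if (!std::isxdigit(static_cast<unsigned char>(c))) return false;
    }
    // Each 2-character substring is one byte written in base16.
    out.profile_idc = static_cast<uint8_t>(std::stoul(text.substr(0, 2), nullptr, 16));
    out.profile_iop = static_cast<uint8_t>(std::stoul(text.substr(2, 2), nullptr, 16));
    out.level_idc   = static_cast<uint8_t>(std::stoul(text.substr(4, 2), nullptr, 16));
    return true;
}
```

For example, "42E01E" gives profile_idc 66, profile_iop 224 and level_idc 30.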

DECODING SPROP-PARAMETER-SETS

Sprops are usually 2 strings (could be more) that are comma separated and base64 encoded! You could parse the contents of both, but there is no need to. Your job here is just to convert them from base64 strings into byte arrays for later use. Now you have 2 byte arrays: the first array is the SPS, the second one is the PPS.
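A minimal sketch of that conversion, assuming the sprop-parameter-sets value has already been pulled out of the SDP line. Both function names are hypothetical, and the base64 decoder is a plain hand-written one that stops at the first '=' or invalid character rather than reporting an error.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decodes standard base64 text into bytes; stops at padding or an invalid character.
std::vector<uint8_t> base64_decode(const std::string& in) {
    std::vector<uint8_t> out;
    uint32_t acc = 0;   // bit accumulator
    int bits = 0;       // number of pending bits in acc
    for (char c : in) {
        int v;
        if (c >= 'A' && c <= 'Z') v = c - 'A';
        else if (c >= 'a' && c <= 'z') v = c - 'a' + 26;
        else if (c >= '0' && c <= '9') v = c - '0' + 52;
        else if (c == '+') v = 62;
        else if (c == '/') v = 63;
        else break;
        acc = (acc << 6) | static_cast<uint32_t>(v);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<uint8_t>((acc >> bits) & 0xFF));
        }
    }
    return out;
}

// Splits on commas and decodes each piece: usually [0] is the SPS and [1] the PPS.
std::vector<std::vector<uint8_t>> parse_sprop_parameter_sets(const std::string& sprop) {
    std::vector<std::vector<uint8_t>> sets;
    std::size_t begin = 0;
    while (begin <= sprop.size()) {
        std::size_t comma = sprop.find(',', begin);
        if (comma == std::string::npos) comma = sprop.size();
        if (comma > begin) sets.push_back(base64_decode(sprop.substr(begin, comma - begin)));
        begin = comma + 1;
    }
    return sets;
}
```

The resulting byte arrays are the raw SPS and PPS NAL units, without any start code.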

BUILDING THE AVCDCR

Now you have all you need to build the AVCDCR. Start with a new, empty buffer and write the following into it, in this order:

1 - Byte with value 1, representing the version

2 - Profile IDC byte

3 - Profile IOP byte

4 - Level IDC byte

5 - Byte with value 0xFF (six reserved bits set to 1, then two bits holding the NAL length size minus one, here 3; google the AVC Decoder Configuration Record for details)

6 - Byte with value 0xE1 (three reserved bits set to 1, then five bits holding the number of SPS arrays, here 1)

7 - Short (16-bit, big-endian) with the length of the SPS array

8 - SPS byte array

9 - Byte with the number of PPS arrays (you could have more of them in sprop-parameter-sets)

10 - Short (16-bit, big-endian) with the length of the following PPS array

11 - PPS array
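Put together, the eleven steps could look like the sketch below. It assumes exactly one SPS and one or more PPS, as in the list above, and repeats steps 10 and 11 for each PPS; the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Writes a 16-bit value in big-endian byte order.
static void put_u16_be(std::vector<uint8_t>& buf, std::size_t value) {
    buf.push_back(static_cast<uint8_t>((value >> 8) & 0xFF));
    buf.push_back(static_cast<uint8_t>(value & 0xFF));
}

// Hypothetical builder for the record described in steps 1 to 11.
std::vector<uint8_t> build_avcdcr(uint8_t profile_idc, uint8_t profile_iop, uint8_t level_idc,
                                  const std::vector<uint8_t>& sps,
                                  const std::vector<std::vector<uint8_t>>& pps_list) {
    std::vector<uint8_t> rec;
    rec.push_back(1);                                 // 1: version
    rec.push_back(profile_idc);                       // 2
    rec.push_back(profile_iop);                       // 3
    rec.push_back(level_idc);                         // 4
    rec.push_back(0xFF);                              // 5
    rec.push_back(0xE1);                              // 6: one SPS follows
    put_u16_be(rec, sps.size());                      // 7
    rec.insert(rec.end(), sps.begin(), sps.end());    // 8
    rec.push_back(static_cast<uint8_t>(pps_list.size()));  // 9
    for (const std::vector<uint8_t>& pps : pps_list) {
        put_u16_be(rec, pps.size());                  // 10
        rec.insert(rec.end(), pps.begin(), pps.end()); // 11
    }
    return rec;
}
```

The returned buffer is what would be handed to the decoder as its configuration data.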

DECODING VIDEO STREAM

Now you have a byte array that tells the decoder how to decode the H264 video stream. I believe you need this if your lib doesn't build it itself from the SDP...

Answered by Scott

I don't know about the rest of your implementation, but it seems likely the 'fragments' you are receiving are NAL units. Therefore each may need the NALU start code (00 00 01 or 00 00 00 01) prepended when you reconstruct the bitstream before sending it to ffmpeg.
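As a rough illustration of what I mean, here is a hypothetical helper that puts the 4-byte start code in front of each complete NAL unit as it is added to the bitstream buffer. It assumes each call is given one whole NAL unit, not an RTP fragment.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: appends one complete NAL unit to the stream,
// preceded by the 00 00 00 01 start code.
void append_annexb(std::vector<uint8_t>& stream, const uint8_t* nal, std::size_t len) {
    static const uint8_t kStartCode[] = {0, 0, 0, 1};
    stream.insert(stream.end(), kStartCode, kStartCode + 4);
    stream.insert(stream.end(), nal, nal + len);
}
```

Calling it once per NAL unit yields a buffer in which every unit is delimited by a start code.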

At any rate, you might find the RFC for H264 RTP packetization useful:

http://www.rfc-editor.org/rfc/rfc3984.txt

Hope this helps!

Answered by Jay

I have an implementation of this at https://net7mma.codeplex.com/ for C#, but the process is the same everywhere.

Here is the relevant code:

/// <summary>
    /// Implements Packetization and Depacketization of packets defined in <see href="https://tools.ietf.org/html/rfc6184">RFC6184</see>.
    /// </summary>
    public class RFC6184Frame : Rtp.RtpFrame
    {
        /// <summary>
        /// Start code written before each Nal Unit
        /// </summary>
        static byte[] NalStart = { 0x00, 0x00, 0x01 };

        public RFC6184Frame(byte payloadType) : base(payloadType) { }

        public RFC6184Frame(Rtp.RtpFrame existing) : base(existing) { }

        public RFC6184Frame(RFC6184Frame f) : this((Rtp.RtpFrame)f) { Buffer = f.Buffer; }

        public System.IO.MemoryStream Buffer { get; set; }

        /// <summary>
        /// Creates any <see cref="Rtp.RtpPacket"/>'s required for the given nal
        /// </summary>
        /// <param name="nal">The nal</param>
        /// <param name="mtu">The mtu</param>
        public virtual void Packetize(byte[] nal, int mtu = 1500)
        {
            if (nal == null) return;

            int nalLength = nal.Length;

            int offset = 0;

            if (nalLength >= mtu)
            {
                //FU indicator: F and NRI bits of the nal header with type 28 (FU-A)
                byte fuIndicator = (byte)((nal[0] & 0xE0) | 28);

                //The original nal unit type travels in the low 5 bits of every FU header
                byte nalType = (byte)(nal[0] & 0x1F);

                //The nal header itself is not sent, the receiver rebuilds it
                offset = 1;

                while (offset < nalLength)
                {
                    //Start bit on the first fragment only
                    byte fuHeader = (byte)(nalType | (offset == 1 ? 1 << 7 : 0));

                    //Set the end bit (and the marker) if no more data remains
                    bool marker = offset + mtu >= nalLength;
                    if (marker) fuHeader |= (byte)(1 << 6);

                    //Add the packet
                    Add(new Rtp.RtpPacket(2, false, false, marker, PayloadTypeByte, 0, SynchronizationSourceIdentifier, HighestSequenceNumber + 1, 0, new byte[] { fuIndicator, fuHeader }.Concat(nal.Skip(offset).Take(mtu)).ToArray()));

                    //Move the offset
                    offset += mtu;
                }
            } //Should check for first byte to be 1 - 23?
            else Add(new Rtp.RtpPacket(2, false, false, true, PayloadTypeByte, 0, SynchronizationSourceIdentifier, HighestSequenceNumber + 1, 0, nal));
        }

        /// <summary>
        /// Creates <see cref="Buffer"/> with a H.264 RBSP from the contained packets
        /// </summary>
        public virtual void Depacketize() { bool sps, pps, sei, slice, idr; Depacketize(out sps, out pps, out sei, out slice, out idr); }

        /// <summary>
        /// Parses all contained packets and writes any contained Nal Units in the RBSP to <see cref="Buffer"/>.
        /// </summary>
        /// <param name="containsSps">Indicates if a Sequence Parameter Set was found</param>
        /// <param name="containsPps">Indicates if a Picture Parameter Set was found</param>
        /// <param name="containsSei">Indicates if Supplemental Enhancement Information was found</param>
        /// <param name="containsSlice">Indicates if a Slice was found</param>
        /// <param name="isIdr">Indicates if a IDR Slice was found</param>
        public virtual void Depacketize(out bool containsSps, out bool containsPps, out bool containsSei, out bool containsSlice, out bool isIdr)
        {
            containsSps = containsPps = containsSei = containsSlice = isIdr = false;

            DisposeBuffer();

            this.Buffer = new MemoryStream();

            //Get all packets in the frame
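            //Each call resets its out flags, so accumulate them across packets
            foreach (Rtp.RtpPacket packet in m_Packets.Values.Distinct())
            {
                bool sps, pps, sei, slice, idr;
                ProcessPacket(packet, out sps, out pps, out sei, out slice, out idr);
                containsSps |= sps; containsPps |= pps; containsSei |= sei; containsSlice |= slice; isIdr |= idr;
            }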

            //Order by DON?
            this.Buffer.Position = 0;
        }

        /// <summary>
        /// Depacketizes a single packet.
        /// </summary>
        /// <param name="packet"></param>
        /// <param name="containsSps"></param>
        /// <param name="containsPps"></param>
        /// <param name="containsSei"></param>
        /// <param name="containsSlice"></param>
        /// <param name="isIdr"></param>
        internal protected virtual void ProcessPacket(Rtp.RtpPacket packet, out bool containsSps, out bool containsPps, out bool containsSei, out bool containsSlice, out bool isIdr)
        {
            containsSps = containsPps = containsSei = containsSlice = isIdr = false;

            //Starting at offset 0
            int offset = 0;

            //Obtain the data of the packet (without source list or padding)
            byte[] packetData = packet.Coefficients.ToArray();

            //Cache the length
            int count = packetData.Length;

            //Must have at least 2 bytes
            if (count <= 2) return;

            //Determine if the forbidden bit is set and the type of nal from the first byte
            byte firstByte = packetData[offset];

            //bool forbiddenZeroBit = ((firstByte & 0x80) >> 7) != 0;

            byte nalUnitType = (byte)(firstByte & Common.Binary.FiveBitMaxValue);

            //o  The F bit MUST be cleared if all F bits of the aggregated NAL units are zero; otherwise, it MUST be set.
            //if (forbiddenZeroBit && nalUnitType <= 23 && nalUnitType > 29) throw new InvalidOperationException("Forbidden Zero Bit is Set.");

            //Determine what to do
            switch (nalUnitType)
            {
                //Reserved - Ignore
                case 0:
                case 30:
                case 31:
                    {
                        return;
                    }
                case 24: //STAP - A
                case 25: //STAP - B
                case 26: //MTAP - 16
                case 27: //MTAP - 24
                    {
                        //Move to Nal Data
                        ++offset;

                        //Todo Determine if need to Order by DON first.
                        //EAT DON for ALL BUT STAP - A
                        if (nalUnitType != 24) offset += 2;

                        //Consume the rest of the data from the packet
                        while (offset < count)
                        {
                            //Determine the nal unit size, which includes the nal header but not these 2 size bytes
                            int tmp_nal_size = Common.Binary.Read16(packetData, offset, BitConverter.IsLittleEndian);
                            offset += 2;

                            //If the nal had data then write it
                            if (tmp_nal_size > 0)
                            {
                                //For DOND and TSOFFSET
                                switch (nalUnitType)
                                {
                                    case 26:// MTAP - 16
                                        {
                                            //SKIP DOND (1 byte) and TSOFFSET (2 bytes)
                                            offset += 3;
                                            goto default;
                                        }
                                    case 27:// MTAP - 24
                                        {
                                            //SKIP DOND (1 byte) and TSOFFSET (3 bytes)
                                            offset += 4;
                                            goto default;
                                        }
                                    default:
                                        {
                                            //Read the nal header but don't move the offset
                                            byte nalHeader = (byte)(packetData[offset] & Common.Binary.FiveBitMaxValue);

                                            if (nalHeader > 5)
                                            {
                                                if (nalHeader == 6)
                                                {
                                                    Buffer.WriteByte(0);
                                                    containsSei = true;
                                                }
                                                else if (nalHeader == 7) //SPS
                                                {
                                                    Buffer.WriteByte(0);
                                                    containsSps = true;
                                                }
                                                else if (nalHeader == 8) //PPS
                                                {
                                                    Buffer.WriteByte(0);
                                                    containsPps = true;
                                                }
                                            }

                                            if (nalHeader == 1) containsSlice = true;

                                            if (nalHeader == 5) isIdr = true;

                                            //Done reading
                                            break;
                                        }
                                }

                                //Write the start code
                                Buffer.Write(NalStart, 0, 3);

                                //Write the nal header and data
                                Buffer.Write(packetData, offset, tmp_nal_size);

                                //Move the offset past the nal
                                offset += tmp_nal_size;
                            }
                        }

                        return;
                    }
                case 28: //FU - A
                case 29: //FU - B
                    {
                        /*
                         Informative note: When an FU-A occurs in interleaved mode, it
                         always follows an FU-B, which sets its DON.
                         * Informative note: If a transmitter wants to encapsulate a single
                          NAL unit per packet and transmit packets out of their decoding
                          order, STAP-B packet type can be used.
                         */
                        //Need 2 bytes
                        if (count > 2)
                        {
                            //Read the Header
                            byte FUHeader = packetData[++offset];

                            bool Start = ((FUHeader & 0x80) >> 7) > 0;

                            //bool End = ((FUHeader & 0x40) >> 6) > 0;

                            //bool Receiver = (FUHeader & 0x20) != 0;

                            //if (Receiver) throw new InvalidOperationException("Receiver Bit Set");

                            //Move to data
                            ++offset;

                            //Todo Determine if need to Order by DON first.
                            //DON Present in FU - B
                            if (nalUnitType == 29) offset += 2;

                            //Determine the fragment size
                            int fragment_size = count - offset;

                            //If the size was valid
                            if (fragment_size > 0)
                            {
                                //If the start bit was set
                                if (Start)
                                {
                                    //Reconstruct the nal header
                                    //Use the first 3 bits of the first byte and last 5 bits of the FU Header
                                    byte nalHeader = (byte)((firstByte & 0xE0) | (FUHeader & Common.Binary.FiveBitMaxValue));

                                    //The nal unit type is only the low 5 bits, so compare against it rather than the full header
                                    byte nalType = (byte)(FUHeader & Common.Binary.FiveBitMaxValue);

                                    //Could have been SPS / PPS / SEI
                                    if (nalType > 5)
                                    {
                                        if (nalType == 6)
                                        {
                                            Buffer.WriteByte(0);
                                            containsSei = true;
                                        }
                                        else if (nalType == 7) //SPS
                                        {
                                            Buffer.WriteByte(0);
                                            containsSps = true;
                                        }
                                        else if (nalType == 8) //PPS
                                        {
                                            Buffer.WriteByte(0);
                                            containsPps = true;
                                        }
                                    }

                                    if (nalType == 1) containsSlice = true;

                                    if (nalType == 5) isIdr = true;

                                    //Write the start code
                                    Buffer.Write(NalStart, 0, 3);

                                    //Write the reconstructed header
                                    Buffer.WriteByte(nalHeader);
                                }

                                //Write the data of the fragment.
                                Buffer.Write(packetData, offset, fragment_size);
                            }
                        }
                        return;
                    }
                default:
                    {
                        // 6 SEI, 7 and 8 are SPS and PPS
                        if (nalUnitType > 5)
                        {
                            if (nalUnitType == 6)
                            {
                                Buffer.WriteByte(0);
                                containsSei = true;
                            }
                            else if (nalUnitType == 7) //SPS
                            {
                                Buffer.WriteByte(0);
                                containsSps = true;
                            }
                            else if (nalUnitType == 8) //PPS
                            {
                                Buffer.WriteByte(0);
                                containsPps = true;
                            }
                        }

                        if (nalUnitType == 1) containsSlice = true;

                        if (nalUnitType == 5) isIdr = true;

                        //Write the start code
                        Buffer.Write(NalStart, 0, 3);

                        //Write the nal header and data
                        Buffer.Write(packetData, offset, count - offset);

                        return;
                    }
            }
        }

        internal void DisposeBuffer()
        {
            if (Buffer != null)
            {
                Buffer.Dispose();
                Buffer = null;
            }
        }

        public override void Dispose()
        {
            if (Disposed) return;
            base.Dispose();
            DisposeBuffer();
        }

        //To go to an Image...
        //Look for a SliceHeader in the Buffer
        //Decode Macroblocks in Slice
        //Convert Yuv to Rgb
    }

There are also implementations of various other RFCs, which help with getting the media to play in a MediaElement or in other software, or with just saving it to disk.

Writing to a container format is underway.