The h.264 Sequence Parameter Set

April 20, 2011 by Ben Mesander

This is a follow-up to my World’s Smallest h.264 Encoder post. I’ve received several emails asking about precise details of things in two entities in the h.264 bitstream: the Sequence Parameter Set (SPS) and the Picture Parameter Set (PPS). Both entities contain information that an h.264 decoder needs to decode the video data, for example the resolution and frame rate of the video.

Recall that an h.264 bitstream contains a sequence of Network Abstraction Layer (NAL) units. The SPS and PPS are both types of NAL units. The SPS NAL unit contains parameters that apply to a series of consecutive coded video pictures, referred to as a “coded video sequence” in the h.264 standard. The PPS NAL unit contains parameters that apply to the decoding of one or more individual pictures inside a coded video sequence.

In my simple encoder, we emitted a single SPS and PPS at the start of the video data stream. A more complex encoder will commonly insert them periodically in the data, for two reasons: first, a decoder often needs to start decoding mid-stream, and second, the encoder may wish to vary parameters for different parts of the stream to meet compression or quality goals.

In my trivial encoder, the h.264 SPS and PPS were hardcoded in hex as:

/* h.264 bitstreams */
const uint8_t sps[] =
{0x00, 0x00, 0x00, 0x01, 0x67, 0x42, 0x00, 0x0a, 0xf8, 0x41, 0xa2};
const uint8_t pps[] =
{0x00, 0x00, 0x00, 0x01, 0x68, 0xce, 0x38, 0x80};

Let’s decode this into something readable from the spec. The first thing I did was to look at section 7 of the h.264 specification. I saw that at a minimum I had to choose how to fill in the SPS parameters in the table below. In the table, as in the standard, the type u(n) indicates an unsigned integer of n bits, and ue(v) indicates an unsigned Exp-Golomb (exponential-Golomb) coded value of a variable number of bits. The spec doesn’t seem to define the maximum number of bits anywhere, but the reference encoder software uses 32. (People wishing to explore the security of decoder software may find it interesting to violate this assumption!)

Parameter Name                        Type   Value  Comments
forbidden_zero_bit                    u(1)   0      Despite being forbidden, it must be set to 0!
nal_ref_idc                           u(2)   3      3 means it is “important” (this is an SPS)
nal_unit_type                         u(5)   7      Indicates this is a sequence parameter set
profile_idc                           u(8)   66     Baseline profile
constraint_set0_flag                  u(1)   0      We’re not going to honor constraints
constraint_set1_flag                  u(1)   0      We’re not going to honor constraints
constraint_set2_flag                  u(1)   0      We’re not going to honor constraints
constraint_set3_flag                  u(1)   0      We’re not going to honor constraints
reserved_zero_4bits                   u(4)   0      Better set them to zero
level_idc                             u(8)   10     Level 1, sec A.3.1
seq_parameter_set_id                  ue(v)  0      We’ll just use id 0.
log2_max_frame_num_minus4             ue(v)  0      Let’s have as few frame numbers as possible
pic_order_cnt_type                    ue(v)  0      Keep things simple
log2_max_pic_order_cnt_lsb_minus4     ue(v)  0      Fewer is better.
num_ref_frames                        ue(v)  0      We will only send I slices
gaps_in_frame_num_value_allowed_flag  u(1)   0      We will have no gaps
pic_width_in_mbs_minus1               ue(v)  7      SQCIF is 8 macroblocks wide
pic_height_in_map_units_minus1        ue(v)  5      SQCIF is 6 macroblocks high
frame_mbs_only_flag                   u(1)   1      We will not do field/frame encoding
direct_8x8_inference_flag             u(1)   0      Used for B slices. We will not send B slices
frame_cropping_flag                   u(1)   0      We will not do frame cropping
vui_parameters_present_flag           u(1)   0      We will not send VUI data
rbsp_stop_one_bit                     u(1)   1      Stop bit. I missed this at first and it caused me much trouble.

Some key things here are the profile (profile_idc) and level (level_idc) that I chose, and the picture width and height. If you encode the above table in hex, you will get the values in the SPS array declared above.

A question I got a couple of times in email was about the width and height parameters—specifically, what to do if the picture width or height is not an integer multiple of macroblock size. Recall that, for the 4:2:0 sampling scheme in my encoder, a macroblock consists of 16×16 luma samples. In this case, you would set the frame_cropping_flag to 1, and reduce the number of pixels in the horizontal and vertical direction with the frame_crop_left_offset, frame_crop_right_offset, frame_crop_top_offset, and frame_crop_bottom_offset parameters, which are conditionally present in the bitstream only if the frame_cropping_flag is set to one.

One interesting problem that we see fairly often with h.264 is when the container format (MP4, MOV, etc.) contains different values for some of these parameters than the SPS and PPS. In this case, we find different video players handle the streams differently.

A handy tool for decoding h.264 bitstreams, including the SPS, is the h264bitstream tool. It comes with a command line program that decodes a bitstream to the parameter names defined in the h.264 specification. Let’s look at its output for a sample mp4 file I downloaded from YouTube. First, I extract the h.264 NAL units from the file using ffmpeg:

ffmpeg.exe -i "Old Faithful.mp4" -vcodec copy -vbsf h264_mp4toannexb -an of.h264

The NAL units now reside in the file of.h264. I then run the h264_analyze command from the h264bitstream package to produce the following output:

h264_analyze of.h264
!! Found NAL at offset 4 (0x0004), size 25 (0x0019)
==================== NAL ====================
forbidden_zero_bit : 0
nal_ref_idc : 3
nal_unit_type : 7 ( Sequence parameter set )
======= SPS =======
profile_idc : 100
constraint_set0_flag : 0
constraint_set1_flag : 0
constraint_set2_flag : 0
constraint_set3_flag : 0
reserved_zero_4bits : 0
level_idc : 31
seq_parameter_set_id : 0
chroma_format_idc : 1
residual_colour_transform_flag : 0
bit_depth_luma_minus8 : 0
bit_depth_chroma_minus8 : 0
qpprime_y_zero_transform_bypass_flag : 0
seq_scaling_matrix_present_flag : 0
log2_max_frame_num_minus4 : 3
pic_order_cnt_type : 0
log2_max_pic_order_cnt_lsb_minus4 : 3
delta_pic_order_always_zero_flag : 0
offset_for_non_ref_pic : 0
offset_for_top_to_bottom_field : 0
num_ref_frames_in_pic_order_cnt_cycle : 0
num_ref_frames : 1
gaps_in_frame_num_value_allowed_flag : 0
pic_width_in_mbs_minus1 : 79
pic_height_in_map_units_minus1 : 44
frame_mbs_only_flag : 1
mb_adaptive_frame_field_flag : 0
direct_8x8_inference_flag : 1
frame_cropping_flag : 0
frame_crop_left_offset : 0
frame_crop_right_offset : 0
frame_crop_top_offset : 0
frame_crop_bottom_offset : 0
vui_parameters_present_flag : 1
=== VUI ===
aspect_ratio_info_present_flag : 1
aspect_ratio_idc : 1
sar_width : 0
sar_height : 0
overscan_info_present_flag : 0
overscan_appropriate_flag : 0
video_signal_type_present_flag : 0
video_signal_type_present_flag : 0
video_format : 0
video_full_range_flag : 0
colour_description_present_flag : 0
colour_primaries : 0
transfer_characteristics : 0
matrix_coefficients : 0
chroma_loc_info_present_flag : 0
chroma_sample_loc_type_top_field : 0
chroma_sample_loc_type_bottom_field : 0
timing_info_present_flag : 1
num_units_in_tick : 100
time_scale : 5994
fixed_frame_rate_flag : 1
nal_hrd_parameters_present_flag : 0
vcl_hrd_parameters_present_flag : 0
low_delay_hrd_flag : 0
pic_struct_present_flag : 0
bitstream_restriction_flag : 1
motion_vectors_over_pic_boundaries_flag : 1
max_bytes_per_pic_denom : 0
max_bits_per_mb_denom : 0
log2_max_mv_length_horizontal : 11
log2_max_mv_length_vertical : 11
num_reorder_frames : 0
max_dec_frame_buffering : 1
=== HRD ===
cpb_cnt_minus1 : 0
bit_rate_scale : 0
cpb_size_scale : 0
initial_cpb_removal_delay_length_minus1 : 0
cpb_removal_delay_length_minus1 : 0
dpb_output_delay_length_minus1 : 0
time_offset_length : 0

The only additional thing I’d like to point out here is that this particular SPS also contains information about the frame rate of the video (see timing_info_present_flag). These parameters must be closely checked when you generate bitstreams to ensure they agree with the container format that the h.264 will eventually be muxed into. Even a small error, such as 29.97 fps in one place and 30 fps in another, can result in severe audio/video synchronization problems.

Next time I will write about the h.264 Picture Parameter Set (PPS).

This entry was posted on Wednesday, April 20th, 2011 at 11:20 am in Ben, Video.

10 Responses to “The h.264 Sequence Parameter Set”

May 8th, 2011 | Erez Semaria

I’ve been trying to work with h264 using DirectX
What I would like to do is to take a stream of NAL units over rtp. I can use FFMPEG to decode this and I shall end up with my PPS and SPS but my DirectX api requires Bitstream, Quantization Matrix, Slice Information, target control and picture parameters (I imagine that this last one is at least tangentially related to PPS)
Do you know any way to bridge the gap between the spec and what DirectX requires of me

May 10th, 2011 | Ben Mesander

Hi Erez,

The information you need will be in the PPS, SPS, and the picture slices themselves. You will have to either extract the information yourself by parsing the H.264 bitstream, or you will have to use ffmpeg’s parser.


May 27th, 2011 | Steve Huh

Dear Ben:

First and foremost, thanks for the smallest H264 encoder source code! 🙂
Currently I am trying to work on generating H.264 video based on BMP files (128 X 96) using your encoder source code.

Here are the things my application does.
(1) The application extracts an array of RGB from the BMP file,
(2) convert the RGB data to YUV420p data,
(3) and saving the data using your code.

However, the application generates video which is flipped upside down; also Cr and Cb values are exchanged in upper 1/3 video file.
Could you possibly tell me what part of my source code is causing the problem? The development environment is VS2010 using C++.

Steve, Huh

June 1st, 2011 | Ben Mesander

Hi Steve,

Well, obviously the YUV data you are providing do not seem to be in the same order the encoder expects. I had the same problem while writing the code.

What I did was write some C programs to generate some simple YUV test images. I found ramping luma and chroma and checkerboards of 255/0 values to be very useful for debugging this. You can then make a movie of the ramp or the checkerboard and then take a screenshot of the movie and look at the YUV pixel values in the gimp or some other graphic editor and figure out how your input maps to the output.

Hope this helps,

June 14th, 2011 | Rindra

Thank you for the smallest H.264 encoder. It is very interesting and very useful, and I wonder if you could help me use it for real-time video encoding. To explain: I grab raw frames from a camera and I want to stream them over RTP or HTTP to a streaming server, so I need to encode each raw frame to H.264, don’t I? Could you help me use the smallest H.264 encoder for this real-time raw frame encoding (not using an input stream or file)?
Thank you in advance. Thanks a lot for all :).

June 14th, 2011 | Ben Mesander

Hi Rindra,

The encoder works on individual planar YUV 4:2:0 images and produces individual I frame output NALs. You will have to study the relevant RFCs for how to further format the NAL units for RTP / HTTP delivery. It should be relatively simple to modify it to acquire the YUV data from some camera instead of a file and to write to a memory buffer or whatever instead of a file.


July 5th, 2011 | Hong

Hello Ben,

Thanks a lot for this elucidating post.

I look forward to reading your post on h.264 Picture Parameter Set which, I assume, is yet to be published.


December 9th, 2011 | Victor

Hi Ben,

Thanks for the very helpful post. I’m going over each item, and also have the H.264 spec next to me. It seems that in your table, “reserved_zero_4bits”, should actually be “reserved_zero_5bits” according to the H.264 spec. Is this just a typo? Or do I have an outdated spec? I’m fairly new to these things. I’m guessing changing something like this would be huge and would break a lot of things out there. So it doesn’t make sense that 4 bits of zeros became 5 bits.

I checked two copies of the spec, and they both say 5 bits. One is ITU-T H.264 (05/2003) and the other is ISO/IEC 14496-10 Second edition (2004-10-01).

Thanks again!

December 2nd, 2014 | Rajendra


I am trying to calculate the frame rate from raw H.264 Video stream.
If the fixed_frame_rate_flag is not set in VUI parameters or timing_information is not present; do we have any other way to find the video frame rate from raw H.264 Video stream to package to a container format.

Please provide your valuable inputs.

Thanks and Regards,

April 25th, 2015 | Matt Johnson

Ben..Could you help us out with this?

Is there a method in FFmpeg that allows us to send new configuration data (SPS/PPS) mid stream over RTMP? avformat_write_header() sends the initial SPS/PPS, but we adapt the video bitrate based on network conditions and so the configuration data changes. We currently inject SPS and PPS in a key frame when new configuration data is available. This works fine when we play the original RTMP stream. The problem is when it is transcoded.

For example: Usually the configuration data starts like this (hex): 17 00 00 00 00 SPS … PPS. 17 means key frame and the following 00 means that the packet contains configuration data. When injecting the configuration data into a key frame’s body, FFmpeg will always put 01 after 17, meaning picture data: 17 01 00 00. That is why we think the configuration data is lost during transcoding.

I will also try emailing you.

