World’s Smallest h.264 Encoder

March 19, 2010 by Ben Mesander

Recently I have been studying the h.264 video codec and reading the ISO spec. h.264 is a much more sophisticated codec than MPEG-2, which means that a well-implemented h.264 encoder has more compression tools at its disposal than the equivalent MPEG-2 encoder. But all that sophistication comes at a price: h.264 also has a big, complicated specification with a plethora of options, many of which are not commonly used, and it takes expertise to understand which parts are important to solve a given problem.

As a bit of a parlor trick, I decided to write the simplest possible h.264 encoder. I was able to do it in about 30 lines of code—although truth in advertising compels me to admit that it doesn’t actually compress the video at all!

While I don’t want to balloon this blog post with a detailed description of h.264, a little background is in order. An h.264 stream contains the encoded video data along with various parameters needed by a decoder in order to decode the video data. To structure this data, the bitstream consists of a sequence of Network Abstraction Layer (NAL) units.
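In the Annex B byte-stream format used here, each NAL unit is simply prefixed with a four-byte 0x00000001 start code. As a sketch of that framing (the helper name is mine; the encoder below just hard-codes the start codes into its byte arrays):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Annex B byte-stream framing: each NAL unit is preceded by a start code.
   Illustrative helper: copies one framed NAL into dst, returns framed size. */
size_t write_nal(uint8_t *dst, const uint8_t *payload, size_t len)
{
    static const uint8_t start_code[4] = { 0x00, 0x00, 0x00, 0x01 };

    memcpy(dst, start_code, sizeof(start_code));
    memcpy(dst + sizeof(start_code), payload, len);
    return sizeof(start_code) + len;
}
```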

Previous MPEG specifications allowed pictures to be coded as I-frames, P-frames, or B-frames. h.264 is more complex and wonderful. It allows individual frames to be coded as multiple slices, each of which can be of type I, P, or B, or even more esoteric types. This feature can be used in creative ways to achieve different video coding goals. In our encoder we will use one slice per frame for simplicity, and we will use all I-frames.

As with previous MPEG specifications, in h.264 each slice consists of one or more 16×16 macroblocks. Each macroblock in our 4:2:0 sampling scheme contains 16×16 luma samples, and two 8×8 blocks of chroma samples. For this simple encoder, I won’t be compressing the video data at all, so the samples will be directly copied into the h.264 output.

With that background in mind, for our simplest possible encoder, there are three NALs we have to emit:

  1. Sequence Parameter Set (SPS): Once per stream
  2. Picture Parameter Set (PPS): Once per stream
  3. Coded Slice: Once per video frame
    1. Slice Header information
    2. Macroblock Header: Once per macroblock
    3. Coded Macroblock Data: The actual coded video for the macroblock

Since the SPS, the PPS, and the slice header are static for this application, I was able to hand-code them and include them in my encoder as a sequence of magic bits.

Putting it all together, I came up with the following code for what I call “hello264”:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* SQCIF */
#define LUMA_WIDTH 128
#define LUMA_HEIGHT 96
#define CHROMA_WIDTH (LUMA_WIDTH / 2)
#define CHROMA_HEIGHT (LUMA_HEIGHT / 2)

/* YUV planar data, as written by ffmpeg */
typedef struct
{
    uint8_t Y[LUMA_HEIGHT][LUMA_WIDTH];
    uint8_t Cb[CHROMA_HEIGHT][CHROMA_WIDTH];
    uint8_t Cr[CHROMA_HEIGHT][CHROMA_WIDTH];
} __attribute__((__packed__)) frame_t;

frame_t frame;

/* H.264 bitstreams */
const uint8_t sps[] = { 0x00, 0x00, 0x00, 0x01, 0x67, 0x42, 0x00,
                        0x0a, 0xf8, 0x41, 0xa2 };
const uint8_t pps[] = { 0x00, 0x00, 0x00, 0x01, 0x68, 0xce,
                        0x38, 0x80 };
const uint8_t slice_header[] = { 0x00, 0x00, 0x00, 0x01, 0x05, 0x88,
                                 0x84, 0x21, 0xa0 };
const uint8_t macroblock_header[] = { 0x0d, 0x00 };

/* Write a macroblock's worth of YUV data in I_PCM mode */
void macroblock(const int i, const int j)
{
    int x, y;

    if (!((i == 0) && (j == 0)))
        fwrite(macroblock_header, 1, sizeof(macroblock_header), stdout);

    for (x = i * 16; x < (i + 1) * 16; x++)
        for (y = j * 16; y < (j + 1) * 16; y++)
            fwrite(&frame.Y[x][y], 1, 1, stdout);
    for (x = i * 8; x < (i + 1) * 8; x++)
        for (y = j * 8; y < (j + 1) * 8; y++)
            fwrite(&frame.Cb[x][y], 1, 1, stdout);
    for (x = i * 8; x < (i + 1) * 8; x++)
        for (y = j * 8; y < (j + 1) * 8; y++)
            fwrite(&frame.Cr[x][y], 1, 1, stdout);
}

/* Write out SPS and PPS, then loop over input, writing out I slices */
int main(int argc, char **argv)
{
    int i, j;

    fwrite(sps, 1, sizeof(sps), stdout);
    fwrite(pps, 1, sizeof(pps), stdout);

    /* Read whole frames; stopping on a short read (rather than testing
       feof()) avoids emitting one spurious slice at end of input */
    while (fread(&frame, 1, sizeof(frame), stdin) == sizeof(frame))
    {
        fwrite(slice_header, 1, sizeof(slice_header), stdout);

        for (i = 0; i < LUMA_HEIGHT / 16; i++)
            for (j = 0; j < LUMA_WIDTH / 16; j++)
                macroblock(i, j);

        fputc(0x80, stdout); /* slice stop bit */
    }

    return 0;
}

(This source code is available as a single file here.)

In main(), the encoder writes out the SPS and PPS. Then it reads YUV data from standard input, stores it in a frame buffer, and writes out an h.264 slice header. It then loops over each macroblock in the frame and calls the macroblock() function, which outputs a macroblock header indicating the macroblock is coded as I_PCM and then inserts the YUV data.
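As a sanity check on the hand-coded SPS, here is a sketch of a tiny Exp-Golomb reader that recovers the picture dimensions from it. It assumes a Baseline-profile SPS with pic_order_cnt_type 0, frame_mbs_only_flag set, and no emulation-prevention bytes (all true of the short SPS above), and the function names are mine, not from the spec:

```c
#include <stddef.h>
#include <stdint.h>

/* Big-endian bit reader over a byte buffer; pos counts bits, MSB first */
typedef struct { const uint8_t *buf; size_t pos; } bitreader_t;

unsigned read_bit(bitreader_t *br)
{
    unsigned bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

unsigned read_bits(bitreader_t *br, int n)
{
    unsigned v = 0;
    while (n-- > 0)
        v = (v << 1) | read_bit(br);
    return v;
}

/* Exp-Golomb ue(v): count leading zero bits, then read that many info bits */
unsigned read_ue(bitreader_t *br)
{
    int zeros = 0;
    while (read_bit(br) == 0)
        zeros++;
    return (1u << zeros) - 1u + read_bits(br, zeros);
}

/* Recover the picture dimensions (in samples) from a Baseline-profile SPS
   payload -- the bytes after the start code and the 0x67 NAL header byte.
   Assumes pic_order_cnt_type 0, frame_mbs_only_flag 1, and no 0x03
   emulation-prevention bytes, all of which hold for the SPS above. */
void sps_dimensions(const uint8_t *payload, unsigned *width, unsigned *height)
{
    bitreader_t br;
    br.buf = payload;
    br.pos = 0;

    read_bits(&br, 24);    /* profile_idc, constraint flags, level_idc */
    read_ue(&br);          /* seq_parameter_set_id */
    read_ue(&br);          /* log2_max_frame_num_minus4 */
    if (read_ue(&br) == 0) /* pic_order_cnt_type */
        read_ue(&br);      /* log2_max_pic_order_cnt_lsb_minus4 */
    read_ue(&br);          /* max_num_ref_frames */
    read_bit(&br);         /* gaps_in_frame_num_value_allowed_flag */
    *width  = (read_ue(&br) + 1) * 16; /* pic_width_in_mbs_minus1 */
    *height = (read_ue(&br) + 1) * 16; /* pic_height_in_map_units_minus1 */
}
```

Run on the payload bytes of the sps[] array above (0x42, 0x00, 0x0a, 0xf8, 0x41, 0xa2), this yields 128 and 96, matching SQCIF.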

To use the code, you will need some uncompressed video. To generate this, I used the ffmpeg package to convert a QuickTime movie from my Kodak Zi8 video camera from h.264 to SQCIF (128×96) planar YUV format sampled at 4:2:0:

ffmpeg.exe -i angel.mov -s sqcif -pix_fmt yuv420p angel.yuv

I compile the h.264 encoder:

gcc -Wall -ansi hello264.c -o hello264

And run it:

hello264 <angel.yuv >angel.264

Finally, I use ffmpeg to copy the raw h.264 NAL units into an MP4 file:

ffmpeg.exe -f h264 -i angel.264 -vcodec copy angel.mp4


There you have it—a complete h.264 encoder that uses minimal CPU cycles, with output larger than its input!

The next things to add to this encoder would be CAVLC coding of macroblocks and intra prediction. The encoder would still be lossless at that point, but the data would actually start to be compressed. After that, the next logical step would be quantization to allow lossy compression, and then I would add P slices. As a development methodology, I prefer to bring up a simplistic version of an application, get it running, and then add refinements iteratively.
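The first building block any of those steps needs is a bit writer, since slice headers and CAVLC symbols are not byte-aligned. A minimal sketch, writing MSB-first into a byte buffer (names are mine, not from the spec):

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal MSB-first bit writer over a byte buffer -- the basic building
   block a CAVLC coder needs.  pos is measured in bits. */
typedef struct { uint8_t *buf; size_t pos; } bitwriter_t;

void put_bit(bitwriter_t *bw, unsigned bit)
{
    size_t byte = bw->pos >> 3;
    if ((bw->pos & 7) == 0)
        bw->buf[byte] = 0;                    /* starting a fresh byte */
    bw->buf[byte] |= (uint8_t)((bit & 1) << (7 - (bw->pos & 7)));
    bw->pos++;
}

/* Exp-Golomb ue(v): for value k, emit floor(log2(k+1)) zero bits, then
   k+1 in binary.  So 0 -> "1", 1 -> "010", 2 -> "011", 3 -> "00100". */
void put_ue(bitwriter_t *bw, unsigned k)
{
    unsigned v = k + 1;
    unsigned t;
    int len = -1, i;

    for (t = v; t != 0; t >>= 1)
        len++;                                /* len = floor(log2(v)) */
    for (i = 0; i < len; i++)
        put_bit(bw, 0);
    for (i = len; i >= 0; i--)
        put_bit(bw, (v >> i) & 1);
}
```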

UPDATE 4/20/11: I’ve written more about the Sequence Parameter Set (SPS) here.

Ben Mesander has more than 18 years of experience leading software development teams and implementing software. His strengths include Linux, C, C++, numerical methods, control systems and digital signal processing. His experience includes embedded software, scientific software and enterprise software development environments.


22 Responses to “World’s Smallest h.264 Encoder”

March 20th, 2010 | Bull

GRRRRRRR!

needs

#include <stdint.h>

March 20th, 2010 | Chris Alexander

Extremely impressive! A great exercise to get to know h.264 and encoders in general.

March 20th, 2010 | rms

(i+1) ??

June 3rd, 2010 | Subrata Dasgupta

Hi Ben,
This post is great, and really a great place to start if any beginner wants to understand the h264 internals. I have one specific question. It seems to me that the height and width information is in the SPS or PPS NAL units. So how can I decode or parse the SPS and PPS NALs to get the height and width info? Is there any specific algorithm to parse those NALs?

Thanks
Subrata

June 4th, 2010 | Ben Mesander

Hi Subrata,

The SPS does contain the picture width and height, in macroblock units. I use the following utility to take apart H.264 NAL units and look at the values of the various fields:

http://sourceforge.net/projects/h264bitstream/

Hope this helps,
Ben

September 4th, 2011 | Galland

slice stop bit –> actually “RBSP slice trailing bits” occurring after the slice data

September 15th, 2011 | Rahul Jadhav

Thanks for this simple explanation to start off. Really takes the pressure off 🙂
Also the post made me aware of h264bitstream.
Wish to see more of this sort of work(bottom up approach).
Thanks Again.

September 16th, 2011 | Victor

As I see that profile_idc is set to 66, please note that this is not a compliant H.264 encoder. If it finds a 16×16 black square the result will not be valid (look for pcm_sample_luma in spec.’s Appendix A).
It’s another unnecessary quirk of this standard 🙂

Anyways, very interesting work, thanks

September 20th, 2011 | Ben Mesander

Hi Victor,

I was aware that Appendix A disallowed a macroblock’s luma values from being all 0 bits. As it turns out, the YUV data that I use above was captured with a camera that limits luma data to the range 16…235 rather than 0…255, thus the standard is satisfied and so it does not turn out to be a problem that I had to deal with in code.

Regards,
Ben

November 16th, 2011 | Harry

It seems that there is a mistake in slice_header[] – {…0x05…} – nal_ref_idc shall not be equal to 0 for NAL units with nal_unit_type equal 5. So it must be 0x65. Am I right?
And could you please explain slice_header[] bytes? I can’t understand 0x84, 0x21, 0xa0.
Thanks.

May 30th, 2013 | Ben Mesander

Hi, I just wanted to point to another person who’s done some further work with this code, and brought it up to date to deal with the new ffmpeg which more pedantically checks for validity of the various bits:

http://wobblycucumber.blogspot.com/2013/05/i-can-haz-h264-encoder.html

February 27th, 2014 | sukesh

Hi,

I need to fetch the dimensions of an H264 video stream from the H264 header. How can
I read data from H264 (for e.g. length, width, etc.)?

Regards,
SUKESH

March 10th, 2014 | Ben Mesander

Hi Sukesh,

I am not sure what you mean by the “H264 header”. If you mean the sequence parameter set embedded in the output of the encoder above, you can find the answer here: https://cardinalpeak.com/blog/the-h-264-sequence-parameter-set/

In general though, the H.264 bitstream is stored within a container (such as a MOV or MP4 file) and in that case, you may use the container to find the dimensions.

Regards,
Ben

April 4th, 2014 | Pradeep

Hi,

When i try to copy 264 file into mp4 format, I get the below error. What could be the problem ?

D:\Encoder>ffmpeg.exe -f h264 -i sample1.264 -vcodec copy sample1.mp4
ffmpeg version N-62121-g634636e Copyright (c) 2000-2014 the FFmpeg developers
built on Apr 3 2014 23:30:16 with gcc 4.8.2 (GCC)
configuration: --enable-gpl --enable-version3 --disable-w32threads --enable
ble-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable
bvorbis --enable-libvpx --enable-libwavpack --enable-libx264 --enable-libx265
libavutil 52. 73.100 / 52. 73.100
libavcodec 55. 56.107 / 55. 56.107
libavformat 55. 36.101 / 55. 36.101
libavdevice 55. 11.100 / 55. 11.100
libavfilter 4. 3.100 / 4. 3.100
libswscale 2. 6.100 / 2. 6.100
libswresample 0. 18.100 / 0. 18.100
libpostproc 52. 3.100 / 52. 3.100
sample1.264: No such file or directory

Regards,
Pradeep

April 4th, 2014 | Ben Mesander

Hi Pradeep, it appears the sample1.264 file is not present in the directory in which you are running FFmpeg. Could you check that?

Regards,
Ben

August 12th, 2014 | Jordi Cenzano

Nice work! congrats
It’s a very good way to start learning h264 (starting from the easiest)


June 18th, 2015 | ranchu

Hi,

I have some misunderstanding.
If the frame is not encoded, how is it that the decoder can play the video? Doesn't the decoder try to decode the given buffer (which is actually not in the right format because it was not encoded)?

Thanks, Ran


May 29th, 2016 | Brian Topping

Excellent work. This really gets to the essence of the standard in as pragmatic a manner as possible. I feel empowered to do more interesting things very rapidly as a result. Thank you!!

December 7th, 2016 | Johnny

It would be very interesting to follow this project’s growth by going through its Git history.

Is there a repository?

December 16th, 2016 | Leandro Moreira

I think “contains 16×16 luma samples, and two 8×8 blocks of chroma samples” should be two 4×4 blocks https://i.imgur.com/D2GvEPO.png
