Transforms for Video Compression, part 3: The DCT and Why Transforming is Valuable

Mike Perkins

The use of transforms in data compression algorithms is at least 40 years old. The goal of this three-part series of posts is to provide the mathematical background necessary for understanding transforms and to explain why they are a valuable part of many compression algorithms.

I’m focusing on video since that’s my particular interest. Part 1 reviewed vectors, the dot product, and orthonormal bases. Part 2 introduced the use of matrices for describing both one- and two-dimensional transforms. Finally, Part 3 (this post) gives an example and explains intuitively why transforms are a valuable part of video compression algorithms.

Now that we have stepped through some of the rationale behind simple transforms, and contemplated how transforms can be applied to two-dimensional input matrices, it is time to discuss the basis for almost all media compression—the discrete cosine transform, or DCT.

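The basis vectors for the size N DCT transform are given by

$$a_k[j] = c(k)\,\cos\!\left(\frac{(2j+1)\,k\,\pi}{2N}\right)$$

where j indexes the vector column and

$$c(k) = \begin{cases} \sqrt{1/N}, & k = 0 \\ \sqrt{2/N}, & k = 1, \dots, N-1 \end{cases}$$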

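Of course, both k and j vary from 0 to N-1 (recall that k indexes the transform matrix row). The following matrix therefore defines the size 4 DCT transform (entries rounded to three decimal places):

$$A = \begin{bmatrix} 0.500 & 0.500 & 0.500 & 0.500 \\ 0.653 & 0.271 & -0.271 & -0.653 \\ 0.500 & -0.500 & -0.500 & 0.500 \\ 0.271 & -0.653 & 0.653 & -0.271 \end{bmatrix}$$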

If you want to practice your matrix math, compute AA^T (A times its transpose) for this matrix and verify that you get the identity matrix I.
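If you would rather let a computer do the arithmetic, here is a minimal sketch of that check. It assumes the orthonormal form of the DCT (scale factor sqrt(1/N) for row zero and sqrt(2/N) for the other rows); the helper names `dct_matrix`, `matmul`, and `transpose` are made up for this illustration.

```python
import math

def dct_matrix(n):
    """Return the n-by-n DCT matrix; row k is basis vector k."""
    rows = []
    for k in range(n):
        # Row 0 is scaled by sqrt(1/n); every other row by sqrt(2/n).
        c = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([c * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                     for j in range(n)])
    return rows

def matmul(p, q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*q)]
            for row in p]

def transpose(p):
    return [list(col) for col in zip(*p)]

A = dct_matrix(4)
product = matmul(A, transpose(A))

for row in A:
    print("  ".join("%7.3f" % v for v in row))
print()
for row in product:
    # Adding 0.0 turns a rounded -0.0 into 0.0 for cleaner printing.
    print([round(v, 6) + 0.0 for v in row])
```

Up to floating-point rounding, the second matrix printed should be the identity, which is just another way of saying that the rows of A form an orthonormal basis.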

If you study the formula for the DCT basis vectors, you’ll see that they are sinusoids whose frequency increases as k increases. All the values in row zero of the transform matrix are equal, and each subsequent row has a higher frequency than the one above it. You will also see that each sinusoid has a phase shift that depends on k.

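What do the basis matrices look like? Recall from eq. 5 of the last post that the i,j’th basis matrix is given by the outer product of basis vectors i and j, that is, of rows i and j of the transform matrix: the entry in row m and column n of the basis matrix is element m of basis vector i times element n of basis vector j.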

A few representative basis matrices are shown below to provide a flavor for their structure.

In general, the basis matrices exhibit sinusoidal behavior on their rows, on their columns, or on both. This makes sense when you consider that each is the product of two sinusoidal vectors! The picture below shows all 16 of the basis matrices in pictorial form. In this picture, values that have a reddish hue are negative, while grayscale values are positive. The “whiter” a grayscale value the larger the positive number it represents, while the brighter red a value, the smaller the negative number it represents.
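To see this structure numerically, the sketch below builds a few basis matrices as outer products of rows of the size 4 DCT matrix. The function names and the particular (i, j) pairs printed are my own choices for illustration, and the orthonormal DCT scaling is assumed.

```python
import math

def dct_matrix(n):
    """Return the n-by-n orthonormal DCT matrix; row k is basis vector k."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * j + 1) * k * math.pi / (2 * n))
             for j in range(n)] for k in range(n)]

def basis_matrix(a, i, j):
    """Outer product of row i and row j of the transform matrix a."""
    size = len(a)
    return [[a[i][m] * a[j][n] for n in range(size)] for m in range(size)]

A = dct_matrix(4)

# (0, 0) is flat, (0, 1) varies only along each row,
# (1, 0) varies only down each column, and (3, 3) varies both ways.
for i, j in [(0, 0), (0, 1), (1, 0), (3, 3)]:
    print("basis matrix", (i, j))
    for row in basis_matrix(A, i, j):
        print("  " + "  ".join("%7.3f" % v for v in row))
```

When the first index is 0 every row of the basis matrix is identical, and when the second index is 0 every column is identical, because basis vector 0 is constant.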

So why is it useful to transform blocks of data from an image as part of a compression algorithm? First, any image can be partitioned into non-overlapping blocks of size N×N pixels (in our case 4-by-4). Each block comprises 16 numbers, so we need to communicate an approximation of all 16 of these numbers to a receiver in order to reconstruct an approximation of the block. Without compression, we would just send each of the 16 numbers separately. Can we get by with sending fewer numbers?

Consider the special case where a block to be compressed is exactly equal to a scaled version of one of our basis matrices. In this case, in order to communicate all 16 image numbers to a receiver, we only need to send the position of that basis matrix and the scale factor to apply to it. In the general case, if a 4×4 image block can be closely approximated by a linear combination of just a few of the basis matrices, then it is desirable to transform the block first and just send the scale factors and positions of the appropriate basis matrices. (Note that in the worst case the positions of the retained basis matrices can be sent using 16 bits, one bit for each position. In practice, there are even more efficient ways to do it.)
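The special case is easy to check numerically. The sketch below assumes the two-dimensional transform takes the form Y = A X A^T, with A the size 4 orthonormal DCT matrix; the scale factor 3 and the position (1, 2) are arbitrary values picked for illustration, and the helper names are hypothetical.

```python
import math

def dct_matrix(n):
    """Return the n-by-n orthonormal DCT matrix; row k is basis vector k."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * j + 1) * k * math.pi / (2 * n))
             for j in range(n)] for k in range(n)]

def matmul(p, q):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*q)]
            for row in p]

def transpose(p):
    return [list(col) for col in zip(*p)]

def forward(a, x):
    """Two-dimensional transform, assumed here to be Y = A X A^T."""
    return matmul(matmul(a, x), transpose(a))

A = dct_matrix(4)

# Build a block that is exactly 3 times the (1, 2) basis matrix.
scale, (i, j) = 3.0, (1, 2)
X = [[scale * A[i][m] * A[j][n] for n in range(4)] for m in range(4)]

Y = forward(A, X)

# Collect every coefficient that is not numerically zero.
nonzero = [(p, q, round(Y[p][q], 6))
           for p in range(4) for q in range(4) if abs(Y[p][q]) > 1e-9]
print(nonzero)
```

Only one coefficient survives, at position (1, 2), and its value is the scale factor. Sixteen pixel values have been reduced to one position and one number.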

For a specific example, assume the following 4-by-4 block is a group of 16 pixels from an 8-bit black and white image. Note that the block corresponds to a darker region in the lower left-hand corner and a brighter region in the upper right-hand corner:

The transform of this block is shown below:

Now, what do we get if we keep only the five largest values in Y and inverse transform them? In other words, what do we get if we inverse transform the matrix below?

Computing the inverse transform with just these five components is equivalent to approximating the original data by a linear combination of only five of the sixteen basis matrices. We get the following result:

Although many of the individual pixel values in this approximation are no longer exactly the original value, the basic visual impression is preserved: The lower left-hand portion of the approximation is darker than the upper right-hand portion.
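The same experiment can be sketched in a few lines of code. The pixel values below are hypothetical, chosen only to be dark in the lower left and bright in the upper right; they are not the block from the example above. The forward and inverse transforms are assumed to be Y = A X A^T and X = A^T Y A.

```python
import math

def dct_matrix(n):
    """Return the n-by-n orthonormal DCT matrix; row k is basis vector k."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * j + 1) * k * math.pi / (2 * n))
             for j in range(n)] for k in range(n)]

def matmul(p, q):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*q)]
            for row in p]

def transpose(p):
    return [list(col) for col in zip(*p)]

def forward(a, x):
    return matmul(matmul(a, x), transpose(a))   # Y = A X A^T

def inverse(a, y):
    return matmul(matmul(transpose(a), y), a)   # X = A^T Y A

# Hypothetical 8-bit block: dark lower left, bright upper right.
X = [[100, 120, 150, 180],
     [ 90, 110, 140, 170],
     [ 70,  90, 120, 150],
     [ 50,  70, 100, 130]]

A = dct_matrix(4)
Y = forward(A, X)

# Positions of the five largest-magnitude coefficients.
positions = sorted(((p, q) for p in range(4) for q in range(4)),
                   key=lambda pq: abs(Y[pq[0]][pq[1]]), reverse=True)[:5]

# Zero out everything else, then inverse transform.
Y_kept = [[Y[p][q] if (p, q) in positions else 0.0 for q in range(4)]
          for p in range(4)]
approx = inverse(A, Y_kept)

for row in approx:
    print([round(v) for v in row])

max_err = max(abs(approx[m][n] - X[m][n])
              for m in range(4) for n in range(4))
print("largest pixel error:", round(max_err, 2))
```

This particular block is very smooth, so five coefficients reproduce it almost exactly. Blocks with more texture or sharp edges would show larger errors for the same number of retained coefficients.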

A good transform for a particular signal packs most of the signal’s energy into the upper left-hand side of Y. The best transform for a given signal depends on the signal’s statistics. It turns out that the DCT is a particularly good transform for the statistics of images—and there is a computationally fast algorithm for its calculation, which is important in real-world applications.
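One way to put a number on "packs most of the signal’s energy" is to compare sums of squares. Because the transform is orthonormal, the sum of the squared pixel values equals the sum of the squared coefficients, so we can ask what fraction of that total lands in the upper left-hand corner of Y. The sketch below does this for a hypothetical smooth block (not taken from the example above), assuming Y = A X A^T; treating the upper left 2-by-2 corner as "low frequency" is also just an illustrative choice.

```python
import math

def dct_matrix(n):
    """Return the n-by-n orthonormal DCT matrix; row k is basis vector k."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * j + 1) * k * math.pi / (2 * n))
             for j in range(n)] for k in range(n)]

def matmul(p, q):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*q)]
            for row in p]

def transpose(p):
    return [list(col) for col in zip(*p)]

def forward(a, x):
    return matmul(matmul(a, x), transpose(a))   # Y = A X A^T

# Hypothetical smooth block: dark lower left, bright upper right.
X = [[100, 120, 150, 180],
     [ 90, 110, 140, 170],
     [ 70,  90, 120, 150],
     [ 50,  70, 100, 130]]

Y = forward(dct_matrix(4), X)

energy_pixels = sum(v * v for row in X for v in row)
energy_coeffs = sum(v * v for row in Y for v in row)

# Energy in the upper left 2-by-2 corner of Y.
energy_low = sum(Y[p][q] ** 2 for p in range(2) for q in range(2))
fraction_low = energy_low / energy_coeffs

print("pixel energy:      ", energy_pixels)
print("coefficient energy:", round(energy_coeffs, 6))
print("fraction in upper left 2x2:", round(fraction_low, 4))
```

The two energies agree, and for this block nearly all of the energy sits in four of the sixteen coefficients. Most of it is in the (0, 0) coefficient, which is proportional to the block’s average brightness.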

Now, imagine that you can adaptively decide which components of Y to keep on a block-by-block basis … and that the choice of which coefficients you kept can be efficiently communicated to a receiver … and that the quantizer step size you used on each retained Y component can vary depending on its position. At that point, you are well on your way to understanding MPEG-2, H.264, and similar transform-based algorithms!
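To make the last of those ideas a little more concrete, here is a rough sketch of position-dependent quantization. The step-size matrix is invented for this illustration (steps grow with distance from the upper left-hand corner) and is not taken from MPEG-2, H.264, or any other standard; the pixel block is likewise hypothetical, and the transforms are assumed to be Y = A X A^T and X = A^T Y A.

```python
import math

def dct_matrix(n):
    """Return the n-by-n orthonormal DCT matrix; row k is basis vector k."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * j + 1) * k * math.pi / (2 * n))
             for j in range(n)] for k in range(n)]

def matmul(p, q):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*q)]
            for row in p]

def transpose(p):
    return [list(col) for col in zip(*p)]

def forward(a, x):
    return matmul(matmul(a, x), transpose(a))   # Y = A X A^T

def inverse(a, y):
    return matmul(matmul(transpose(a), y), a)   # X = A^T Y A

# Hypothetical block and hypothetical step sizes (coarser at high frequency).
X = [[100, 120, 150, 180],
     [ 90, 110, 140, 170],
     [ 70,  90, 120, 150],
     [ 50,  70, 100, 130]]
STEP = [[8 + 4 * (p + q) for q in range(4)] for p in range(4)]

A = dct_matrix(4)
Y = forward(A, X)

# Encoder side: divide by the step size and round to an integer level.
levels = [[round(Y[p][q] / STEP[p][q]) for q in range(4)] for p in range(4)]

# Decoder side: multiply the levels back and inverse transform.
Y_hat = [[levels[p][q] * STEP[p][q] for q in range(4)] for p in range(4)]
X_hat = inverse(A, Y_hat)

nonzero_levels = sum(1 for row in levels for v in row if v != 0)
print("levels:", levels)
print("nonzero levels:", nonzero_levels)
for row in X_hat:
    print([round(v) for v in row])
```

Small coefficients at positions with large step sizes round to zero and never need to be sent, so the quantizer ends up deciding which components of Y to keep. Each retained coefficient is off by at most half of its step size.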

This entire series of blog posts is also available as a Cardinal Peak white paper.
