Transforms for Video Compression, part 2: Matrix Representation and 2D Transforms

Mike Perkins

The use of transforms in data compression algorithms is at least 40 years old. The goal of this three-part series of posts is to provide the mathematical background necessary for understanding transforms and to explain why they are a valuable part of many compression algorithms.

I’m focusing on video since that’s my particular interest. Part 1 reviewed vectors, the dot product, and orthonormal bases. Part 2 (this post) introduces the use of matrices for describing both one- and two-dimensional transforms. Finally, Part 3 will give an example and explain intuitively why transforms are a valuable part of video compression algorithms.

The view of transforms introduced in Part 1 can be efficiently expressed using matrix notation. I will assume that readers are familiar with three concepts from matrix algebra: matrix multiplication, the matrix transpose, and the inverse of a matrix.

Consider an arbitrary set of n orthonormal vectors. As discussed in Part 1, these vectors define a transform (see eq. 1 in Part 1). Eq. 1 can be compactly written in matrix form as

y = Ax    (Eq. 1m)

where A is the matrix that has as its rows the orthonormal basis vectors that define our transform (recall that the superscript T means we write the vector as a row):

A = \begin{bmatrix} u_0^T \\ u_1^T \\ \vdots \\ u_{n-1}^T \end{bmatrix}

We see immediately from the rule of matrix multiplication that

y_i = u_i \cdot x, \quad i = 0, 1, \ldots, n-1

In other words y, the transform of x, is obtained by a matrix multiplication. The matrix multiplication gives us the projections of x onto the basis vectors. How about the inverse transform? With respect to eq. 1m we can left-multiply both sides by the inverse of the matrix A to obtain

A^{-1} y = A^{-1} A x = x

or equivalently

x = A^{-1} y

So the inverse transform is also a matrix multiplication. Furthermore, because the rows of A are a set of orthonormal vectors, the inverse of A is just A^T! (Note that I’m limiting our discussion to vectors and matrices with real-valued components; however, the ideas presented here are easy to generalize to complex numbers. Complex-valued matrices and vectors are needed for the DFT, but the most popular transforms for image compression, including the DCT, can all be computed with real arithmetic.)
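To make this concrete, here is a small numerical sketch (in NumPy, with an n = 4 orthonormal DCT-II matrix chosen purely for illustration) showing that the forward transform is a matrix multiply and that the inverse is just the transpose:

```python
import numpy as np

n = 4
# Orthonormal DCT-II matrix: row k holds the k-th cosine basis vector.
A = np.array([[np.cos(np.pi * (2 * j + 1) * k / (2 * n)) for j in range(n)]
              for k in range(n)])
A[0, :] *= np.sqrt(1.0 / n)
A[1:, :] *= np.sqrt(2.0 / n)

# Because the rows are orthonormal, A A^T = I, so A^-1 = A^T.
assert np.allclose(A @ A.T, np.eye(n))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = A @ x        # forward transform: projections of x onto the basis vectors
x_rec = A.T @ y  # inverse transform: just the transpose, no inversion needed
assert np.allclose(x_rec, x)
```

Notice that no matrix inversion is ever computed; orthonormality is what makes the transpose both cheap and exact as the inverse.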

To see that A^{-1} is the transpose of A, ponder the following matrix equation until the orthonormality of the u vectors makes it clear that the equation is true:

A A^T = \begin{bmatrix} u_0^T \\ u_1^T \\ \vdots \\ u_{n-1}^T \end{bmatrix} \begin{bmatrix} u_0 & u_1 & \cdots & u_{n-1} \end{bmatrix} = \begin{bmatrix} u_0 \cdot u_0 & u_0 \cdot u_1 & \cdots & u_0 \cdot u_{n-1} \\ u_1 \cdot u_0 & u_1 \cdot u_1 & \cdots & u_1 \cdot u_{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ u_{n-1} \cdot u_0 & u_{n-1} \cdot u_1 & \cdots & u_{n-1} \cdot u_{n-1} \end{bmatrix} = I

Note the integral role that the dot product plays at every step. Matrix multiplication is nothing but dot products between the appropriate rows and columns: to compute the i, j’th component of the product of two matrices, we compute the dot product between the i’th row of the left matrix and the j’th column of the right matrix.
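This entry-by-entry view of matrix multiplication can be checked in a couple of lines (a NumPy sketch; the matrix sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 4))
N = rng.standard_normal((4, 5))
P = M @ N

# The (i, j) entry of the product is the dot product of
# row i of the left matrix with column j of the right matrix.
i, j = 2, 3
assert np.isclose(P[i, j], np.dot(M[i, :], N[:, j]))
```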

Pulling all this together we have the following sequence of equations for the inverse transform:

x = A^{-1} y = A^T y = \sum_{i=0}^{n-1} y_i u_i    (Eq. 2m)

The last equation in this sequence makes clear that x is reconstructed as a linear combination of the orthonormal basis vectors and that the components of y provide the appropriate weights.
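As an illustrative check (a NumPy sketch; the orthonormal basis here is generated at random via a QR decomposition, not taken from any particular transform), we can verify that the transform coefficients really do act as weights on the basis vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# A random orthonormal basis: the QR decomposition of a random matrix
# yields an orthogonal Q; we take its transpose so the rows are the u_i.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q.T

x = rng.standard_normal(n)
y = A @ x  # transform coefficients

# Reconstruct x as a linear combination of the basis vectors,
# weighted by the components of y: x = sum_i y_i * u_i.
x_rec = sum(y[i] * A[i, :] for i in range(n))
assert np.allclose(x_rec, x)
```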

When dealing with a digital image, it is natural to think of the signal as being inherently two-dimensional. After all, digital images comprise multiple rows of pixels. In other words, the image to be transformed is itself a matrix! Of course, we could concatenate the rows of an image one after another and turn the 2D image signal into a one-dimensional signal, but it is better to confront the 2D nature of image data head on. The transform framework outlined above is inherently one-dimensional. How can we extend it to deal with 2D data?

It turns out that a large class of important transforms, including the DFT and the DCT, are separable. This means that a two-dimensional data matrix like an image can be 2D transformed by doing a 1D transform on all the columns, storing the transformed columns in an intermediate matrix, and then doing a 1D transform on all the rows of the intermediate matrix. Fortunately, this process can also be easily expressed with matrix operations. Consider the following equation, where A is our 1D transform matrix and X is a 2D data matrix:

W = AX

Now X is nothing more than a collection of 1D column vectors, and the matrix multiply operation transforms each column of X using the matrix A. So the matrix W is the intermediate matrix mentioned above—the matrix obtained by transforming all the columns of X. Now consider the following:

Y = W A^T

Recall that A^T is just

A^T = \begin{bmatrix} u_0 & u_1 & \cdots & u_{n-1} \end{bmatrix}

So we see that this matrix multiply operation transforms each row of W. In other words, the matrix Y is the desired 2D transform and we can write

Y = A X A^T    (Eq. 3)
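To illustrate separability (a NumPy sketch, with a randomly generated orthonormal A standing in for a real transform matrix), we can verify that transforming all the columns and then all the rows produces exactly A X A^T:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q.T                           # 1D transform matrix (orthonormal rows)
X = rng.standard_normal((n, n))   # stands in for an n-by-n block of pixels

# Pass 1: 1D-transform every column of X into the intermediate matrix W.
W = np.empty((n, n))
for j in range(n):
    W[:, j] = A @ X[:, j]

# Pass 2: 1D-transform every row of W.
Y_two_pass = np.empty((n, n))
for i in range(n):
    Y_two_pass[i, :] = A @ W[i, :]

# The same result as a single matrix expression: Y = A X A^T.
Y = A @ X @ A.T
assert np.allclose(Y, Y_two_pass)
```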

With respect to the inverse transform, recalling that A^{-1} is A^T, we can left-multiply eq. 3 by A^T, and right-multiply the result by A to get the following expression for the 2D inverse transform

X = A^T Y A    (Eq. 4)
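A quick round-trip check of eqs. 3 and 4 (again a NumPy sketch with a random orthonormal A):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q.T
X = rng.standard_normal((n, n))

Y = A @ X @ A.T      # forward 2D transform (eq. 3)
X_rec = A.T @ Y @ A  # inverse 2D transform (eq. 4)
assert np.allclose(X_rec, X)
```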

It is natural at this point to ask the following question: What plays the role of basis vectors in 2D transforms? The answer is basis matrices. Before deriving an expression for the basis matrices, it is convenient to introduce some new notation. Let the matrix 1_{i,j} be the matrix with a 1 in location (i, j) and a 0 in all other locations (here i indexes the row and j indexes the column). Let y_{i,j} denote the component of Y in row i and column j. Finally, we define the “product” of two vectors a and b as follows (note that this vector product is nothing more than standard matrix multiplication of an n-by-1 matrix with a 1-by-n matrix):

a b^T = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{n-1} \end{bmatrix} \begin{bmatrix} b_0 & b_1 & \cdots & b_{n-1} \end{bmatrix} = \begin{bmatrix} a_0 b_0 & a_0 b_1 & \cdots & a_0 b_{n-1} \\ a_1 b_0 & a_1 b_1 & \cdots & a_1 b_{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n-1} b_0 & a_{n-1} b_1 & \cdots & a_{n-1} b_{n-1} \end{bmatrix}

Now we’re ready to rewrite eq. 4 and obtain expressions for the basis matrices. We proceed by first writing Y as a sum of n^2 appropriately weighted 1_{i,j} matrices (i and j each vary from 0 to n-1 as we step through every row and column in the Y matrix). Then we distribute the matrix multiplications and simplify. Mathematically, we have the following:

X = A^T Y A = A^T \left( \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} y_{i,j} 1_{i,j} \right) A = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} y_{i,j} \left( A^T 1_{i,j} A \right) = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} y_{i,j} B_{i,j}    (Eq. 5)

where the i, j’th basis matrix is given by

B_{i,j} = A^T 1_{i,j} A = u_i u_j^T

From eq. 5 it is clear that there are n^2 basis matrices and that the i, j’th matrix is scaled by y_{i,j} in order to reconstruct X. Furthermore, the basis matrices have a certain intuitive appeal. They are constructed by taking every possible product of the basis vectors.
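Finally, a sketch (NumPy, with a random orthonormal basis) confirming that X is reconstructed as a weighted sum of the n^2 outer-product basis matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q.T  # rows u_0 ... u_{n-1} are the 1D basis vectors
X = rng.standard_normal((n, n))
Y = A @ X @ A.T  # 2D transform coefficients

# Reconstruct X as a weighted sum of the n^2 basis matrices
# B_ij = u_i u_j^T (every possible outer product of basis vectors).
X_rec = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        X_rec += Y[i, j] * np.outer(A[i, :], A[j, :])
assert np.allclose(X_rec, X)
```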

In Part 3, I’ll pull all this together with an example, and discuss the discrete cosine transform (DCT), a transform useful for image compression.

This entire series of blog posts is also available as a Cardinal Peak white paper.
