January 24th, 2011 by Mike Perkins

*The use of transforms in data compression algorithms is at least 40 years old. The goal of this three-part series of posts is to provide the mathematical background necessary for understanding transforms and to explain why they are a valuable part of many compression algorithms. *

*I’m focusing on video since that’s my particular interest. Part 1 reviewed vectors, the dot product, and orthonormal bases. Part 2 (this post) will introduce the use of matrices for describing both one and two-dimensional transforms. Finally, Part 3 gives an example and explain intuitively why transforms are a valuable part of video compression algorithms.*

The view of transforms introduced in Part 1 can be efficiently expressed using matrix notation. I will assume that readers are familiar with three concepts from matrix algebra: matrix multiplication, the matrix transpose, and the inverse of a matrix.

Consider an arbitrary set of *n* orthonormal unit vectors. As discussed in Part 1, these vectors define a transform (see eq. 1 in Part 1). Eq. 1 can be compactly written in matrix form as

where ** A** is the matrix that has as its rows the orthonormal basis vectors that define our transform (recall that the superscript T means we write the vector as a row):

We see immediately from the rule of matrix multiplication that

In other words ** y**, the transform of

or equivalently

So the inverse transform is also a matrix multiplication. Furthermore, because the rows of ** A** are a set of orthonormal vectors, the inverse of

To see that ** A^{-1}** is the transpose of

Note the integral role that the dot product plays at every step. Multiplying two matrices together is just computing the dot product between the appropriate row and column. To compute the *i, j*’th component of the product of two matrices, we compute the dot product between the *i*‘th row of the left matrix and the *j*’th column of the right matrix.

Pulling all this together we have the following sequence of equations for the inverse transform:

The last equation in this sequence makes clear that ** x** is reconstructed as a linear combination of the orthonormal basis vectors and that the components of

When dealing with a digital image, it is natural to think of the signal as being inherently two-dimensional. After all, digital images comprise multiple rows of pixels. In other words, the image to be transformed is itself a matrix! Of course, we could concatenate the rows of an image one after another and turn the 2D image signal into a one-dimensional signal, but it is better to confront the 2D nature of image data head on. The transform perspective we outlined above is based on a 1D perspective. How can we extend it to deal with 2D data?

It turns out that a large class of important transforms, including the DFT and the DCT, are *separable*. This means that a two-dimensional data matrix like an image can be 2D transformed by doing a 1D transform on all the columns, storing the transformed columns in an intermediate matrix, and then doing a 1D transform on all the rows of the intermediate matrix. Fortunately, this process can also be easily expressed with matrix operations. Consider the following equation, where ** A** is our 1D transform matrix and

Now ** X** is nothing more than a collection of 1D column vectors, and the matrix multiply operation transforms each column of

Recall that *A*^{T} is just

So we see that this matrix multiply operation transforms each row of *W**.* In other words, the matrix ** Y** is the desired 2D transform and we can write

With respect to the inverse transform, recalling that *A*^{-1} is *A*^{T}, we can left multiply eq. 3 by *A*^{T}, and right multiply the result by ** A** to get the following expression for the 2D inverse transform

It is natural at this point to ask the following question: What plays the role of basis vectors in 2D transforms? The answer is basis matrices. Before deriving an expression for the basis matrices, it is convenient to introduce some new notation. Let the matrix **1*** _{i,j}* be the matrix with a 1 in location

Now we’re ready to rewrite eq. 4 and obtain expressions for the basis matrices. We proceed by first writing ** Y** as a sum of

where the *i, j*’th basis matrix is given by

From eq. 5 it is clear that there are *n*^{2}* *basis matrices and that the *i, j*’th matrix is scaled by *y _{i,j}* in order to reconstruct

In Part 3, I’ll pull all this together with an example, and discuss the discrete cosine transform (DCT), a transform useful for image compression.

*This entire series of blog posts is also available as a Cardinal Peak white paper.*