
January 13th, 2011 by Mike Perkins

*The use of transforms in data compression algorithms is at least 40 years old. The goal of this three-part series of posts is to provide the mathematical background necessary for understanding transforms and to explain why they are a valuable part of many compression algorithms.*

*I’ll focus on video since that’s my particular interest. Part 1 (this post) will review vectors, the dot product, and orthonormal bases. Part 2 introduces the use of matrices for describing both one and two-dimensional transforms. Finally, Part 3 gives an example and explains intuitively why transforms are a valuable part of video compression algorithms.*

Remember back in high school when you learned the “dot product” between two vectors? It turns out that this operation is at the heart of a powerful way of thinking about transforms such as the DFT, the DCT, and the integer transform used in H.264. In this post, part 1 of a three-part series, I review some key concepts we’ll need, including vectors, the dot product, and orthonormal bases. So let’s get started!

An *n*-dimensional vector is just a collection of *n* numbers. By convention the numbers are usually written in a column, like this:

$$\mathbf{x} = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1} \end{bmatrix}$$

Vector concepts are valuable in signal processing because a discrete, or digital, signal is nothing more than a collection of numbers. In other words, a discrete signal (of finite length) is a vector. The numbers in a vector can be samples of a music waveform, pixels in an image, or samples from any signal that tickles your fancy. We call the numbers that make up a vector the *components* of the vector. Any operation that can be performed on a vector can be thought of as an operation performed on a discrete signal.

Now, an *n*-dimensional vector can be viewed in two ways:

- as a point in *n*-dimensional space (i.e., the components of the vector are the coordinates of a point); or
- as a directed line segment with its tail at the origin and its head at the point specified by its coordinates (this is the way we normally think of a vector: as an “arrow” with a definite direction).

The Pythagorean Theorem tells us how long a vector is as a function of its components:

$$\|\mathbf{a}\| = \sqrt{a_0^2 + a_1^2 + \cdots + a_{n-1}^2}$$

If you have two vectors, then the *dot product* between them is computed by multiplying their corresponding components together and summing up the products. Mathematically we express this as

$$\mathbf{a} \cdot \mathbf{b} = a_0 b_0 + a_1 b_1 + \cdots + a_{n-1} b_{n-1} = \sum_{i=0}^{n-1} a_i b_i$$

We see immediately that

$$\mathbf{a} \cdot \mathbf{a} = a_0^2 + a_1^2 + \cdots + a_{n-1}^2 = \|\mathbf{a}\|^2$$
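As a quick sanity check, here is a minimal Python sketch (not from the original post; the vectors and helper names are my own) that computes a dot product by hand and confirms that a vector dotted with itself equals its squared length:

```python
import math

def dot(a, b):
    # Multiply corresponding components and sum up the products.
    return sum(p * q for p, q in zip(a, b))

def length(a):
    # Pythagorean Theorem: square root of the sum of squared components.
    return math.sqrt(sum(p * p for p in a))

a = [3.0, 4.0]
b = [1.0, 2.0]

print(dot(a, b))                     # 3*1 + 4*2 = 11.0
print(length(a))                     # sqrt(9 + 16) = 5.0
print(dot(a, a) == length(a) ** 2)   # a . a equals |a|^2 -> True
```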

It can also be shown that

$$\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\|\,\|\mathbf{b}\| \cos \theta$$

This is just the vector formulation of the famous law of cosines you learned in trigonometry class, where θ is the angle between the vectors **a** and **b**.

If a vector has length one, we call it a *unit* vector. We can always form a unit vector that points in the same direction as an arbitrary vector **b** by applying the formula

$$\mathbf{u} = \frac{\mathbf{b}}{\|\mathbf{b}\|}$$

(Recall that multiplying or dividing a vector by a scalar means multiplying or dividing each component of the vector by the scalar.)
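In code, normalizing a vector is a one-liner per component. A small Python sketch (my own example values), showing that the result points the same way but has length 1:

```python
import math

def normalize(b):
    # Divide each component by the vector's length to get a unit
    # vector pointing in the same direction as b.
    n = math.sqrt(sum(p * p for p in b))
    return [p / n for p in b]

u = normalize([3.0, 0.0, 4.0])
print(u)  # [0.6, 0.0, 0.8] -- same direction, length 1
```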

Finally, we can find the projection of a vector **a** in the direction of a unit vector **u** by taking the dot product of the two:

$$\text{projection of } \mathbf{a} \text{ onto } \mathbf{u} = \mathbf{a} \cdot \mathbf{u}$$
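A projection is therefore just one more dot product. In this hedged sketch (vectors chosen by me), projecting onto the unit vector along the *x*-axis simply picks off the first component, which foreshadows the 3D example below:

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def project(a, u):
    # Scalar projection of a onto the unit vector u.
    return dot(a, u)

# Projecting onto the x-axis unit vector picks off the first component.
print(project([5.0, 2.0, 7.0], [1.0, 0.0, 0.0]))  # 5.0
```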

Let’s consider an example for the case of three-dimensional vectors. We note that (1, 0, 0), (0, 1, 0), and (0, 0, 1) are all unit vectors; furthermore, they are all orthogonal to each other because the dot product between any pair of them is zero. Of course (1, 0, 0) points in the direction of the *x*‑axis, (0, 1, 0) points in the direction of the *y*‑axis, and (0, 0, 1) points in the direction of the *z*‑axis. Because these three unit vectors are all orthogonal to each other, we say these vectors are *orthonormal* to each other (here “normal” refers to the fact that the vectors have a length of 1—they’ve been normalized).

Now, let *x*^{T} = (*x*_{0}, *x*_{1}, *x*_{2}) be an arbitrary vector in 3D space. (The superscript “T” means transpose; when we write a vector horizontally we refer to it as the transpose of the vertical vector. But I may be sloppy at times and let the context make clear whether a vector is horizontally or vertically oriented.) The dot product of **x** with (1, 0, 0) is

$$\mathbf{x} \cdot (1, 0, 0) = x_0 \cdot 1 + x_1 \cdot 0 + x_2 \cdot 0 = x_0$$

Similarly, the dot product of **x** with (0, 1, 0) is *x*_{1}, and the dot product of **x** with (0, 0, 1) is *x*_{2}: each projection simply picks off one coordinate.

Furthermore, the unit vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1) form an *orthonormal basis* for the set of all 3D vectors because any 3D vector can be written as a scaled sum of these three vectors. In particular:

$$\mathbf{x}^{T} = x_0\,(1, 0, 0) + x_1\,(0, 1, 0) + x_2\,(0, 0, 1)$$

Now we’re ready for a key insight into transforms. *The vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1) are not the only orthonormal basis vectors.* For example, one easy way to get another set of orthonormal vectors is to rotate the unit vectors that point in the *x*-axis and *y*‑axis directions by 45 degrees, while leaving the (0, 0, 1) vector unchanged. Doing this we get the three unit vectors (0.707, 0.707, 0), (-0.707, 0.707, 0), and (0, 0, 1), where 0.707 is 1/√2 rounded to three decimal places. It is easy to compute the dot product of every pair of these vectors and show that they are mutually orthogonal; it is also easy to see that all the vectors have a length of 1. What if we take the projection of (*x*_{0}, *x*_{1}, *x*_{2}) onto each of these new vectors? We get

$$y_0 = 0.707\,x_0 + 0.707\,x_1$$
$$y_1 = -0.707\,x_0 + 0.707\,x_1$$
$$y_2 = x_2$$
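The projections onto the rotated basis are, again, just three dot products. A minimal Python sketch (the input vector is my own arbitrary choice):

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

# The rotated orthonormal basis from the text (0.707 is 1/sqrt(2) rounded).
basis = [
    [0.707, 0.707, 0.0],
    [-0.707, 0.707, 0.0],
    [0.0, 0.0, 1.0],
]

x = [1.0, 2.0, 3.0]             # an arbitrary example vector
y = [dot(x, v) for v in basis]  # the three projections y0, y1, y2
print(y)                        # roughly [2.121, 0.707, 3.0]
```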

Now we can express the vector *x*^{T} as a sum of these three new unit vectors as follows:

$$\mathbf{x}^{T} = y_0\,(0.707, 0.707, 0) + y_1\,(-0.707, 0.707, 0) + y_2\,(0, 0, 1)$$

In other words, given the three projections *y*_{0}, *y*_{1}, and *y*_{2}, we can reconstruct the numbers *x*_{0}, *x*_{1}, and *x*_{2} from them by using them to scale their corresponding unit vectors and then adding up the scaled vectors.
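The round trip can be sketched in a few lines of Python (my own example vector; I use the exact value 1/√2 rather than the rounded 0.707 so the reconstruction comes back essentially exact):

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

s = math.sqrt(0.5)  # exact 1/sqrt(2), instead of the rounded 0.707
basis = [
    [s, s, 0.0],
    [-s, s, 0.0],
    [0.0, 0.0, 1.0],
]

x = [1.0, 2.0, 3.0]             # an arbitrary example vector
y = [dot(x, v) for v in basis]  # forward: the three projections

# Inverse: scale each basis vector by its projection and add them up.
x_rec = [sum(y[k] * basis[k][i] for k in range(3)) for i in range(3)]
print(x_rec)  # recovers [1.0, 2.0, 3.0] up to floating-point rounding
```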

In fact, for *any* set of three orthonormal vectors, we can compute the projection of **x** onto each of the vectors, and given these projections, we can reconstruct **x** by the same recipe: scale each orthonormal vector by its projection and add up the results.

When the goal is to compress the vector (i.e., the discrete signal) **x**, then it may be easier to first transform **x** into its projections onto a carefully chosen orthonormal basis and compress those projections instead. Why this works to our advantage is the subject of the rest of this series.

*This entire series of blog posts is also available as a Cardinal Peak white paper.*