How Robust is Audio Perception in the Face of Deliberate Magnitude and Phase Distortions? (Part 2)

Mike Perkins

In the first post of this three-part series, I listed four points that I hope my readers will agree with at the end of this series. In this post, Part Two of the series, I will demonstrate the first two of those four points:

  1. Dramatically different time domain waveforms can lead to virtually the same audio perception; and
  2. Two waveforms with identical spectrograms can sound quite different.

 

In Part One, I summarized how a length-N vector of real-valued audio samples is transformed by the DFT into an equal-length vector of complex transform coefficients. The transform coefficients give us the magnitudes and phases of the sinusoids composing the vector of audio samples, so we sometimes refer to the transform coefficients as the spectrum of the audio samples. I will also use the term time domain when discussing the raw audio samples, and the term frequency domain when referring to the transformed coefficients (the spectrum).
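As a quick reminder of what "magnitudes and phases" means in practice, here is a minimal NumPy sketch. The block length of 8, the amplitude of 3, and the 60 degree phase are arbitrary values I chose for illustration; they do not come from any of the clips in this series.

```python
import numpy as np

N = 8                      # illustrative block length (an assumption)
n = np.arange(N)

# One cycle of a cosine with amplitude 3 and a 60 degree phase offset
x = 3.0 * np.cos(2 * np.pi * n / N + np.deg2rad(60))

X = np.fft.fft(x)          # N complex transform coefficients
magnitude = np.abs(X)      # magnitude of each coefficient
phase = np.angle(X)        # phase of each coefficient, in radians

# The inverse DFT recovers the original samples (one-to-one mapping)
x_back = np.fft.ifft(X).real
```

With NumPy's unnormalized forward DFT, a real sinusoid that fits exactly in the block shows up in bin 1 with magnitude amplitude × N/2 and with the phase offset of the cosine.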

Now, if we deliberately change the magnitudes of the transform coefficients, we introduce a magnitude distortion. When the distorted transform coefficients are used to reconstruct the time-domain audio samples, the reconstructed samples will no longer match the originals. On the other hand, if we deliberately change the phases of the transform coefficients, we introduce a phase distortion. Both are spectral distortions because they change the spectrum of the audio samples. Because there is a one-to-one relationship between a vector of audio samples and its spectrum, any change to the spectrum will cause a distortion in the reconstructed time-domain samples.
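The two kinds of distortion can be sketched on a single block as follows. This is only an illustration: the random test block, the choice to halve the magnitudes above bin 4, and the 0.5 radian phase shift are all assumptions of mine, not the distortions used for the clips in this post.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(16)            # stand-in for one block of audio samples

X = np.fft.rfft(x)                     # coefficients for non-negative frequencies
mag, phase = np.abs(X), np.angle(X)

# Magnitude distortion: halve the magnitudes above bin 4, keep every phase
mag_d = mag.copy()
mag_d[5:] *= 0.5
x_mag = np.fft.irfft(mag_d * np.exp(1j * phase), n=len(x))

# Phase distortion: keep every magnitude, shift the phase of the interior bins.
# The DC and Nyquist bins are left alone so the coefficients still describe
# a real-valued signal.
phase_d = phase.copy()
phase_d[1:-1] += 0.5
x_ph = np.fft.irfft(mag * np.exp(1j * phase_d), n=len(x))
```

In both cases the reconstructed block differs from `x`, but only the phase-distorted block keeps the original magnitude spectrum.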

Imagine processing a digitized audio clip in the following manner:

  1. Break the clip into non-overlapping blocks of N samples each
  2. Apply a Discrete Fourier Transform to each length N block
    1. Generate a spectrogram from the DFTs
  3. Spectrally distort the coefficients of each block in some manner
    1. Generate a spectrogram from the distorted DFTs
  4. Synthesize N audio samples from the distorted transform coefficients, by performing an inverse DFT
  5. Compare the original time-domain samples to the distorted samples. We’ll both graph them and listen to them.
    1. Compare the spectrogram of the original samples to the spectrogram of the distorted samples.
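The steps above can be sketched in a few lines of NumPy. The function name `process_clip`, the `distort` callback, and the block length used in the example are hypothetical names and values I introduce here for illustration, and the "spectrogram" is simply the array of per-block magnitudes. Reading and writing the .wav files and the plotting are left out.

```python
import numpy as np

def process_clip(samples, block_len, distort):
    """Block-wise DFT, spectral distortion, inverse DFT.

    Any trailing samples that do not fill a whole block are dropped.
    """
    samples = np.asarray(samples, dtype=float)
    n_blocks = len(samples) // block_len
    # Step 1: non-overlapping blocks of block_len samples each
    blocks = samples[:n_blocks * block_len].reshape(n_blocks, block_len)
    # Step 2: DFT of each block (rfft keeps the non-negative frequencies)
    spectra = np.fft.rfft(blocks, axis=1)
    spec_orig = np.abs(spectra)          # spectrogram data, one row per block
    # Step 3: spectrally distort the coefficients
    distorted = distort(spectra)
    spec_dist = np.abs(distorted)        # spectrogram data after distortion
    # Step 4: synthesize block_len samples per block with the inverse DFT
    out = np.fft.irfft(distorted, n=block_len, axis=1).reshape(-1)
    return out, spec_orig, spec_dist

# Example: with no distortion at all, the output reproduces the input
samples = np.sin(2 * np.pi * np.arange(1000) / 50)
out, spec_orig, spec_dist = process_clip(samples, 256, lambda s: s)
```

Passing a different `distort` function (one that changes only the magnitudes, or only the phases) is then enough to produce the comparisons in step 5.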

Let’s begin with the clip x03.wav introduced in the first post of this series. It is a sum of five sinusoids, with frequencies of {500 Hz, 1000 Hz, 1500 Hz, 2000 Hz, 2500 Hz}. The magnitudes of the five sinusoids are {1000, 2000, 750, 1000, 1500}. The waveform x03.wav was formed from the following sum:

where the sampling rate is 48 kHz (T=1/48,000) and the five phase values φ are {0, 0, 0, 0, 0}. What happens if we change the phase values, in degrees, to the five randomly chosen values {0, -48, 67, 33, -62}? The result is the waveform x04.wav. Note that the spectrograms of x03.wav and x04.wav are identical because only the phases are being distorted. Both spectrograms look like this:

When I listen to these two waveforms, I cannot tell them apart. Nevertheless, it is easy to see that they are different in the time domain. Snippets from each waveform are shown below:

x03 waveform

x04 waveform

These graphs demonstrate the first point I wanted to make: dramatically different time-domain waveforms can lead to virtually the same audio perception. Perhaps this is really not so surprising. After all, files compressed with the MP3 and AAC algorithms are commonplace. Abstractly, these algorithms can be viewed as techniques for mapping M bits onto N bits where N < M. For algorithms such as these that achieve significant compression, N is much less than M, and the mapping distorts the waveform (we therefore say these algorithms are lossy, not lossless). Most of the time we cannot hear the difference between the original and compressed waveforms. Nevertheless, I think it is humbling and important to keep in mind that the ear can be easily fooled into thinking that two distinctly different time-domain waveforms are “identical” when in fact they are not.
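For readers who want to reproduce the x03/x04 comparison, here is a rough sketch. The frequencies, magnitudes, phases, and sampling rate are the ones listed above; the use of cosines, the 0.1 second duration, and the names `synthesize`, `x03_like`, and `x04_like` are my assumptions for this sketch, so the result may differ from the actual files in those respects.

```python
import numpy as np

FS = 48_000                                         # sampling rate in Hz
FREQS_HZ = np.array([500, 1000, 1500, 2000, 2500])
AMPS = np.array([1000, 2000, 750, 1000, 1500])

def synthesize(phases_deg, n_samples=4800):
    """Sum of five sinusoids (cosines assumed) with the given phases in degrees."""
    t = np.arange(n_samples) / FS
    phases = np.deg2rad(np.asarray(phases_deg, dtype=float))
    # One row per sinusoid, then sum the rows sample by sample
    components = AMPS[:, None] * np.cos(
        2 * np.pi * FREQS_HZ[:, None] * t[None, :] + phases[:, None]
    )
    return components.sum(axis=0)

x03_like = synthesize([0, 0, 0, 0, 0])
x04_like = synthesize([0, -48, 67, 33, -62])

# Same magnitude spectrum, different samples
mag03 = np.abs(np.fft.rfft(x03_like))
mag04 = np.abs(np.fft.rfft(x04_like))
max_sample_diff = np.max(np.abs(x03_like - x04_like))
```

With 4800 samples at 48 kHz the DFT bins are 10 Hz apart, so each sinusoid falls exactly on a bin and the two magnitude spectra agree to within rounding error even though the samples differ substantially.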

How about the case where phase distortions are applied to real music as opposed to the synthetically generated periodic waveform above? Consider the spock_m.wav file introduced in Part One. What happens if we set the phase of every transform coefficient to zero? It sounds like this: spock_phase0.wav. A graph of the same spot in the two waveforms is shown below (spock_m.wav is in green):

Spock_m in green, Spock_phase0 in red

In this case there is no denying that you can hear the difference between the waveforms, even though they have identical spectrograms! Recall that this was the second point I set out to make. (In terms of simply recognizing what you are hearing, however, I’m sure you had no difficulty in identifying spock_phase0.wav as the Spock clip, even though it sounds different from spock_m.wav.)
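Setting every phase to zero amounts to replacing each coefficient with its magnitude. A minimal sketch is below; the function name `zero_phase_blocks` and the block length and test signal in the example are hypothetical choices of mine, not the settings used to produce spock_phase0.wav.

```python
import numpy as np

def zero_phase_blocks(samples, block_len):
    """Set the phase of every DFT coefficient in every block to zero."""
    samples = np.asarray(samples, dtype=float)
    n_blocks = len(samples) // block_len
    blocks = samples[:n_blocks * block_len].reshape(n_blocks, block_len)
    spectra = np.fft.rfft(blocks, axis=1)
    zeroed = np.abs(spectra)             # magnitude kept, phase forced to 0
    return np.fft.irfft(zeroed, n=block_len, axis=1).reshape(-1)

# Example on a random stand-in signal of four 64-sample blocks
rng = np.random.default_rng(1)
x = rng.standard_normal(256)
y = zero_phase_blocks(x, 64)

x_blocks, y_blocks = x.reshape(4, 64), y.reshape(4, 64)
same_magnitudes = np.allclose(
    np.abs(np.fft.rfft(y_blocks, axis=1)), np.abs(np.fft.rfft(x_blocks, axis=1))
)
```

One side effect worth noticing: a zero-phase spectrum always produces a block that is symmetric about its first sample, which is one reason the time-domain waveform looks so different from the original.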

How about if we randomly change the phase of every coefficient in every DFT block by using a random number generator to generate a phase value between -π and π for each coefficient? Doing so, we obtain spock_phase_ran.wav. This clip is surprisingly easy to recognize, even if Spock does sound like he’s suffering from some weird space sickness. The original and distorted time-domain waveforms are shown below for the same spot as graphed above.

Spock_m in green, Spock_phase_ran in red
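The random-phase experiment can be sketched the same way. Again, the function name `randomize_phase_blocks`, the seed, and the block length in the example are assumptions for illustration. One detail I handle explicitly here: for the output to be real-valued, the DC coefficient (and the Nyquist coefficient when the block length is even) must stay real, so this sketch leaves those two coefficients untouched and randomizes the rest.

```python
import numpy as np

def randomize_phase_blocks(samples, block_len, seed=0):
    """Replace the phase of each DFT coefficient with a uniform random phase."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    n_blocks = len(samples) // block_len
    blocks = samples[:n_blocks * block_len].reshape(n_blocks, block_len)
    spectra = np.fft.rfft(blocks, axis=1)
    # A new phase in [-pi, pi) for every coefficient of every block
    phases = rng.uniform(-np.pi, np.pi, size=spectra.shape)
    randomized = np.abs(spectra) * np.exp(1j * phases)
    # Keep DC (and Nyquist, for even block_len) as they were: they must be real
    randomized[:, 0] = spectra[:, 0]
    if block_len % 2 == 0:
        randomized[:, -1] = spectra[:, -1]
    return np.fft.irfft(randomized, n=block_len, axis=1).reshape(-1)

# Example on a random stand-in signal of four 64-sample blocks
rng = np.random.default_rng(2)
x = rng.standard_normal(256)
y = randomize_phase_blocks(x, 64)

mag_x = np.abs(np.fft.rfft(x.reshape(4, 64), axis=1))
mag_y = np.abs(np.fft.rfft(y.reshape(4, 64), axis=1))
```

As with the zero-phase case, the per-block magnitudes (`mag_x` and `mag_y`) agree, while the samples themselves do not.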

Finally, just in case you are a Sherlock Holmes fan, here are the corresponding two waveforms for that theme song: holmes_phase0.wav, holmes_phase_ran.wav.

 
