How Robust is Audio Perception in the Face of Deliberate Magnitude and Phase Distortions? (Part 3)

Mike Perkins

In the first post of this three-part series, I listed four points that I hope my readers will agree with at the end of this series. The second post addressed the first two points of the four. In this post, Part Three of the series, I will demonstrate the final two points:

  1. Phase distortions generally have less effect on human perception than magnitude distortions; and
  2. Two audio clips can be recognized by humans as matching despite having dramatically different spectrograms.

 

In Part One I explained the concept of a spectrogram and how it is computed using the DFT. In Part Two we looked at the effect of distortions on human aural perception. We found that in some cases phase distortions change the time domain waveform but have no effect on our perception. In other cases, phase distortions clearly affect the audio, but the distortions have no impact on our ability to easily recognize a clip. In this final part we will look at the effect of spectral magnitude distortions. Unlike the case with phase distortions, magnitude distortions change the spectrogram.

Let’s begin with the clip x03.wav first introduced in Part One. Recall that it is a sum of five sinusoids, with frequencies of the five sinusoids, in Hz, are {500 Hz, 1000 Hz, 1500 Hz, 2000 Hz, 2500 Hz}. The magnitudes of the five sinusoids are {1000, 2000, 750, 1000, 1500}. The waveform x03.wav was formed from the following sum:

where the sampling rate is 48 KHz (T=1/48,000) and the five phase values f are {0, 0, 0, 0, 0}. What happens if we change the magnitude values to the five randomly chosen values {427, 716, 2113, 1382, 373}? The result is the file x05.wav. Waveforms for both x03.wav and x05.wav are shown below:

x03 waveform

x05 waveform

The spectrograms of these two waveforms appear below (x03.wav is on top). The frequencies are the same, although the 2500 Hz sine wave is barely visible in x05.wav because its magnitude is so small. It is clear that the intensities for each frequency in x05.wav are different from those in x03.wav; in other words, the spectrogram has changed.

If you listen to these two clips you’ll hear that it is easy to tell them apart. Note how different this is from the case examined in Part Two where a random change of the sinusoids’ phases led to a waveform that sounded the same, although its time domain graph showed it was different. This is an illustration of the fact that the ear is generally less sensitive to phase distortions than to magnitude distortions. Furthermore, if you compare x05.wav’s waveform above to that of x04.wav in Part Two I think you’ll agree that it is hard to look at two time domain waveforms and know in advance whether they will sound the same, or different, as a reference waveform (in this case, x03.wav).

In Part Two we did the experiment of keeping the magnitudes for the Spock clip the same, while assigning random phases to the DFT coefficients. This does not change the spectrogram. What if we assign random magnitudes to the DFT coefficients while keeping the phases the same? This DOES change the spectrogram. Below are pictures of the spectrograms in these two cases; the corresponding audio files can be found in spock_ran_mag.wav and spock_m.wav.

spock_ran_mag spectrogram

spock_m spectrogram

If you listen to these clips you’ll see that it is impossible to recognize the magnitude-distorted clip as Spock. This too illustrates that the ear is more sensitive to magnitude distortions than phase distortions.

Now we want to consider a case where the Spock spectrogram is dramatically different, yet the clip is still easily recognizable. Consider the following processing: for each row of the spectrogram, find the maximum magnitude of a DFT coefficient. Then, for all coefficients in that row, if the magnitude of a DFT coefficient is within 40dB of the maximum, set its magnitude equal to that of the maximum (keep the phase the same), otherwise set its magnitude to zero. This processing results in a spectrogram with only two levels: on or off. A given frequency is either present or not. The processing throws away coefficients that aren’t carrying much energy (they are more than 40dB down from the maximum). The spectrogram looks like this:

spock_2level spectrogram

I think you’ll agree that this is substantially different than the spectrograph of the original clip. Nevertheless, you will have no trouble identifying the clip if you listen to it: spock_2level.wav. This example demonstrates the final point I set out to show: that two clips can be recognized as the same even if they have dramatically different spectrographs. As an even more amazing demonstration of this, what if we set every DFT coefficient to the same magnitude while preserving its phase. The spectrogram is now solid white–every frequency is present with the same magnitude throughout the entire clip. Amazingly, the clip can still be identified (especially when Spock is speaking): spock_cnst.wav. This demonstrates that although the magnitude may be more important for aural perception than phase, the phase alone carries enough information, albeit just barely, to recognize a clip.

 

Categories: Audio, Perk, Theory

Cardinal Peak
Learn more about our Audio & Video capabilities.

Dive deeper into our IoT portfolio

Take a look at the clients we have helped.

We’re always looking for top talent, check out our current openings. 

Contact Us

Please fill out the contact form below and our engineering services team will be in touch soon.

We rely on Cardinal Peak for their ability to bolster our patent licensing efforts with in-depth technical guidance. They have deep expertise and they’re easy to work with.
Diego deGarrido Sr. Manager, LSI
Cardinal Peak has a strong technology portfolio that has complemented our own expertise well. They are communicative, drive toward results quickly, and understand the appropriate level of documentation it takes to effectively convey their work. In…
Jason Damori Director of Engineering, Biamp Systems
We asked Cardinal Peak to take ownership for an important subsystem, and they completed a very high quality deliverable on time.
Matt Cowan Chief Scientific Officer, RealD
Cardinal Peak’s personnel worked side-by-side with our own engineers and engineers from other companies on several of our key projects. The Cardinal Peak staff has consistently provided a level of professionalism and technical expertise that we…
Sherisse Hawkins VP Software Development, Time Warner Cable
Cardinal Peak was a natural choice for us. They were able to develop a high-quality product, based in part on open source, and in part on intellectual property they had already developed, all for a very effective price.
Bruce Webber VP Engineering, VBrick
We completely trust Cardinal Peak to advise us on technology strategy, as well as to implement it. They are a dependable partner that ultimately makes us more competitive in the marketplace.
Brian Brown President and CEO, Decatur Electronics
The Cardinal Peak team started quickly and delivered high-quality results, and they worked really well with our own engineering team.
Charles Corbalis VP Engineering, RGB Networks
We found Cardinal Peak’s team to be very knowledgeable about embedded video delivery systems. Their ability to deliver working solutions on time—combined with excellent project management skills—helped bring success not only to the product…
Ralph Schmitt VP, Product Marketing and Engineering, Kustom Signals
Cardinal Peak has provided deep technical insights, and they’ve allowed us to complete some really hard projects quickly. We are big fans of their team.
Scott Garlington VP Engineering, xG Technology
We’ve used Cardinal Peak on several projects. They have a very capable engineering team. They’re a great resource.
Greg Read Senior Program Manager, Symmetricom
Cardinal Peak has proven to be a trusted and flexible partner who has helped Harmonic to deliver reliably on our commitments to our own customers. The team at Cardinal Peak was responsive to our needs and delivered high quality results.
Alex Derecho VP Professional Services, Harmonic
Yonder Music was an excellent collaboration with Cardinal Peak. Combining our experience with the music industry and target music market, with Cardinal Peak’s technical expertise, the product has made the mobile experience of Yonder as powerful as…
Adam Kidron founder and CEO, Yonder Music
The Cardinal Peak team played an invaluable role in helping us get our first Internet of Things product to market quickly. They were up to speed in no time and provided all of the technical expertise we lacked. They interfaced seamlessly with our i…
Kevin Leadford Vice President of Innovation, Acuity Brands Lighting
We asked Cardinal Peak to help us address a number of open items related to programming our systems in production. Their engineers have a wealth of experience in IoT and embedded fields, and they helped us quickly and diligently. I’d definitely…
Ryan Margoles Founder and CTO, notion