Moulton Laboratories
the art and science of sound
24 Bits: Can You Hear ’Em? 96 kHz: Can You Hear It?
Dave Moulton
October 1999

How Do You Know? What Does It All Mean?

I do a fair amount of subjective testing in the audio field, in addition to my audio production and studio design work. Over the past six months, I’ve gotten caught up in a food fight on the Internet about the audibility of various audio processes.

That food fight really got me to thinking about the trouble we all have communicating about these issues. There are two central parts to the problem. First, we aren’t very objective about describing our subjective impressions and, second, we mostly fail to consider what is generally meant by the terms that we use. Such foibles are perfectly normal for us audio engineer types, and because we aren’t scientists by profession we can’t be held responsible for not making the kind of careful objective claims that real science requires.

Nevertheless, it is important for us to be able to understand at least a little bit about what is really going on in regard to the concept of audibility, just to get along in our professional audio lives. At the same time, we hear a lot of talk about the need for expanded resolution in digital audio these days, which raises some important questions. Can we really hear the difference? And, how important is it?

The answers we get to these questions when we carefully read the trade journals and the reported findings about audibility are a little confusing and contradictory. People have described the difference between 16-bit and 20-bit audio as the difference between mediocre and awesome. Others have ascribed a remarkable transparency to 20-bit signals. Similarly, individuals have attributed increased clarity, definition, detailing and other virtues to audio signals with sample rates significantly higher than 48 kHz. Meanwhile, others note that such observations are neither made nor supported in double-blind tests, which are presumed to be more rigorous and “objective” than more informal studies. In fact, double-blind tests often seem to show that listeners can’t reliably distinguish between 16-bit audio and 20-bit audio.
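To put a number on “reliably,” here is a minimal sketch, not taken from any of the tests mentioned above, of how the result of a forced-choice blind trial might be weighed against pure guessing. The trial counts (16 trials, with 12 or 9 correct) and the function name are assumptions chosen purely for illustration.

```python
from math import comb

def guess_probability(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` answers right out of `trials`
    when every answer is a coin flip (p = 0.5)."""
    favorable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favorable / 2 ** trials

# Hypothetical session of 16 trials (illustrative numbers, not real data).
p_12_of_16 = guess_probability(12, 16)
p_9_of_16 = guess_probability(9, 16)

print(f"12/16 or better by guessing: {p_12_of_16:.3f}")
print(f" 9/16 or better by guessing: {p_9_of_16:.3f}")
```

Under these assumptions, 12 or more correct out of 16 would turn up by luck about 4% of the time, while 9 or more would turn up about 40% of the time. A listener can be right more often than wrong and still not have demonstrated anything.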

How can this be? How can people clearly hear an effect, to a point where they choose to describe it with a superlative such as “amazing,” when under controlled blind conditions other people (and sometimes they themselves) can’t distinguish it from the same signal without said “amazing” effect? One cynical answer is that they are simply making it up. This is called the “Emperor’s New Clothes” syndrome. Humans are suggestible, and it is pretty easy for us to get stampeded into a group-think state of mind where we will clearly hear anything we think will be socially acceptable, whether it’s real or not. All of us working in audio have had the experience of equalizing a channel to a point where everybody in the control room agrees we’ve made some really significant, perhaps awesome, improvements, only to discover that we were equalizing an adjacent channel that wasn’t even switched on!

But I’m not satisfied with “The Emperor’s New Stereo” answer. I’m sure that suggestibility is often an issue, but we know very well that we are susceptible to making these mistakes and we’re generally pretty careful about avoiding them. Further, too many people I know whose hearing acuity I really respect have reported hearing things like a “BIIIIG” difference between 16-bit and 20-bit audio for me to say, “Nah, that’s just group-think. They’re making it up.”

One of my goals, as a researcher, is to find out how to resolve these differences, how to reconcile apparently contradictory and mutually exclusive reports about the audible effect of some audio process. I’d like to be able to explain how it can be that these differences exist, rather than assuming and trying to prove that one of the viewpoints is simply “wrong.”

There are several things to think about.
  • First, we need to consider the actual nature of blind tests.
  • Second, we need to think about what we really mean when we describe an improvement as “amazing.”
  • Third, we need to know how the term “audible” is really defined.
  • Fourth, we need to know something about the nature of human perception.
Over the next several months, we’ll look into these questions in detail. There’s actually quite a bit to be said about each of the four questions I’ve raised. But first, we need to consider why this stuff is so important.

We are involved in a business of creating illusions for our customers. We spend a great deal of time and money on these illusions to make them effective. Our viewers/listeners similarly spend a great deal of money to experience illusions they find compelling, convincing and satisfying. At the same time, we’ve bought into the notion that the illusions become better as they become “more accurate,” and so we keep trying to improve the resolution of our illusions.

Meanwhile, it’s a competitive world out there. He/she who creates the most powerful illusion for the least bucks wins. We need to not waste our time and resources on effects that don't matter. We need to know what matters, how much it matters and how much it costs. Only then can we begin to really get into enhancing the illusion we are creating in a really meaningful way.

With that in mind, we’ll dive into the weird and wacky world of blind testing next month. Thanks for listening.
Note: The following group of columns, which I wrote for TV Technology, is an attempt on my part to describe some of the issues surrounding our attempts to measure and evaluate the audibility of high-resolution formats. Together, I think they make an excellent short survey of these issues. I hope you find them useful.

COMMENTS

     May 01, 2006 05:46 PM
This site is like finding my perfect Les Paul. Incredible wealth of knowledge. Thanks for being open to midnight mixers like myself to explain why.
Random Jam 
UK     Mar 26, 2007 04:34 PM
Further, too many people I know whose hearing acuity I really respect have reported hearing things like a "BIIIIG" difference between 16-bit and 20-bit audio for me to say, "Nah, that's just group-think. They're making it up".

How do you respect someone's hearing acuity? You can't use their hearing ability, you can only use your own. So the required respect couldn't come from experiencing that person's engineering skills, because you'd be using YOUR ears, not theirs. And whether or not their hearing is good in your opinion (thus leading to respect) comes from hearing their work through your own hearing acuity.

It isn't possible to respect someone's hearing acuity, because you have no way of experiencing that. I tend to feel it's more a case that you respect them as people, which should not have any place in determining these kinds of issues.

In other words: You fell for a different kind of emperor's new clothes syndrome.
A Person 
Grosse Pointe Woods, MI     Mar 28, 2007 11:17 AM
I've been trying to come up with convincing demonstrations related to high sample rates and long data words. You can find my efforts here: http://www.pcabx.com/sample_rates/index.htm
Arny Krueger 
     Apr 02, 2007 11:22 AM
I respect their hearing acuity because I've witnessed them working with their ears. I think that is valid. Nonetheless, you raise a good point - we all hear with our own ears and we have to learn to trust them, and calibrate them as best we can.

In general, what you suggest is probably true. Others' claims only reveal to us that somebody else "claimed" to hear something, and "may" have heard it. Beyond that, we don't really know anything.

Best regards,

Dave
Dave Moulton 
     Feb 08, 2009 06:53 PM
Thanks, David!
Feels like the pressure levels become more transparent:)
Best wishes.
Dainius
Daains 
edinburgh     Dec 18, 2009 08:39 PM
Dave, like the site, good work. I'm a little concerned about how to perceive the application of sample rates: why are you discussing sample rate by comparing it to audio frequency in terms of pitch and human acoustic perception? Surely that is not the actual use of the additional bandwidth? I believed it to be (for one purpose) a useful solution that gives DAWs, for example, room to complete the complex algebraic summing that is required of them when mixing a great many channels of audio data with additional audio effects, plugins and other source data that may add to the complexity of what is essentially an incredibly complicated waveform. If you were to try and plot the same graph using cm squared paper it would render a much less detailed graph; is that not the point of the additional sample-rate resolution?

Kind Regards

Rob
Robert Henderson 
Groton, MA     Dec 20, 2009 11:51 AM
I think you've got a good point, Rob, and a useful way of looking at the issue. Particularly, the extra bits help with arithmetic resolution and the elimination of errors. Similarly, extra bandwidth gives us many fewer errors at the boundaries of bandwidth that ARE audible. Along with that, as you note, the extra resolution makes it possible for DSP to function more effectively, and, yes, that seems to be audible. Keep in mind, though, that it comes at the expense of the total amount of DSP that can be applied. At the time of writing, that was still a fairly significant issue, particularly in multitrack surround work.
Dave Moulton 
London     Apr 29, 2011 09:20 AM
Hi Dave,
really love all the material you've put up for discussion and thank you very much for opening it out to others.
I completely agree with the fact that the issue of increased sample-rates doesn't come down to the issue of added audible high frequency content in a signal; even if we could hear above 20kHz few playback systems actually reach such spectral fidelity. The things which mainly fall into the discussion are temporal resolution and bandwidth limiting of a system.

Firstly, a perceived increase in 'clarity' could possibly originate from the finer time resolution gained in going from 0.0227 ms (1/44.1 kHz) to 0.0208 ms (1/48 kHz). Before anyone goes 'Ah ha, that's it', what little research has been carried out on the cognitive differences in timing of audio stimuli has shown the temporal resolution of the ear to reach only 2 or 3 ms. None of the papers I've read show evidence that the ear/brain could detect a change as small as 0.0018 ms (1.8 microseconds). It's still some food for thought, and if anyone can add to this side of the coin then please do.
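As a quick check on the figures in the paragraph above, here is a short sketch of the sample-period arithmetic. The names are illustrative only.

```python
# Time between successive samples at the two rates under discussion.
RATES_HZ = (44_100, 48_000)

def sample_period_us(rate_hz: float) -> float:
    """Interval between samples, in microseconds."""
    return 1e6 / rate_hz

periods = {rate: sample_period_us(rate) for rate in RATES_HZ}
difference_us = periods[44_100] - periods[48_000]

for rate, period in periods.items():
    print(f"{rate} Hz: {period:.2f} microseconds per sample")
print(f"difference: {difference_us:.2f} microseconds")
```

The difference comes out a little over 1.8 microseconds, roughly a thousand times finer than the 2 to 3 ms figure quoted for the ear.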

From the brain to the analogue domain: there are advantages to increasing the bandwidth of a system. In all A-D conversion, the anti-alias brick-wall filter that is employed to remove signal content above the Nyquist frequency (half the sample rate) can result in a mismatch between digital signal levels and the analogue source. When high frequency components are removed from the complex signal entering the system, the change in the relationship between the signal's components can result in higher peak levels, because the interference caused by the high frequency components has been removed. (This isn't referring to the differential non-linearity distortion associated with A-D or D-A conversion, where finer absolute level changes aren't perfectly reflected in the digital form of the signal.)
Backing off the brick-wall filter reduces this variation to a degree, but another facet is the inherent time-smear artifacts of filters. When any system has a reduced bandwidth, often achieved through filtering, the finest time interval it can resolve is inversely proportional to this new bandwidth: narrower bandwidth = less time resolution. When 44.1 kHz is used, a badly designed converter could place these inherent time-smearing artifacts within the audible range, and using higher sample rates moves these distortions into a frequency range that doesn't interfere with our judgment of the sound quality. On a related note, Bob Katz and many other engineers believe that a converter's audible quality is down to the flatness of the filter's passband, and that no difference is caused by added high frequencies due to higher sample rates.

When the issue of DSP is carried into the discussion, any increase in accuracy, be it sample rate or bit depth, results in more accurate computations after processing and summing have been carried out. The only issue that limits DAW operations is processing power, but still the vast majority of calculations carried out on audio nowadays are done at longer word-lengths than the source data, for increased accuracy and reduced artifacts.

It can be very easy to become bogged down and obsessive over increasing the resolution of our audio; however, this is mostly born out of patchy understanding and overly eager equipment manufacturers. Any practicing sound engineer knows that at the end of the day a file recorded at 96 kHz 24-bit will eventually be subjected to the medium limitations of CD (44.1 kHz 16-bit) or the even more limited MP3 (320 kbps), and to the general resolution of consumer playback systems. Once this last part of the audio chain comes into play, little or no audible benefit can be appreciated by the listener from recording the source at excessive sample-rates and/or bit-depths. One thing that plagues me is the obsession within the audio industry with increasing the dynamic resolution of audio systems beyond 24 bits, when it's hard to see any reason in this as the majority of modern commercially released music uses only about 8 bits of this range to deliver the message.
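For a rough sense of scale, the sketch below applies the common rule of thumb of about 6 dB of dynamic range per bit for ideal linear PCM, ignoring dither and converter noise. The 8-bit row reflects the estimate in the paragraph above rather than any measurement, and the function name is illustrative only.

```python
import math

def ideal_dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, in dB,
    for an ideal linear PCM word of the given length."""
    return 20 * math.log10(2 ** bits)

# 8 bits is the commenter's estimate of what commercial releases use.
ranges_db = {bits: ideal_dynamic_range_db(bits) for bits in (8, 16, 24)}

for bits, db in ranges_db.items():
    print(f"{bits:2d} bits: about {db:.0f} dB")
```

On these idealized numbers, 8 bits spans roughly 48 dB, 16 bits roughly 96 dB and 24 bits roughly 144 dB.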

This is an issue always worth questioning. My understanding of the vast number of variables in the discussion has only just begun to form, and it's likely I'm wrong in many aspects, but it's always good to discuss such things in our wonderful realm of audio.

Thanks again Dave!!
Darren T. Jennings 
Jamaica     Jul 21, 2011 04:31 PM
We might not be able to hear the 0.0018 ms difference in a single instance, but summed up in even a snapshot of a multitrack master bus it will become very noticeable.
Evon Brown 
Virginia Beach, VA     Oct 15, 2011 12:36 AM
@Darren and Evon

The time resolution of a 16-bit, 44.1 kHz PCM channel is not actually limited to the time difference between samples (as would be suggested by the 22.7 µs figure you cite, which is incidentally very commonly cited in arguments for "why CD doesn't cut it"). Thanks to the function of sinc interpolation in PCM (and under Nyquist-Shannon), the time resolution which can be captured is equivalent to 1/(2pi * quantization levels * sample rate).

In the case of 16/44.1, we end up with 1/(2pi * 65536 * 44100), which is right on the order of 55 picoseconds. That's 0.000000000055 seconds. The human ear cannot distinguish separate events in that amount of time. Light, which travels at 186,282.4 miles per second in a vacuum, does not even travel a full inch in that period of time, so perhaps that sheds some light on the amount of time resolution a 16/44.1 PCM system is truly able to capture.
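The arithmetic above can be reproduced in a few lines. This sketch takes the formula exactly as stated in this comment, without claiming it is the only way to model PCM timing, and the names are illustrative.

```python
import math

def pcm_time_resolution_s(bits: int, rate_hz: float) -> float:
    """Time resolution per the formula quoted above:
    1 / (2 * pi * quantization levels * sample rate)."""
    levels = 2 ** bits
    return 1.0 / (2 * math.pi * levels * rate_hz)

resolution_s = pcm_time_resolution_s(16, 44_100)

# Distance light covers in that time, using the speed quoted above.
MILES_PER_SECOND = 186_282.4
INCHES_PER_MILE = 63_360
light_inches = MILES_PER_SECOND * INCHES_PER_MILE * resolution_s

print(f"time resolution: {resolution_s * 1e12:.0f} picoseconds")
print(f"light travels:   {light_inches:.2f} inches")
```

The result is about 55 picoseconds, during which light covers roughly two thirds of an inch.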

Also, the idea that an antialiasing filter can actually modify the level represented in the medium is new to me. Would you mind elaborating on that?

At any rate, thanks for the article Dave. Great work, as always.
Sean 
     Feb 02, 2012 12:06 PM
Here are some potential problems that I see with saying that 24/96 is going to sound better. Obviously there is a myriad of AD or DA converter processors on the market and an even bigger number of companies selling outboard converters, CD and DVD players, etc. Well, what happens if one brand of DAC sounds different from another? What about playback systems being able to properly play back these subtle differences? Has anyone really addressed inherent problems most sound systems have? Pre amp distortion, crossover distortion, various types of distortion from the current catalog of 16 bit CDs, etc.?
Rich Davis 
Grosse Pointe Woods, MI     Feb 03, 2012 07:44 AM
"...has anyone really addressed inherent problems most sound systems have? Pre amp distortion, crossover distortion, various types of distortion from the current catalog of 16 bit CDs, etc.?"

I see no mention of the true and genuine 5,000 pound gorilla(s) in the picture - the room, the speakers, and the problems that were present prior to encoding the music on whatever digital media being used.

I'm a professional recordist and FOH mixer, producing both minimalist and multi-miced musical productions and festivals. I have no illusions about the size of audible differences among rooms, transducers, and positioning of transducers - they are huge.
Arny Krueger 
     Feb 04, 2012 12:17 AM
What I find funny is that the average person doesn't know what good audio is, even those in the music industry. Most of the music produced is over-compressed, auto-tuned, overdubbed BS. Musicians aren't using real instruments as much as they used to; now we have sampled instruments, etc. And instead of listening to music in the same manner in which it was recorded, we are more or less forced into MP3. Even a lot of the converters being used and the quality of the monitors (NS-10 more specifically) are destroying the music industry with bad audio.
Rich Davis 
