Moulton Laboratories
the art and science of sound
Hearing: The Louds and the Softs of It
Originally published in Recording, approx. May 2001
by Dave Moulton
May 2001

Where Dave tries to explain loudness to an alien.

The View From 2009: This article was another in the “Zork the Alien” series. This one’s about the perception of loudness. Enjoy!

The Louds and Softs of Hearing

Zork-11 (our friendly visiting alien) demands to know about loudness. Recently, she’s been homesick for Betelgeuse (her home star system), and that makes her, well, a little impatient! Cranky, you might say. So, it’s important that I try to humor her, and explain about loudness, before she does another photon discharge.

Well, Zork-11, the quick answer is: loudness is the subjective perception of the magnitude of air-pressure fluctuations, roughly speaking.

It is how we humans perceive the energy and power that are being transmitted through the air. The more energy and power, the louder the sound. Right? Got that, Zork-11?

Pressure, Power, Amplitude and Loudness

What happens physically is this. Some kind of mechanical physical motion excites the air. The more energy and power that are expended by that physical motion, the greater the swings in air pressure from compression to rarefaction. This, in turn, causes greater motion of our eardrums, and the result is a change in what we hear, a change in the particular quality of sound that we call “loudness.”

There is an approximate correlation between the subjective quantity loudness, the physical force called pressure and the amount of work accomplished per unit of time, called power. In a general way, we can say the following:
  • Power changes as the square of the changing pressure (doubling the pressure quadruples the power).
  • Tripling the pressure multiplies the power by about 10 (strictly 9, but a factor of 10 is the same as adding 10 dB, by the way).
  • As the power is multiplied by ten, the sensation of loudness approximately doubles (it sounds “twice as loud”).
  • Therefore, tripling the pressure approximately doubles the apparent loudness.
Pretty straightforward, eh, Zork? She warbles in agreement.
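
For readers who would rather see the arithmetic than take Zork's warble for it, here is a minimal Python sketch of those rules of thumb. The function names are made up for illustration, and the "10 dB is twice as loud" rule is the approximation used in this article, not an exact law of hearing.

```python
import math

def pressure_ratio_to_db(ratio):
    """Level change in dB for a given pressure (amplitude) ratio."""
    return 20 * math.log10(ratio)

def power_ratio_to_db(ratio):
    """Level change in dB for a given power ratio."""
    return 10 * math.log10(ratio)

def loudness_ratio_from_db(db_change):
    """Rule of thumb assumed here: every 10 dB roughly doubles loudness."""
    return 2 ** (db_change / 10)

for pressure_ratio in (2, 3, math.sqrt(10)):
    power_ratio = pressure_ratio ** 2  # power goes as pressure squared
    db = pressure_ratio_to_db(pressure_ratio)
    print(f"pressure x{pressure_ratio:.2f} -> power x{power_ratio:.1f} "
          f"-> {db:+.1f} dB -> about {loudness_ratio_from_db(db):.2f}x as loud")
```

Tripling the pressure comes out at 9 times the power (about 9.5 dB). The exact 10 dB step needs a pressure factor of the square root of 10, a bit under 3.2, so "tripling" is a convenient rounding.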

Zork-11 Ponders The Hugeness Of It All.

It gets interesting where we begin to contemplate the huge range of loudnesses we can perceive.

Truth is, we can actually hear something like 15 or 16 doublings of loudness. What this means is that the loudest sound we can stand sounds about 50,000 times as loud to us as the softest sound we can hear. And that represents a change in pressure of say, 14 or 15 TRIPLINGS of pressure, or a pressure range of 8 million to 1! It may not seem like a lot to you, Zork, coming from Betelgeuse (which is a REALLY BIG red star) and all, but that’s a big pressure range for us humans. Why, we have trouble even imagining it!

And the power ratio is REALLY big! 8 million squared is, like, 64 trillion! That’s a heap o’ watts!
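
As a rough check on those big numbers, the sketch below just multiplies out the doublings and triplings. The helper name `range_from_steps` is hypothetical, and the 8,000,000 figure is the article's round number, not a measured value.

```python
def range_from_steps(factor, steps):
    """Total ratio covered by `steps` successive multiplications by `factor`."""
    return factor ** steps

for steps in (15, 16):
    print(f"{steps} doublings of loudness -> {range_from_steps(2, steps):,} to 1")

for steps in (14, 15):
    print(f"{steps} triplings of pressure -> {range_from_steps(3, steps):,} to 1")

pressure_range = 8_000_000         # the article's round figure
power_range = pressure_range ** 2  # power goes as pressure squared
print(f"power range -> {power_range:,} to 1")
```

The quoted 50,000 and 8 million each sit between the two endpoints the loops print, and 8 million squared works out to 64 trillion.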

So, Zorka, the key issue with loudness isn’t how we perceive it, but how we manage to perceive it reliably over such a huge range! Back when we looked at the audio window, if you recall (you WERE paying attention, weren’t you, Zorkina?), we spent some time considering the physical implications of that range. At the soft end, we are very close to being able to detect the sound of a single air molecule bouncing into our eardrum, while at the loud end of things we can ALMOST tolerate the sound of airwaves creating vacuums. So, in terms of loudness, we can detect levels pretty much all the way across the useful range of sound pressure levels. And that range is huge! Think about it, Zorkova!

And what makes it really remarkable is how constant the rate of perceived loudness change is. A change in amplitude (or power) of 1 dB (12% amplitude change) is roughly the smallest loudness change humans can pick out. And that 12% threshold holds up pretty well across the whole 8,000,000:1 range! Wow!
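
Here is a small sketch of what a 1 dB step means in plain ratios, and roughly how many such steps fit into the 8,000,000:1 range. The function names are illustrative only, and the 1 dB "smallest noticeable change" is the article's rule of thumb.

```python
import math

def db_to_amplitude_ratio(db):
    """Amplitude (pressure) ratio corresponding to a level change in dB."""
    return 10 ** (db / 20)

def db_to_power_ratio(db):
    """Power ratio corresponding to a level change in dB."""
    return 10 ** (db / 10)

def amplitude_ratio_to_db(ratio):
    """Level change in dB for a given amplitude ratio."""
    return 20 * math.log10(ratio)

step_db = 1.0
amp_change = (db_to_amplitude_ratio(step_db) - 1) * 100   # percent
power_change = (db_to_power_ratio(step_db) - 1) * 100     # percent
steps_in_range = amplitude_ratio_to_db(8_000_000) / step_db

print(f"1 dB = {amp_change:.1f}% more amplitude, {power_change:.1f}% more power")
print(f"8,000,000:1 spans about {steps_in_range:.0f} one-dB steps")
```

Note that the 12% figure applies to amplitude. The same 1 dB step is about a 26% change in power.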

How We Perceive Loudness

Meanwhile, the multiple processes via which we perceive changing loudnesses are highly complex and definitely non-linear. There is the eardrum to consider, where sound is converted into mechanical motion. Then there is a linkage of bones in the middle ear that serve to amplify the mechanical signal and transmit it to the inner ear (the cochlea), where the mechanical signal is converted into a compound array of neurological signals being sent to the brain. We discussed roughly how this works in our last article in the series (Hearing Highs and Lows). Suffice it to say that on the basilar membrane in the cochlea, sound is detected along a resonant membrane, so that different parts of the audible spectrum are detected at different places, in an array of approximately thirty non-uniform critical bands. Meanwhile, the basilar membrane is infused with thousands of nerve endings, which respond to motion in their vicinity. We already discussed this, Zork! You’ve GOT to pay attention! Forget Betelgeuse for right now!!

The obvious part of this is that as sounds increase in amplitude, two things happen. First, the RATE at which the nerve endings fire neural impulses increases, so that loudness increases (roughly) with the general firing rate of the nerve endings. Second, the area of the basilar membrane which resonates will become greater, and therefore, more nerves will fire. Therefore, at the basilar membrane, loudness correlates to the neural firing rate AND the total number of neurons firing.

“But wait a minute,” says Zork-11. “How many nerve endings are there, really? You said thousands, right?”

“Right,” I reply. “Around 30,000, I think, but a lot of them actually send nervous impulses TO the ear FROM the brain, so I’m not sure they all count. Figure 15,000, maybe?”

“But you said that humans can hear amplitude over an 8,000,000:1 range,” Zork-11 exclaims, her hydrochlorinator rasping slightly. “How can you do this with only 15,000 nerve endings? 30,000 even?” Zork-11 HAS been paying attention after all.

This Is Where It Gets Complicated

There are a couple of things to keep in mind. As amplitude increases, the NUMBER of nerve endings affected increases geometrically, so that a small change at high levels involves a much greater extra number of nerve endings than a similar small change at low levels. The number of nerve endings on the basilar membrane and the range of firing rates are insufficient to do the whole job of distinguishing loud from soft across the dynamic range we are talking about. Other mechanisms must be at work as well.

Part of the answer has to do with the concept of critical bands. These are the various regions on the basilar membrane (approximately thirty of them, varying in width from about two octaves at the bottom of the spectrum to 1/6th of an octave in the middle – see the Figure below) that each seem to function approximately as a unit, so that each such band carries independent information. The sensation of loudness, it turns out, is affected by the NUMBER of such critical bands that are excited by a sound source. That’s right, Zorkia, as the spectrum of a sound increases, its loudness does too, even if its amplitude doesn’t. Pink noise at 80 dB SPL will sound louder than a sine wave at 80 dB SPL.
  
Figure 1. Critical bands across the audible spectrum, courtesy of iZotope.

Similarly, time affects the loudness as well. A very short impulse will sound 20 dB softer (i.e. 1/4 as loud) than sustained noise at the same level.

A third issue, one that I’ve been fooling with a lot recently, is distortion. It is a truism in audio that loudness increases as a function of distortion. This is probably related, of course, to the critical band process I mentioned above, but I suspect it’s something more as well. The perceived onset of even quite low-level distortion components seems to increase loudness quite a bit, to a point where I’ve speculated about the concept of “virtual loudness” based on the addition of some non-linearities (er, distortion) to the audio signal.

What all this means is, of course, that loudness varies in some really complex ways. Time, spectrum and linearity all affect how loud a sound is, independent of its amplitude. So, that simple correlation between amplitude and loudness that I started out by describing falls apart pretty quickly, and can only be thought of as true for signals with similar spectra, linearity and impulse content (we call it “crest factor”).

And This Is Where It Gets REALLY Complicated

It turns out that, in addition to all of the complexity of the basilar membrane related to number of nerves firing, etc., we have two compressors in each ear! First, the eardrum itself, functioning as a muscle, serves as a slow-acting compressor by changing its compliance in response to different perceived amplitude levels. Second, the three bones that transmit motion from the eardrum to the cochlea function both as an amplifier stage and also as a comparatively fast-acting limiter (the bones slip apart, but are held in approximate place by bone cartilage) at high levels. In combination, these two mechanisms provide something up to 30-40 dB of variable level reduction, depending on the nature of the sound.

What’s wild about these compressors is that we have real trouble hearing them work! Even though they can turn down the mechanical level reaching the basilar membrane by up to 30-40 dB, they don’t seem to reduce the loudness we perceive! I’ve tried to hear them, believe me, listening to all kinds of sounds in all kinds of environments. So, I suspect our brain gets into the act as well! My guess is that as the compressors in our ears pump away, regulating levels on an on-going basis, our brain compensates for this regulation, so that the “apparent” or “perceived” loudness doesn’t change. This implies some sort of elaborate interactive feedback/feedforward process that turns down the mechanical level in our ears while it turns up the “mental” level in our consciousness! Whoa, Zorka! Whaddaya think of them space-balls, eh?

Zork-11 shriggles her pleasure at the concept.

Not only that, but the brain also feeds back neural information to the basilar membrane to condition and “sharpen” low level signals, helping us to hear them and reducing “neural noise” on the membrane.

The net result is that we have an apparently effortless and seamless perception of loudness from approximately the level of single air molecules bouncing in space right up to the onset of Armageddon III. The perception is smooth, compelling and extraordinarily reliable. In many respects, our hearing blows test gear away, for its speed, reliability and noise reduction, not to mention bandwidth and resolution!

Meanwhile, About Levels In The Acoustic/Analog/Digital Realms

This is a good part of why audio gives us such fits. Our hearing capacity, although complex, variable and multifaceted, gives every appearance of being very smooth and straightforward over a truly gigundous range. To help put this in perspective, I like to think about audio signals in relation to our human hearing limits – it can be both instructive AND entertaining to contemplate.

We have this lowest level that we can hear (actually, the level varies a great deal with frequency – we’re talking 1 kHz in this particular case) called the “threshold of hearing,” which physically is approximately 2/10,000ths of one millionth of an atmosphere. It’s also called 0 dB SPL. Let’s take a look at what would happen if we aligned the noise floors of analog and digital signals to coincide with that threshold.
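
To sanity-check that "fraction of an atmosphere" figure, here is a minimal sketch. It assumes the standard 0 dB SPL reference pressure of 20 micropascals and one standard atmosphere of 101,325 pascals. Neither number appears in the article, and the function names are illustrative.

```python
import math

REFERENCE_PRESSURE_PA = 20e-6      # assumed 0 dB SPL reference: 20 micropascals
STANDARD_ATMOSPHERE_PA = 101_325   # assumed: one standard atmosphere in pascals

def pascals_to_db_spl(pressure_pa):
    """Sound pressure level in dB SPL for an RMS pressure in pascals."""
    return 20 * math.log10(pressure_pa / REFERENCE_PRESSURE_PA)

def db_spl_to_pascals(db_spl):
    """RMS pressure in pascals for a level in dB SPL."""
    return REFERENCE_PRESSURE_PA * 10 ** (db_spl / 20)

threshold_in_atmospheres = REFERENCE_PRESSURE_PA / STANDARD_ATMOSPHERE_PA
print(f"0 dB SPL is about {threshold_in_atmospheres:.2e} atmospheres")
print(f"140 dB SPL is about {db_spl_to_pascals(140):.0f} pascals")
```

Under those assumptions the threshold lands very close to 2/10,000ths of a millionth of an atmosphere, as stated.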

The noise floor for analog signals is approximately –100 dBu (actually, it may easily be 10 dB higher if we factor in the equivalent noise floor of a good, very quiet microphone!). This would mean that 0 dBu would yield a level of 100 dB SPL. Meanwhile, audio systems with power supplies of +/- 24V can reproduce peak levels of up to +27 dBu. Therefore, they could reproduce a source signal of 127 dB SPL. Not too shabby.

Meanwhile, if we fix the Least Significant Bit (dithered of course) of a digital signal to that same 0 dB SPL/-100 dBu level, a 16-bit system will run out of bits at -4 dBu and 96 dB SPL. A 20-bit system will make it all the way to 120 dB SPL and +20 dBu, while a 24-bit system will make it to 144 dB SPL (4 dB above the Threshold of PAIN!!!) and +44 dBu (about 123 Volts RMS!).

Meanwhile, assuming loudspeakers with a sensitivity of 1 Watt yielding 90 dB SPL, the 16-bit signal will only call for 6 dBW (4 Watts) for playback, but the 20-bit signal will require 30 dBW (1 kiloWatt) and the 24-bit signal will require 54 dBW (250,000 Watts)! I tell you, Zorkana, it’s a tough business!
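
The bookkeeping in the last few paragraphs can be sketched in a few lines of Python. This is only an illustration under the article's round numbers (6 dB per bit, a –100 dBu floor sitting at 0 dB SPL, speakers giving 90 dB SPL for 1 Watt), plus the usual definition of 0 dBu as about 0.7746 volts RMS. The function and constant names are hypothetical.

```python
NOISE_FLOOR_DBU = -100.0        # article's analog floor, aligned to 0 dB SPL
DB_PER_BIT = 6.0                # article's round figure (closer to 6.02)
DBU_REF_VOLTS = 0.7746          # 0 dBu is about 0.7746 V RMS
SPEAKER_SENSITIVITY_DB = 90.0   # dB SPL for 1 Watt, as assumed in the article

def full_scale(bits):
    """Return (peak dB SPL, peak dBu, volts RMS, amp dBW, amp watts)."""
    peak_spl = bits * DB_PER_BIT              # LSB sits at 0 dB SPL
    peak_dbu = NOISE_FLOOR_DBU + peak_spl
    volts_rms = DBU_REF_VOLTS * 10 ** (peak_dbu / 20)
    amp_dbw = peak_spl - SPEAKER_SENSITIVITY_DB
    amp_watts = 10 ** (amp_dbw / 10)
    return peak_spl, peak_dbu, volts_rms, amp_dbw, amp_watts

for bits in (16, 20, 24):
    spl, dbu, volts, dbw, watts = full_scale(bits)
    print(f"{bits}-bit: {spl:.0f} dB SPL, {dbu:+.0f} dBu, "
          f"{volts:.3g} V RMS, {dbw:.0f} dBW = {watts:,.0f} W")
```

With 0 dBu taken as 0.7746 volts, +44 dBu works out to roughly 123 volts RMS, and the 24-bit case asks for about a quarter of a million Watts.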

At present, we don’t formally align the relationships between these levels. In many respects, we leave them badly mis-aligned, for some practical reasons. The closest we come to a formal relationship between acoustical and electrical levels is to occasionally try to calibrate our speakers so that an RMS noise signal at some “nominal” production level (usually thought of as “0 VU”) yields 85 dB SPL (this is a film industry standard). Meanwhile, we haven’t really fixed a relationship between that “nominal” production level and 0 dBFS (the maximum level yielded by the Most Significant Bit). Relationships vary between 0 VU = -20 dBFS and 0 VU = -6 dBFS! What we HAVE done is to align 0 dBFS with the onset of analog clipping (or, more elegantly put, the limits of the power supply), so that 0 dBFS, when converted to analog, is at the maximum level the power supply of any given piece of gear will attain without distortion.

When we consider the implications of this, it becomes obvious that we throw away much of the goodness of high resolution. Assume, Zorkalina, that 85 dB SPL is aligned to –12 dBFS (approximately +15 dBu with a 48 volt power supply). This means that the maximum system level (for one speaker) is 97 dB SPL, regardless of how many bits we’re using. Pretty shabby!

Meanwhile, the Least Significant Bit of 16-bit signals, at 1 dB SPL, correlates well with our threshold of hearing, while the LSB of 20-bit signals comes in at –23 dB SPL and the LSB of 24-bit signals comes in at –47 dB SPL. Both of these are WAY below our threshold of hearing, probably to a point of silliness. As I said, mis-aligned.
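
The mis-alignment is easy to see if you pin the top of the scale instead of the bottom. The sketch below assumes the calibration used in this example (85 dB SPL at –12 dBFS) and the same round 6 dB per bit. The name `system_window` is made up for illustration.

```python
def system_window(bits, nominal_spl=85.0, nominal_dbfs=-12.0, db_per_bit=6.0):
    """Return (max dB SPL at 0 dBFS, dB SPL of the least significant bit)."""
    max_spl = nominal_spl - nominal_dbfs   # 0 dBFS sits 12 dB above nominal
    lsb_spl = max_spl - bits * db_per_bit  # each bit buys about 6 dB downward
    return max_spl, lsb_spl

for bits in (16, 20, 24):
    max_spl, lsb_spl = system_window(bits)
    print(f"{bits}-bit: top at {max_spl:.0f} dB SPL, LSB at {lsb_spl:+.0f} dB SPL")
```

The top never moves off 97 dB SPL. Extra bits only push the bottom further below what anyone can hear.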

Why We Can Never Really Measure Amplitude Or Loudness Of Audio Signals

Meanwhile, Zorkinara, we can’t ever even REALLY measure the amplitude of a sound, much less its loudness. All we can do is approximate it, by some sort of averaging over time. Pressure, or amplitude, of course, is a static physical value, like pounds per square inch or dynes per square centimeter, and it simply exists at one or many points in time.

Meanwhile, sound comes from CHANGING the pressure or amplitude OVER time. Whatever measurement we make will either be inaccurate in the short term (if we make it accurate for the longer term) or vice versa. So, when we bandy about these various levels like dB SPL or dBFS, you’ve gotta remember, they are time-based approximations. Some of our approximations (like VU meters and their kin) are fairly long-term, while others (peak-reading meters, for instance) are short-term. The same sound will have different measured levels depending on the temporal behavior of the measuring system. Which is the “correct” level? None of ’em! Damn!!
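
Here is a toy illustration of that point, assuming a made-up test signal: a 10 millisecond full-scale 1 kHz burst followed by silence, sampled at 48 kHz. It is not a model of real VU or peak meter ballistics, just one sound measured three different ways.

```python
import math

def rms(samples):
    """Root-mean-square value over the whole list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def peak(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)

def to_db(value, ref=1.0):
    """Level in dB relative to `ref` (full scale = 1.0 here)."""
    return 20 * math.log10(value / ref)

sample_rate = 48_000
burst_len = 480  # 10 ms, exactly ten cycles of 1 kHz

burst = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(burst_len)]
signal = burst + [0.0] * (sample_rate - burst_len)  # one second in total

print(f"peak reading:        {to_db(peak(signal)):6.2f} dBFS")
print(f"RMS over the burst:  {to_db(rms(burst)):6.2f} dBFS")
print(f"RMS over one second: {to_db(rms(signal)):6.2f} dBFS")
```

Same sound, three numbers about 23 dB apart from top to bottom, and none of them is wrong. They just answer different questions about time.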

Meanwhile, loudness is REALLY subjective. It can’t really be calibrated with any precision, and when we make the statement that a doubling of loudness is equal to ten times the power, well, Zorkonica, that just isn’t quite true. It’s a gross oversimplification, so you’ve got to remember that humans are weird that way. We only sort of know what we’re talking about!

What Loudness Means to Humans

Finally, the subjective quality “loudness” has emotional “meaning” for us. It relates to issues of the emotional intensity of sound, as well as its nearness to us. Variations in loudness (dynamics) carry much, perhaps most, of the ebb and flow of emotional intensity in musical performance. While melody, harmony, tempo and timbre may tell us much about the emotional quality of the music, loudness and dynamics fairly directly suggest the “intensity” or “magnitude” of the emotional feeling at any given moment.

Recorded acoustic music seems to vary over about a 40-50 dB range (perhaps a 30:1 range of loudness), while an acoustical performance of a symphony orchestra might make it to 60 dB. Pop electric music has a much narrower range (15 dB?), and seems to focus on the LOUD end of things. Emotionally, pop music focuses on INTENSE! It’s a little limited in that regard.

If you wanna make music come out loudspeakers, Zorka-baby, loudness will be one of your primary tools. The management of loudness in music production is an extremely important skill. And to manage it well, you’ve got to come to grips with amplitude in all of its forms, including time and spectrum, as well as human expectation. And once you get the hang of it, it’s actually a lot of fun, and VERY satisfying musically.

Happy crescendos!

Dave Moulton is trying to determine the loudness of one hand clapping. You can complain to him about anything at moultonlabs.com.