Moulton Laboratories
the art and science of sound
Hearing: The Highs and the Lows of It
Originally published in Recording, in early 2001
by Dave Moulton
February 2001
2. Detecting Pitch

Dave attempts to explain how we perceive pitch, to an alien.

Detecting Pitch

The cochlea, the hearing organ of the inner ear, has a remarkable thin, compliant membrane stretched along its length. This membrane is called the basilar membrane, and it has two really interesting features: first, it is resonant at different frequencies at different points along its length, and second, it carries thousands of sensory receptors (called hair cells) that connect to the auditory nerve going to the brain.

The net effect of these two features is that different frequencies excite different hair cells, so that in a general (if incomplete) way we can think of the mechanism as causing each hair cell to be excited by a very specific range of frequencies. This leads us to a concept called the “place” theory of pitch detection.

The idea is that “pitch” is related to a “place” on the basilar membrane. A given place has a given pitch, or so the theory goes.

Obviously, there’s a problem with this theory. In our earlier examples, different places on the basilar membrane would be excited, yet all would yield the same pitch sensation. Nonetheless, it’s worth asking the question: what do all of these different waveforms have in common, on the basilar membrane?

As I said in Total Recording (sort of edited here), “Complex sounds consist of multiple frequencies, which in turn cause multiple points on the basilar membrane to vibrate. . . Here’s where the concept of ‘neural templates’ comes in handy. If we think of each pitch as a unique array of ‘places’ on the basilar membrane that vibrate at a single time, it makes a little more sense. Think of that array of places being stored in our memory as a single neural pattern. Then, when we hear that pattern, we are able to successfully compare it to the template and say something like ‘Ah-hah! That’s neural pitch template B-below-Middle-C!’”

And here’s the cool part – to continue from Total Recording: “We don’t need the ‘array of places’ to be exactly like the neural template in order to make the correct pitch identification; we simply need it to be a close enough ‘fit’ that we FEEL that it has the same quality of ‘pitch.’ In fact, the perceived array can be significantly different than the template and we can still identify the pitch as the same. It only falls apart when ‘places’ are excited on the membrane that are ‘incompatible’ with our pitch template (i.e. being offset just a little, out of the pattern, or having a significant portion of the array seem to resemble a different pitch template).

“Using this concept, then, we can think of a pitch as an ‘array of particular places’ on the basilar membrane. If we hear only a sine wave, we are exciting only one of those ‘places,’ and intuitively, using this concept, such a sensation is probably going to be a little vague. And the reality supports this: the sine wave is comparatively difficult to assign pitch to.”
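For the curious, the template idea can be mimicked with a toy calculation. This is only a sketch and certainly not a model of what the ear and brain really do: it assumes each "template" is just a candidate fundamental frequency, and it scores a template by how many of the heard frequencies land close to a whole-number multiple of it. The function names, the 3% tolerance and the list of candidate notes (equal-tempered, A = 440 Hz) are all assumptions made up for the illustration.

```python
def template_score(partials_hz, candidate_hz, tolerance=0.03):
    """Fraction of heard partials lying near a whole-number multiple of the candidate.

    `tolerance` is an arbitrary illustrative value, not a measured property of hearing.
    """
    hits = 0
    for p in partials_hz:
        ratio = p / candidate_hz
        nearest = max(1, round(ratio))          # nearest harmonic number, at least 1
        if abs(ratio - nearest) / nearest <= tolerance:
            hits += 1
    return hits / len(partials_hz)


def best_template(partials_hz, candidates_hz):
    """Candidate fundamental whose harmonic template fits the heard partials best."""
    return max(candidates_hz, key=lambda c: template_score(partials_hz, c))


# Hypothetical templates: G3, A3, B3 (B below middle C), C4, in Hz.
candidates = [196.00, 220.00, 246.94, 261.63]

# Three overtones of B3 with the fundamental itself missing.
heard = [493.88, 740.82, 987.76]
print(best_template(heard, candidates))
```

Note that the array of heard frequencies does not have to contain the fundamental, or match the template exactly, for the B-below-Middle-C template to come out on top; it only has to be a close enough fit.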

So, Zork-11, that is roughly how we think we detect this stuff we call pitch.

“But,” Zork-11 quickly interrupts, “what about these things you call overtones and chords? How can they both exist at the same time? How can your basilar membrane tell the difference between a pitch template and a chord? They are both stacks of frequencies, right?” Zork-11 is getting agitated.

What Makes A Waveform a Waveform, Not A Chord?

A fundamental concept related to frequency is phase, which can be thought of as the progression of the waveform through its period. Now, any two vibrating sources (except loudspeakers fed the same signal, curiously enough) will vibrate at different frequencies, even when they are extremely closely tuned. The result is that they will go in and out of phase over time. This leads to the phenomenon of beating, and it is a primary musical behavior. It is also inescapable: the frequencies of two separate sources will always differ, at least slightly.
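To put a number on beating: the sum of two equal-strength sine waves can be rewritten, using the standard sum-to-product identity, as a tone at their average frequency whose level swells and fades at a rate equal to the difference between the two frequencies. Here is a minimal Python sketch of that arithmetic. The 440 Hz and 442 Hz values and the function names are just assumptions picked for the example.

```python
import math


def two_tone(f1, f2, t):
    """Sum of two equal-amplitude sine waves at time t (seconds)."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)


def beat_form(f1, f2, t):
    """The same signal written as a tone at the average frequency times a slow envelope."""
    average = (f1 + f2) / 2
    half_difference = (f1 - f2) / 2
    envelope = 2 * math.cos(2 * math.pi * half_difference * t)
    return envelope * math.sin(2 * math.pi * average * t)


def beat_rate(f1, f2):
    """Number of loudness swells per second."""
    return abs(f1 - f2)


# Two closely tuned sources (illustrative values).
print(beat_rate(440.0, 442.0))
```

The closer the tuning, the slower the beats, but they only stop when the two frequencies are exactly equal.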

Meanwhile, a single vibrating source will have some waveform, which in turn is made up of a stack of frequencies. Now, and this is the key part, Zork baby, for that waveform to be periodic, to repeat over and over, all of the frequencies in that waveform have got to be locked in phase. If they aren’t locked, the waveform falls apart and constantly changes. When we consider a square wave, for instance, all of its component frequencies cross zero going positive at exactly the same instant in time, at the start of each cycle of the wave. We call this behavior “phase-locked.” It is an essential ingredient of the concept of waveform.
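The square wave claim is easy to check numerically. The sketch below assumes the textbook recipe for a square wave (odd harmonics only, each a sine with amplitude 1/n) and tests whether every one of those components is passing through zero on its way up at the start of each cycle. The 100 Hz fundamental and the helper names are assumptions for illustration.

```python
import math


def square_partials(f0, count):
    """(frequency, amplitude) pairs for the first `count` odd harmonics of a square wave."""
    return [(n * f0, 1.0 / n) for n in range(1, 2 * count, 2)]


def partial_value(freq, amp, t):
    """Value of one sine component at time t (seconds)."""
    return amp * math.sin(2 * math.pi * freq * t)


def partial_slope(freq, amp, t):
    """Rate of change of one sine component at time t."""
    return amp * 2 * math.pi * freq * math.cos(2 * math.pi * freq * t)


def all_rising_through_zero(partials, t, tol=1e-9):
    """True if every component is at zero and heading positive at time t."""
    return all(
        abs(partial_value(f, a, t)) < tol and partial_slope(f, a, t) > 0
        for f, a in partials
    )


partials = square_partials(100.0, 5)   # 100, 300, 500, 700, 900 Hz
period = 1.0 / 100.0
for cycle in range(4):
    print(cycle, all_rising_through_zero(partials, cycle * period))
```

Halfway through the cycle the components all hit zero together again, but heading negative, which is the other edge of the square wave.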

So, when we have a complex wave, with a stack of frequencies, they are ALL phase-locked, so that for each cycle of the wave, they all maintain the same phase relationships. This phase-locked behavior is the unique signature of a “single” sound source – it is how we know that we are hearing “one” sound, and not a group.

Exactly how we do this, Zorkarino, we don’t know either. My guess is that it occurs in the auditory nerve, as a function of the transmission from the basilar membrane to the brain.

Anyway, the effect is profound. Try manually tuning a bunch of unsynchronized analog sine wave oscillators to mimic a stack of overtones: 250, 500, 750, 1,000 Hz, etc. It will be painfully obvious that you are listening to a “bunch” of oscillators. The difference between those oscillators and a single sawtooth waveform at 250 Hz is huge. The sawtooth wave is obviously a single sound, while the bank of oscillators is oh so obviously not (even when they all emit from the same loudspeaker)!
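If you don't have a rack of analog oscillators handy, a rough numerical version of the experiment looks like this. It is a sketch under stated assumptions: four partials with sawtooth-style 1/n amplitudes, and a made-up set of small tuning errors standing in for what hand-tuning might leave behind. It compares the first cycle of the summed wave with a cycle half a second later.

```python
import math


def stack(freqs, t):
    """Sum of sine partials with sawtooth-like 1/n amplitudes at time t (seconds)."""
    return sum(
        math.sin(2 * math.pi * f * t) / n for n, f in enumerate(freqs, start=1)
    )


def cycle_drift(freqs, f0, later_cycle, points=64):
    """Largest difference between the first cycle and a later cycle of the summed wave."""
    period = 1.0 / f0
    worst = 0.0
    for i in range(points):
        t = i * period / points
        difference = abs(stack(freqs, t) - stack(freqs, t + later_cycle * period))
        worst = max(worst, difference)
    return worst


locked = [250.0, 500.0, 750.0, 1000.0]    # exact multiples of 250 Hz
detuned = [250.0, 500.4, 749.7, 1000.9]   # hypothetical hand-tuning errors

# Compare cycle 0 with cycle 125, i.e. half a second later at 250 Hz.
print(cycle_drift(locked, 250.0, 125))
print(cycle_drift(detuned, 250.0, 125))
```

The locked stack repeats itself essentially exactly, cycle after cycle, while the detuned bank has already turned into a visibly different shape, because each oscillator has slid by a different fraction of a cycle.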

So, regardless of exactly how we do it, here is the straight skinny, Zorko, about the difference between chords and overtones. Overtones are phase-locked, and represent multiple frequencies from a single source, while chords represent a group of sources, each performing a different pitch in the chord. Obviously, if all of the sources have complex waves, it gets even easier to understand and hear out the chord.

What does this have to do with pitch?

Pitch is detected as a pattern of phase-locked frequencies on the basilar membrane, a range of “places” that are “correctly spaced.” We memorize templates for these pitches – perfect pitch can be thought of as really good long-term memory of very exact “places” and “spacings.” If the frequencies are not phase-locked, they aren’t included in the template. They are, for us, “other” sounds.

So, Zork-11, there you have the basics. Pitch is a subjective sensation of the highness or lowness of a sound. Depending on the relative loudness of its various overtones, the sound can be brighter or darker. Regardless of its timbre, it has a given “height” in the musical scheme of things.

This all works because of the remarkable combined capabilities of the basilar membrane to detect individual frequencies, the auditory nerve to detect phase-locked vs. free-running frequencies, and our brains to make coherent sense of what is, after all, a very confusing situation.

Next article, we’ll consider loudness, which has Zork-11 even more confused. Uh-oh!

Happy highs!

Dave Moulton is alive and well in Groton, MA. You can complain to him about anything at moultonlabs.com.