Part II
More About the Words We Use To Describe Sound
Alert readers may recall that last month I discussed some research I’ve been involved with about the “meaning” of various words when used to describe microphones. When I first got into the study, I thought it would be interesting to see the correlations between various words and the physical behaviors of the microphones. But I didn’t think much about the question of what the various words really mean. I just assumed that “warm” means, well, you know, like “warm.” However, as I’ve gotten deeper into this test, this part of the question has emerged for me as profoundly important. Are the meanings of metaphorical words reliable enough to be used as technical communications?
An Expanded Discussion of Words
When we describe visual stimuli, we have a rich array of words to choose from that refer quite directly to specific visual characteristics. “Blue,” for instance, refers to a specific band of light frequencies, and when we use the term, we all mean pretty much the same band of frequencies. No one will confuse “red” frequencies with “blue” ones, although there might be some confusion about where the boundary is between “green” and “blue.” This makes it easy to talk about what we see. We can describe a visual sensation easily and with considerable precision, as in the sentence, “The Ferrari was a bright red blur.” Aside from the fact that I cheated by using an associated color metaphor (Ferrari, which involves a quite explicit shade of red), “bright red blur” invokes a very specific visual image, specific in shape (an abstract blur), wavelength (Ferrari red) and luminosity (bright).
When we try to describe the same image in terms of sound, it is much more difficult. We have no specific sound words to describe the sound “Ferrari.” We can borrow some words used to describe speech behaviors (“the Ferrari howled by,” “the Ferrari screamed by,” or perhaps “the Ferrari muttered by”), but these tell us little about the specific physical nature of the sound. We have a modest array of words that suggest various loudnesses, and we can talk a little about high- and low-pitched sounds, but that’s about it. We have no words to describe the specific sound character of the Ferrari, of a violin, or of any other specific sound source. Tone quality, or timbre, is extraordinarily multi-faceted, complex and variable by nature, and not amenable to verbal description.
So, we have to make up words. Metaphors, we call them. We borrow them from everywhere, and fairly casually apply them to sound. They may have quite specific meanings elsewhere, but those meanings usually have little relevance for sound. This is what raises the uncomfortable question of technical meaning. When I use the word “fat” to describe a sound (as in “The tenor saxophone sounded fat”) I may have a very specific quality in mind. It is reasonable to wonder if the use of that term conveys what I mean to the listener or reader.
Why is this important? For consumers, I don’t really think it matters too much, but to us audio professionals, it sure does. We talk about our work, describe our tools, characterize our results, etc., using these made-up words all the time. We evaluate, ask for, demonstrate and describe our sounds using a bunch of metaphors that have no specific, literal sound meaning. When a client asks me for a slightly “edgier” sound, I have to guess at what he or she really wants, because sounds don’t have edges.
Soren Bech’s Test Method
I just got back from Denmark, where Soren Bech of Bang and Olufsen has been conducting some research into humans’ perception of spatial qualities of sound. He is using an interesting approach to this word problem – he has his listening panels make up their own word lists. The panels meet repeatedly to consider their listening task and to create word lists to describe the features they wish to evaluate. These lists become refined over time, and the individual words come to have very specific meanings to the listening panels.
The result is that the test results show high reliability over many tests and for many listeners, results that generally correlate well with specific physical behaviors. This is very useful for product development. Using such listeners and their developed words, the manufacturer can reliably evaluate the performance of prototypes for specific behaviors with considerable precision.
At the same time, such words may have comparatively little relevance for the world at large. As the professional listeners become more skilled and comfortable with their word lists, their perceptual performance is going to diverge from that of either general listeners or audio production people, and it becomes less and less reasonable to expect that what they mean correlates with what other listeners might think they mean (if you know what I mean!).
Some Tentative Findings From Our Tests
Meanwhile, I’ve been working with Eric Reuter and Ben Findlen of Worcester Polytechnic Institute on a study of an array of microphones, attempting to correlate their physical behaviors in the time domain with the various words that listeners might use to describe how they sound. As I reported last month, our first test yielded the insight that the words “warmth” and “brightness” have pretty robust meanings for audio professionals across the board, while the term “depth” does not.
We continued this study with additional listeners drawn from the 1998 Parsons Audio Expo. We kept “warmth” and “brightness,” dropped “depth” and added “definition,” “clarity,” “richness” and “presence.” In the test instructions, I added a series of non-audio suggestions to help the listeners with their task. For instance, I suggested the word “clarity” could be thought of as “a feeling of transparency, of being able to hear the inner timbral details of the recorded instrument,” while “warmth” could be thought of as “a feeling of rich intimacy, fullness and sweetness of sound.”
We tested these terms by having the listeners compare brief samples of single-instrument recordings made by the various microphones under test (in pairs), as in the first test. Unfortunately, in an effort to shorten the listening time for each listener, we didn’t test all possible pairs for each listener. So, although we had lots of listeners, the limitations of the testing situation meant that we collected less data in total than we would have liked. We had hoped that the Rasch Model would help us to infer what the missing pairings would “probably” yield, because it can do this to a significant extent.
In fact, our data turned out to be sufficiently thin that we have some severe reservations about the results, in spite of fairly good correlation with the earlier test. When we look at the rankings of microphones, they make some good intuitive sense (an elderly well-known dynamic microphone design seems to score highly in terms of “warmth” while a high-precision wide-bandwidth omni condenser microphone scores highly in terms of “definition”). However, when we look at the unidimensionality of meaning of the various words, the Rasch Model tells us that the microphones don’t spread out meaningfully along a scale. “Warmth” is the only term, so far, that performed well on both tests. This suggests either that the rest of these words don’t really have much universal meaning or that we collected insufficient data to make a meaningful comparison. My personal guess is that it is some of both. It seems reasonable to me that we will be able to find a list of five to ten words that have some real universality among listeners, and that the rest will turn out to be unreliable.
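The idea of inferring untested pairings from the tested ones can be sketched in a few lines of code. This is only an illustration, not the analysis used in the study: it assumes a Bradley-Terry style paired-comparison model, a close relative of the Rasch Model, fitted with a simple iterative update. The microphone names and the tallies are invented, and the function names are hypothetical.

```python
from collections import defaultdict

def fit_bradley_terry(wins, iterations=2000):
    """Estimate one 'strength' per item from sparse paired comparisons.

    wins maps (preferred, other) -> number of times 'preferred' was chosen.
    Assumes every item was chosen at least once and passed over at least once.
    """
    items = sorted({name for pair in wins for name in pair})
    total_wins = defaultdict(float)
    n_compared = defaultdict(float)  # trials per unordered pair
    for (winner, loser), count in wins.items():
        total_wins[winner] += count
        n_compared[frozenset((winner, loser))] += count

    strength = {name: 1.0 for name in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            # Only pairs that were actually tested contribute to the sum.
            denom = sum(
                n_compared[frozenset((i, j))] / (strength[i] + strength[j])
                for j in items
                if j != i and frozenset((i, j)) in n_compared
            )
            updated[i] = total_wins[i] / denom
        scale = sum(updated.values())
        strength = {name: value / scale for name, value in updated.items()}
    return strength

def predict(strength, a, b):
    """Modelled probability that a is preferred over b."""
    return strength[a] / (strength[a] + strength[b])

# Invented tallies for one word (say, "warmth"). Only three of the six
# possible pairs were tested; mic_A was never compared with mic_D.
tallies = {
    ("mic_A", "mic_B"): 7, ("mic_B", "mic_A"): 3,
    ("mic_B", "mic_C"): 6, ("mic_C", "mic_B"): 4,
    ("mic_C", "mic_D"): 8, ("mic_D", "mic_C"): 2,
}
strength = fit_bradley_terry(tallies)
p_a_over_d = predict(strength, "mic_A", "mic_D")
print(f"Estimated chance mic_A is preferred over mic_D: {p_a_over_d:.2f}")
```

The estimate for the untested pair comes entirely from the chain of tested pairs that links the two microphones, which is why thin data makes such inferences shaky: every missing comparison leans on fewer and fewer real judgments.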
We will continue with this study. Because this is a non-profit effort at this point, we may have to ask for volunteers. If you would be interested, send me an e-mail. I’ll probably make up CDs of the test material and send them out to those of you who will be willing to spend the three hours the whole test requires. Thanks in advance for your help!
And thanks for listening!