Part IV: The Meaning Of It All
December 2000, TV Technology
For the past year, I’ve been blathering about the emergence of high resolution audio. As you can probably tell if you’ve been following the series, I think the virtues of such extended resolution have been oversold. This month, I plan to wrap this diatribe up and get on with my life and your audio reading.
In summary, we can make the following observations about high resolution audio:
In terms of bit resolution, it is not clear that the extra resolution, in and of itself and as we implement it, is audible at all. This is particularly true when we employ our traditional scientific measurement techniques. However, such measurements and resulting findings don’t satisfy the audiophiles and many recording engineers and producers, for both semantic and psychological reasons. This leads to confusion, ambiguity and many letters to the editor.
Audible differences, when they do exist, seem more likely to arise from peripheral issues, such as improved quality of converters, better clocks and power supplies, as well as somewhat improved DSP, than from the extra bits themselves.
To the extent that such differences are audible, they really ARE extremely small, and undetectable in most end-user listening environments. This is particularly true for television transmission and audiences, where end-user audio is mediocre in general and acoustic accommodations for high-resolution playback are non-existent.
In terms of bandwidth, the situation is even more dubious. Little, if any, evidence exists to support the proposition that bandwidth above 20 kHz is audible to humans. Further, there appears to be a fundamental trade-off between bandwidth and noise that limits us to about 25 kHz AND/OR 16 bits of resolution at the microphone, which is the first and possibly most important transducer in the audio chain.
In production practice, an upper limit of about 15 kHz seems to be the de facto standard. Again, television audio represents a lesser condition than that, so that in general we can expect TV audio to be limited to about 10-12 kHz at the end-user playback system. It’s not a pretty picture, I mean, sound.
As we forge ahead with HDTV and digital TV, these audio conditions are not likely to improve significantly (although the best end-user systems ARE improving, of course). The use of data compression (both Dolby and DTS formats) constrains us to something less than 16-bit/44 kHz quality at best. That is not likely to change, particularly given the capability of surround or multi-channel playback media to mask the small defects inherent in such production resolution.
There’s another side to this, as well. The kind of literal physical accuracy that is offered by such enhanced resolution and increased bandwidth really isn’t at the center of our production quality goal. We have long since passed the point of diminishing returns for resolution. For me, 16-bit/44 kHz resolution is really quite satisfactory. As I tell my students, if you can’t make a great recording with 16/44, there’s nothing I’m going to be able to teach you that’s going to help much when you start working at 24/96! The real audio values that we need to be concerned with lie within the realm of 16/44. Again, this is particularly true with television and film audio.
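For readers who like to see the numbers behind "16/44" and "24/96", here is a rough sketch of what the two formats offer on paper. It uses the textbook signal-to-noise figure for an ideal quantizer driven by a full-scale sine wave (about 6.02 dB per bit plus 1.76 dB) and the Nyquist limit of half the sample rate. It assumes that "44" is shorthand for the 44.1 kHz CD sample rate, and the function names are made up for illustration. Real converters, microphones and rooms fall well short of these theoretical ceilings.

```python
import math

def quantization_snr_db(bits):
    """Theoretical SNR (dB) of an ideal N-bit quantizer with a full-scale sine input."""
    # 20*log10(2^N) is about 6.02 dB per bit; 10*log10(1.5) is about 1.76 dB.
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2

# Assumption: "16/44" means 16-bit at 44.1 kHz; "24/96" means 24-bit at 96 kHz.
for bits, rate in [(16, 44_100), (24, 96_000)]:
    print(f"{bits}-bit at {rate / 1000:g} kHz: "
          f"about {quantization_snr_db(bits):.1f} dB SNR, "
          f"bandwidth to {nyquist_hz(rate) / 1000:g} kHz")
```

On paper, the extra 8 bits buy roughly 48 dB and the higher rate buys bandwidth out to 48 kHz; whether any of that survives the microphone and the living room is the question this series has been asking.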
So, as the lady said, let’s talk, for a little while anyway . . .
A brief anecdote may clarify my view of this. Being both cheap AND a Luddite, I’ve come to DVD only recently. So there I was one recent summer evening, staring at my stock stereo TV with a DVD player driving it for the first time, watching a very recent big budget cop flick. Now, even in stereo, I could hear a BIG improvement in sound quality over either (a) typical live analog broadcast transmission or (b, especially) what comes off the typical VHS rental tape. However, that BIG improvement is still way less good than 16/44, nowhere near the quality I experience when I wander off into the studio and put up a good commercial CD on the main monitors.
Anyway, being a formulaic cop flick, this DVD HAD to have a shoot-out on the streets of downtown LA. So, at one point, over a stretch of about five minutes, various good and bad persons squeezed off about 200 rounds of various types and sizes of ammo out on Wilshire near the 110 (I think). And having really good digital sound, for the first time in my very own home living room, I could tell that each of those various gunshots seemed to have (a) the same source recording, and (b) the same reverb patch (a Lexicon 300 Large Hall with about 0.8 sec. RT, I think).
So here’s my point. We’re presently practicing our craft at a level of production quality that’s suitable for VHS, where the detail simply isn’t very sharp and there’s not much in the way of nuance. We haven’t yet begun to deal with the issues of variability of nuance, and the linkage of sound and image, that even 16/44 yields for us. If we’re going to use even this ubiquitous mid-rez for the kind of heightened realism that is both possible and (I think) highly desirable, we’re going to have to rethink our post-production process a little. We’ll have to consider the awful budgetary implications of the fact that the sound ambience outdoors on Wilshire down from the 110 DOESN’T sound like a large hall, even a Lexicon large hall; that a Smith & Wesson short-barrel .38 DOESN’T sound like an AK-47 assault rifle; and that neither sounds the same each time it is discharged as the various good and bad persons wielding them run frantically up, down and across Wilshire and the 110, and in and out of various doorways, alleys, and public parks.
I don’t mean at all to criticize the sound editors or producers of the above cop flick. I mean to suggest that our present resolution offers a level of realism that, should we choose to exploit it, could be both magnificent and terribly expensive. To my way of thinking, it is better to put our bucks here, into a kind of heightened sensibility and execution of sound/image coherence, than it is to obsess about largely imaginary extra bits and bandwidth that we aren’t capable of exploiting even if we could hear ‘em!
Happily, thanks to Moore’s Law, we now can do both (be careful what you wish for, as they say!). We’re all gonna be working in 24-bit within the year (or so), whether we need it or not. However, you can still save on your therapy bills: just set dither at the 16th bit and DON’T WORRY! If you really need to obsess, just obsess about how you can change the predelay, diffusion and RT for each gunshot hit as Al runs across the street, and maybe also get a different source for Robert’s Kalashnikov, so that we can all really tell, just by listening, who’s shooting at who, and where they are when they’re doing it!
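For the curious, "dither at the 16th bit" can be sketched in a few lines. This is only an illustration, not what any particular workstation does: it assumes signed 24-bit integer samples, and it uses triangular (TPDF) dither of plus or minus one 16-bit step, which is a common choice but my assumption here. The function name is made up for the example.

```python
import math
import random

def dither_to_16_bit(samples_24, rng=None):
    """Reduce signed 24-bit integer samples to 16-bit, adding TPDF dither first."""
    rng = rng or random.Random()
    step = 256  # one 16-bit step equals 2**8 units at 24-bit resolution
    out = []
    for s in samples_24:
        # Difference of two uniform values gives a triangular distribution
        # spanning roughly -1 to +1 of a 16-bit step (assumed dither shape).
        tpdf = (rng.random() - rng.random()) * step
        q = math.floor((s + tpdf) / step + 0.5)  # round to the nearest 16-bit step
        out.append(max(-32768, min(32767, q)))   # clip to the 16-bit range
    return out

# A steady signal sitting half a step between two 16-bit values.
rng = random.Random(2000)
result = dither_to_16_bit([128] * 20000, rng)
print(sum(result) / len(result))  # averages close to 0.5 instead of snapping to 0 or 1
```

The point of the exercise: without dither, that half-step signal would round to the same value every time and the detail would simply vanish. With dither, it survives as an average buried in a little benign noise, which is why you can stop worrying.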
Enough! Next month, I’m gonna start taking a look at some older, well-established technology, microphones and loudspeakers, and consider some of the BIG improvements that are happening with them.
Thanks for listening to all of this!
End Note: you’d think this would be the end of it. But, no. A reader wrote in to challenge me, and the guy just wouldn’t give up! Then Bob Dixon got on my case as well. Here’s that whole exchange of letters, too.