Moulton Laboratories
the art and science of sound
TV Audio IS Getting Better, And We’re Learning To Use Metadata Too!
Originally published in TV Technology in October, 2003
By Dave Moulton
October 2003

How television engineers (attempt to) broadcast audio signals intact from the control room to the end-user's set-top box.

The View From 2009: This article was a look at how the production professionals saw TV audio in 2003. Interesting.

A recent informal survey of some broadcast audio professionals I know paints a fairly bright picture for the state of TV audio as we continue the move into our brave new digital world. The benefits of digital audio (dynamic range, bandwidth, easy signal handling and reduced cost) are really having an effect.

Setting aside the questions relating to surround sound production, a number of good things have happened to audio over the past several years. “Our field acquisition has gotten a lot better,” says Dave Gardiner, audio engineer at WCVB-TV (the ABC affiliate in Boston). “Thanks to better RF mics and a digital videotape format in the field, the audio we’re getting back at the station is much cleaner now than it used to be. There’s simply less distortion. Some limiting, de-essing and a little conservative console EQ seem to do well for us wherever the body of the program is assembled. The processing in our transmission chain has been simplified as well. We use some soft compression and slow-leveling set at a fairly high threshold, just to keep overall station program levels pretty consistent. Then it’s straight to the converters for digital transmission. There’s some additional processing for the analog transmitter, to meet FCC legal requirements. HDTV has no additional processing.”

Similar sentiments are expressed by Jim Starzynski, Principal Engineer for NBC, commenting on WNBC-4 in New York City: “The analog audio signal we send over the air and to our cable audience has a minimal amount of audio processing applied to it. It’s really pretty simple. Audio for the network is transferred, recorded or dubbed with little or no dynamic range compression. This full bandwidth signal travels on digital routers and enters our east coast Hub control room for switching and from there goes to WNBC master control. The output of master control has a compressor/limiter that is set only to protect the STL for overload. Considering the STL is digital, the dynamic range stays mostly intact. . . . Proper gain structure, attention to relative levels and improved dynamic range of critical paths has moved us away from a one-size-fits-all processor and the audible artifacts that may have accompanied that technique.”

Jim goes on to note that for digital TV, things are different. “For HDTV we still need to consider perceived loudness and the processing necessary to achieve it. This can be done conventionally or with dialog normalization (DialNorm) and dynamic range capabilities (DRC) if they can be established and maintained in the signal path.”

Bob Dixon, Manager of Sound Design for the Olympics (NBC), has similar good feelings about audio in production and on its way out to the affiliates. He does have concerns, however, about what is happening at the set-top box on the end-user’s television, mainly due to its effect on and implementation of metadata. Bob notes that “cable companies have the ability to over-ride metadata, but mostly they leave it alone. However, it may not always be set correctly, particularly DialNorm, and so levels may be all over the place.” Dave Gardiner concurs with this, saying that DialNorm settings, while not complex at this stage of digital TV, are important to watch when transitioning from local to network programming.

There are several interesting aspects to this. First, as Dolby notes in the manual for their Model 569 Multichannel Audio Encoder, this “requires the producer to correctly set the metadata parameters because they affect important aspects of the audio – and can seriously compromise the final product if set improperly.”

Dolby gives an example: “In a broadcast truck parked outside a football stadium, the program mixer chooses the appropriate metadata for the audio program being created. The resulting audio program, together with metadata, is encoded as Dolby E . . . and sent to the TV station. . . . At the receiving end, . . . the Dolby E stream is decoded back to audio and metadata. The audio is monitored and the metadata is altered or re-created as other elements of the program are added . . . This audio/metadata is re-encoded as Dolby E, leaves the postproduction studio and [is sent] to Master Control, where many incoming Dolby E streams are decoded back to their audio/metadata programs. The audio program/metadata pair that is selected to air is sent to the transmission Dolby Digital encoder, which encodes the program according to the metadata stream associated with it, simplifying transmission. Finally, the Dolby Digital signal is decoded in the consumer’s home, with the metadata providing the information for that decoding process.”

What is of interest here is that the metadata can be manipulated at (at least) three separate points in the transmission chain. As Dolby says, it needs to be set correctly at each of those points.

To make matters more complex, the set-top box’s behavior in regard to metadata is different at each of the box’s outputs. Additionally, the DialNorm level also serves as the threshold setting for Dynamic Range Control (DRC). This means that if DialNorm is incorrectly set, then DRC settings will be off as well, usually resulting in heavy limiting.

Michael Guthrie, an encoding expert at Harmonic Inc., has been studying this issue, and has found that broadcast TV audio levels, which had converged in the analog realm, are diverging again while we learn how to use metadata. He has found that levels among channels swing as widely as ±15 dB. “Many engineers just leave DialNorm at the default level of –31 dBFS, assuming that is equivalent to unity gain. However, if the actual average level decoded is –20 dBFS (a fairly normal practice) then the audio will be 11 dB hotter, and 11 dB over threshold for Dynamic Range Control, which means it will be fairly heavily limited.”
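Guthrie's arithmetic can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Dolby's decoder implementation: it assumes the decoder attenuates the program by the difference between the DialNorm setting and the –31 dBFS reference, so that correctly labeled dialog lands at –31 dBFS. The function names are hypothetical and chosen only for illustration.

```python
TARGET_DIALOG_DBFS = -31  # reference dialog level cited in the article


def decoder_attenuation_db(dialnorm_dbfs):
    """Attenuation (dB) applied for a given DialNorm setting (assumed model)."""
    if not -31 <= dialnorm_dbfs <= -1:
        raise ValueError("DialNorm is expected to lie between -31 and -1 dBFS")
    # A DialNorm of -31 means no attenuation; -20 means 11 dB of attenuation.
    return dialnorm_dbfs - TARGET_DIALOG_DBFS


def loudness_error_db(actual_dialog_dbfs, dialnorm_dbfs):
    """How far reproduced dialog lands above (+) or below (-) the target."""
    reproduced = actual_dialog_dbfs - decoder_attenuation_db(dialnorm_dbfs)
    return reproduced - TARGET_DIALOG_DBFS


# Default DialNorm left at -31 while dialog actually averages -20 dBFS.
print(loudness_error_db(-20, -31))
# DialNorm set to match the actual dialog level.
print(loudness_error_db(-20, -20))
```

In this model the first case comes out 11 dB above the target, which is the figure Guthrie quotes, and the second comes out at 0 dB. Because DialNorm also sets the DRC threshold, the same 11 dB is the amount by which the program sits over that threshold.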

Michael goes on to say, “The goal is for the viewer to always hear dialog at the same level. Dolby points out that viewers tend to adjust the volume to equalize the dialog level regardless of the level of the natural sound and effects. The -31 dBFS target for dialog allows for the other program elements to have enough dynamic range to allow for full artistic expression. The DRC circuits are there to reduce this dynamic range when it's inappropriate, so when everything is working the actual dynamic range the viewer is exposed to is always appropriate for the viewing situation. It all makes sense if you consider that most viewers are going to set the dialog SPL to between 60 and 70 dB (SPL). Sound effects and music then have 31 dB of range above this, leading to an absolute maximum SPL of 90 to 100 dB in the room. This range would be inappropriate for news, but completely appropriate for drama or action entertainment, or for peaks in a musical performance.

“These are the levels if DRC is not utilized, typical of a Dolby Digital surround system. Set-top boxes that use RF or RCA stereo outputs, or self-contained sets, typically use the DRC circuitry, reducing the peak levels, hence the dynamic range. The RF output would be heavily limited if the full range were used, while the stereo output would receive less limiting. In all cases the dialog level is untouched if and only if the broadcaster’s DialNorm matches its actual dialog level. The result is that viewers can use a variety of equipment in a variety of viewing situations and hear a dynamic range that is appropriate to their viewing situation, and still match the dialog levels between sources. Meanwhile, broadcasters have to satisfy two sets of needs, broadcasting with and without metadata. Both conditions have to work. It’s kind of like stereo/mono compatibility.” Guthrie believes that levels from different broadcasters will converge again, if the industry can get a handle on metadata and the cable companies begin to offer some feedback about the relative levels they are receiving, so broadcasters can adjust toward the norm.
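The headroom figures Guthrie quotes can be checked with a short sketch. It assumes only the numbers given in the article (dialog set between 60 and 70 dB SPL, with 31 dB of range above the dialog level) and no DRC applied; the function name is hypothetical.

```python
HEADROOM_ABOVE_DIALOG_DB = 31  # range above dialog level cited in the article


def peak_spl_db(dialog_spl_db, headroom_db=HEADROOM_ABOVE_DIALOG_DB):
    """Maximum room SPL when dialog plays at dialog_spl_db and DRC is off."""
    return dialog_spl_db + headroom_db


for dialog_spl in (60, 70):
    print(f"dialog at {dialog_spl} dB SPL -> peaks up to {peak_spl_db(dialog_spl)} dB SPL")
```

The results, 91 and 101 dB SPL, agree with his rounded range of 90 to 100 dB. Any DRC applied at an RF or stereo output would bring those peaks down while leaving the dialog level where it is.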

From Jim Starzynski’s viewpoint, “If a complete metadata path is unavailable, perceived loudness is handled very similarly to BTSC/NTSC. A limiter/compressor should be used to establish competitive and effective station sound that creates even loudness with adjacent stations and smooths transitions from programs to commercials. . . . If a metadata path exists from the origin of the audio all the way to transmission, the creative staff can set those parameters and as long as the metadata path stays intact, the audience will receive those same settings at their AC-3 decoder.

“Either of these systems will work. Using the complete head-to-toe metadata path ensures the creative choices made by production carry through to the audience. The DTV audio system was designed with this in mind. In its absence, a broadcaster relies on either conventional NTSC practices or uses DTV audio encoder values communicated by the creative staff. If neither of these practices is possible, a fallback to generic values can be substituted that yields satisfactory results.”

So, we’ve cleaned up one set of audio problems pretty well, particularly at the network level, and are busy trying to work our way through the next set. Metadata is an extremely powerful and important tool, and we have to learn to use it. I’ll devote some future columns to exactly that task.

Thanks for listening.

Dave Moulton would like to thank Dave Gardiner, Jim Starzynski, Bob Dixon, Michael Guthrie and Dolby Labs for their help. You can complain to him about anything else, at his website, moultonlabs.com.
