The View from 2005:
This is less of a problem now, because there aren’t very many mono radios out there any more, and comparatively few mono TVs. However, good studio craft requires that you at least CHECK to see if you’ve created any truly unacceptable sounds. And, unfortunately, this still happens all too often in live television broadcasts, particularly of sports events. Just so you know, when I re-read this article, I thought it needed some work (it was a little incoherent in places), so I’ve rewritten it a good bit.
From Stereo to Mono and Back
It should be simple
It should really be very simple. When you mix a tune in stereo you should have stuff coming out of the left speaker, the right speaker, and from in between (the phantom image). When you need a monaural version of that mix you should be able to just mix the left and right signals together and have everything come from the middle. Maybe the mix might be a little more cluttered, just like making everybody in the band stand exactly in the middle of the stage, but there shouldn’t be any big problem, right? Like I said, it should really be very simple. Well, it isn’t.
But, why worry? These days, everybody has stereo and nobody except Aunt Tillie (who is still fixated on Pat Boone, who was singing over the car radio the first time she . . . well, never mind) ever listens in mono. This should be a dead issue. Further, stereo vinyl discs are hardly ever made anymore, so the problems associated with stereophonic stylus motion are no longer a concern either.
The problem is broadcasting, as manifested by two common items: your basic table radio and your basic television. The FM broadcasting standard provides for a mono summation of Left and Right for any single channel receiver (we’re not gonna discuss how they do it here), and right now a lot of listening to music still occurs over these systems. So, this is why you should worry: if you want your music to sound good when it is broadcast, oh, say, as a Top 40 single, or maybe as part of an MTV music video, then you better make sure it sounds OK in mono. If you don’t care about such broadcast situations, then you can forget worrying about how the mono version of your latest masterwork sounds.
However, assuming you do want to worry, the mono compatibility problem arises because stereo is an illusion, and the components of that illusion don’t necessarily mix together very well into a single signal. I’ve dealt with some of these issues in other articles (The Phantom Image, Early Delays and Comb Filtering) and I won’t go through them in detail here. But it is helpful to consider briefly the elements of a stereophonic signal, and get a feel for where the trouble spots are. Then we can talk about what to do about them.
Early Techno-Nerds discussed stereo in terms of A and B instead of Left and Right (don’t ask me why), and I’ve gotten in the habit of using these terms too. So please humor me, and you too can use these terms and impress neo-Early-Techno-Nerds and the inevitable Wannabes with your snappy command of jargon. Anyway, stereo can be thought of in a different way: A (that’s Left to you), and B (that’s Right) can either be added together to make A+B (that’s mono, the Phantom Image that appears between the speakers), or B can be subtracted from A to make A-B (which is the difference between A and B, which can be thought of as the “stereo-ness” of a signal). A, B and A+B don’t present any big problems, and if that’s all you have in your mix, then your mono mixes are going to be pretty compatible, and are going to present no particularly hairy surprises. However, such mixes don’t sound very, well, er, spacious. It is the A-B component that provides the stereo glue that makes the stereophonic illusion so powerful, and once you’ve become addicted to it (not me, nosireee), mixes without a significant A-B presence just don’t cut it any more. Back in the good ol’ days, people even generated a cheap and quite effective ersatz quadraphonic sound by sending A-B to a surround speaker in the back of the room. You too can do this, now, in the privacy of your very own homes. I’m not making this up.
By 2005, I’ve learned a good bit more about A-B, thanks to my loudspeaker research work with Sausalito Audio Works and Bang & Olufsen. Assuming good high-frequency dispersion and a reasonably symmetrical listening room, the pure A-B signal will tend to appear behind us, or at the very least out to the sides. It is the basis for a lot of simulated surround techniques.
So just what is an A-B component? What makes it special? How does it differ from A+B, etc.? Imagine a snare drum sound, a spike of energy at a point in time that decays rapidly. If we record such a sound and then send it through a console panned to the left channel of a stereo playback system, that snare sound is by definition an A signal and comes from the left speaker. If we pan it to the right it is a B signal. If we pan it to the middle, it is an A+B signal, because it is the sum of all that is being sent to Left and Right. All very obvious and straight-forward. However, if we make it an A-B signal, we have to subtract B from A. To do this, we invert the voltage polarity of the B signal (usually by pressing the “phase”, “Ø” or “invert” button on a console module -- there are also other ways to do this, if you are handy with a soldering iron and/or op-amps, etc.), so that all positive voltages become negative and vice versa, and then we sum the two versions. And that signal, sports fans, is the A-B signal.
But there’s more. If we don’t sum A and the inverted B, and instead play them through the left and right speakers separately, we get the signal pair A,-B. In this case, that identical snare drum spike would appear, in a well-implemented control room, to come from behind us.
So, A-B and A,-B have some interesting characteristics. If A and B are identical, then A,B (the playback of A and B simultaneously through the left and right speakers respectively, without summing) will be the same as A+B, or mono. In this case, however, summed A-B is zero, and it is canceled out (once again, this is the infamous so-called “180° phase shift” that isn’t really phase shift at all - my friend Neil Muncy, studio designer and audio guru extraordinaire, calls it “OOPS,” an acronym for “Out Of Polarity, Stupid!”).
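For the do-it-yourselfers, here is a minimal Python sketch of the invert-then-sum recipe, showing what happens when A and B are identical. The decaying 200 Hz burst is only a stand-in for the snare, and the function names, sample rate and decay constant are all assumptions made for illustration; none of it comes from a real console.

```python
import math

def invert(signal):
    """Flip polarity: every positive sample becomes negative and vice versa."""
    return [-x for x in signal]

def mix(x, y):
    """Sum two equal-length signals sample by sample, like a mono bus."""
    return [a + b for a, b in zip(x, y)]

def peak(signal):
    """Largest absolute sample value in the signal."""
    return max(abs(x) for x in signal)

# Hypothetical test signal: a decaying 200 Hz burst standing in for the snare.
RATE = 48000
snare = [math.exp(-n / 400.0) * math.sin(2 * math.pi * 200 * n / RATE)
         for n in range(2000)]

a = snare        # the copy sent to the left channel
b = list(snare)  # an identical copy sent to the right channel

a_plus_b = mix(a, b)           # plain mono sum
a_minus_b = mix(a, invert(b))  # invert B, then sum: the A-B signal

print(peak(a_plus_b) / peak(a))  # 2.0: the mono sum is twice as big as A alone
print(peak(a_minus_b))           # 0.0: identical signals cancel completely
```

The moment B differs from A in any way, the second number stops being zero, and that is where things get interesting.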
In addition to this rather dull version of A-B (unless you are really into total silence), there are some other more interesting possibilities, including A,-B. If we split a single audio signal into A and B by sending it into two different modules on the console and changing either A or B slightly, the resulting A+B and A-B signals now begin to be a little more interesting, and problematical, depending on the type of difference we’ve introduced. In thinking about this, we’ve got to keep two things in mind: first, what is physically different and second, how the sound illusion changes.
The simplest case is amplitude difference: suppose we make B 3 dB softer than A. A+B won’t be quite as loud (I’ve ignored the attenuation effect of the pan-pot here) and A-B won’t be completely canceled, because A and B are no longer identical. Although you will still hear the A-B signal, it will be considerably softer. A,B will sound virtually the same as A+B and will still come from the center, but with a slight tendency to wander left. Meanwhile A,-B will also tend to sort of wander to the left behind you.
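To put rough numbers on the amplitude case, here is a small Python sketch that works out the levels of A+B and A-B relative to A alone when B is simply a quieter copy of A. It ignores the pan-pot attenuation, just as the text does, and the function names are made up for this illustration.

```python
import math

def db_to_gain(db):
    """Convert a level in dB to a linear voltage gain."""
    return 10 ** (db / 20.0)

def gain_to_db(gain):
    """Convert a linear voltage gain to dB; total cancellation is -infinity."""
    if gain == 0:
        return float("-inf")
    return 20 * math.log10(gain)

def sum_diff_levels(b_offset_db):
    """Levels of A+B and A-B in dB relative to A alone.

    Assumes B is an exact copy of A scaled by b_offset_db, with no time
    or equalization difference.
    """
    g = db_to_gain(b_offset_db)
    return gain_to_db(1 + g), gain_to_db(abs(1 - g))

sum_db, diff_db = sum_diff_levels(-3.0)
print(round(sum_db, 1))   # about 4.6 dB, versus about 6.0 dB when B equals A
print(round(diff_db, 1))  # about -10.7 dB: audible, but well below A itself
```

Calling `sum_diff_levels(0.0)` reproduces the identical-signal case: A+B comes up by about 6 dB and A-B drops to minus infinity.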
The next simplest case is time difference: if we delay B in time from A (keeping equal amplitude), then A will sound before B and in A,B the precedence (or Haas) effect will occur, causing the phantom image to migrate, most likely, to the vicinity of the earlier speaker (exactly where it migrates to depends on the amount of delay, the room, etc.). A+B will be comb filtered, with the fundamental frequency of the comb filter determined by the amount of delay, and A-B will be somewhat softer and equally comb-filtered. The comb-filtering, of course, changes the timbre of the sound significantly. A,-B will clearly migrate toward the left behind you.
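The relationship between delay and comb filter can be sketched with a little arithmetic. Assuming B is an exact, equal-amplitude copy of A delayed by a time τ, summing gives a gain of 2|cos(πfτ)| at frequency f, and subtracting gives 2|sin(πfτ)|. The Python below just evaluates those two expressions and lists where the notches fall; the function names and the 1 ms delay are assumptions chosen for illustration.

```python
import math

def sum_gain(freq_hz, delay_ms):
    """Gain of A+B at one frequency when B is A delayed by delay_ms."""
    return abs(2.0 * math.cos(math.pi * freq_hz * delay_ms / 1000.0))

def diff_gain(freq_hz, delay_ms):
    """Gain of A-B at one frequency when B is A delayed by delay_ms."""
    return abs(2.0 * math.sin(math.pi * freq_hz * delay_ms / 1000.0))

def sum_notches(delay_ms, limit_hz):
    """Frequencies where A+B cancels: odd multiples of 1 / (2 * delay)."""
    spacing = 1000.0 / delay_ms  # 1 / delay, in Hz
    notches = []
    k = 0
    while (k + 0.5) * spacing <= limit_hz:
        notches.append((k + 0.5) * spacing)
        k += 1
    return notches

def diff_notches(delay_ms, limit_hz):
    """Frequencies where A-B cancels: whole multiples of 1 / delay, from 0 Hz."""
    spacing = 1000.0 / delay_ms
    notches = []
    k = 0
    while k * spacing <= limit_hz:
        notches.append(k * spacing)
        k += 1
    return notches

print(sum_notches(1.0, 3000.0))   # [500.0, 1500.0, 2500.0]
print(diff_notches(1.0, 3000.0))  # [0.0, 1000.0, 2000.0, 3000.0]
```

Under these assumptions the two combs interleave: every notch in A+B sits at a peak of A-B and vice versa. Halve the delay and every notch moves up an octave.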
The most complicated case occurs when we equalize B differently than A. Equalization causes changes in both amplitude and time at some frequencies of the sound but not others. Because the effect is frequency-dependent, the auditory effect is complex. It is the basis for a variety of pseudo-stereo effects. A+B and A-B signals both exhibit comb-filtering, but the severity of the effect will be limited by the amount of equalization.
The upshot of all of this is that it is the time-based differences between two otherwise identical A and B signals that cause (a) most of the stereo goodnesses and (b) most of the mono badnesses. This is a fundamental feature of stereophony.
However, when the differences in time become greater than about 50 milliseconds, then the A and the B versions become separate sounds, perceptually speaking. So, there is a window of A-B difference ranges within which stereo exists: up to about 10 dB of amplitude and about 50 ms of time difference. When we exceed these limits, in amplitude the softer speaker simply becomes inaudible; in time, what was a sense of space around the sound becomes auditory ping-pong as the later sound becomes audible as a separate artifact.
One other thing needs to be mentioned here. If, in the above case or when A and B are identical, we reverse the polarity of either A or B and then listen to A,-B, our brain is faced with a sound that has the psychoacoustic signature of “a sound that is coming from some place other than the room we are in!” This bizarre, slightly unpleasant and definitely weird sound is the prime, raw ingredient of the stereo recipe, even when A and B are identical. It is the alcohol in the scotch, if you will. When summed to mono, it disappears completely (kind of like non-alcoholic beer, if I may continue the analogy).
This A+B/A-B relationship, by the way, is the basis of Middle-Side (MS) stereo miking. The Middle microphone is A+B and the Side microphone is A-B. The Middle microphone picks up the direct sound and the Side microphone picks up everything else, i.e. the room. When summed to mono, the Side mic is canceled, leaving a pristine Middle-mic version of the signal. The goodness of this is that such a signal is definitely viable in mono. The badness is that all the room sound and sense of spaciousness disappears, which can be a serious problem.
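Here is a minimal Python sketch of why the Side signal vanishes in mono. It assumes the usual sum-and-difference matrix (Left = Middle + Side, Right = Middle - Side); the function names and the sample values are invented purely for illustration.

```python
def ms_to_lr(mid, side):
    """Matrix Middle and Side signals into Left and Right (assumed L=M+S, R=M-S)."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

def mono_sum(left, right):
    """Fold Left and Right down to a single mono signal."""
    return [l + r for l, r in zip(left, right)]

# Hypothetical sample values: direct sound on the Middle mic, room on the Side mic.
mid = [0.5, -0.25, 1.0]
side = [0.125, 0.5, -0.75]

left, right = ms_to_lr(mid, side)
print(left)                   # [0.625, 0.25, 0.25]
print(right)                  # [0.375, -0.75, 1.75]
print(mono_sum(left, right))  # [1.0, -0.5, 2.0]: twice the Middle, no Side at all
```

Whatever you put in `side`, the mono sum comes out the same, which is both the goodness and the badness described above.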