Jargon
Interfaces
Analog
Your standard basic audio signal: a voltage that changes over time, with magnitudes between 0.1 and 1.5 volts RMS, depending in part on whether the nominal level is specified as +4 dBu or -10 dBV. An analog audio signal is an analog audio signal; the only variation is in nominal level. It can be carried on balanced or unbalanced lines, with any kind of connector you happen to have lying around.
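To put numbers on those nominal levels, here is a small sketch that converts decibel figures to RMS volts. It assumes the usual reference voltages (0 dBu = 0.7746 V, 0 dBV = 1 V) and the common +4 dBu "pro" and -10 dBV "consumer" levels; the function and constant names are just illustrative.

```python
# Reference voltages (assumed, not stated in the article).
DBU_REFERENCE_VOLTS = 0.7746  # 0 dBu: the voltage that puts 1 mW into 600 ohms
DBV_REFERENCE_VOLTS = 1.0     # 0 dBV: 1 volt RMS


def db_to_volts(level_db, reference_volts):
    """Convert a level in dB (relative to reference_volts) to RMS volts."""
    return reference_volts * 10 ** (level_db / 20)


pro_level = db_to_volts(4, DBU_REFERENCE_VOLTS)         # about 1.23 V RMS
consumer_level = db_to_volts(-10, DBV_REFERENCE_VOLTS)  # about 0.316 V RMS

print(f"+4 dBu  = {pro_level:.3f} V RMS")
print(f"-10 dBV = {consumer_level:.3f} V RMS")
```

Both values land inside the 0.1 to 1.5 volt range quoted above.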
S/PDIF
S/PDIF (Sony/Philips Digital Interface Format) is the “consumer” digital interface, with both hardware and software specifications. It requires that the digital audio be transmitted serially, alternating between left and right channels, with each sample fitting into a 32-bit “subframe” (which is like a big word). The subframe includes synchronization information, the audio itself, some housekeeping and identifying data, and a spare bit. That spare bit is combined with the matching spare bits from other subframes to carry lots of information about the data, including whether the format is consumer or pro, stereo or mono, what kind of error correction is being used, etc.
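For the curious, here is a rough sketch of how such a 32-bit subframe might be packed. The field positions follow the commonly published IEC 60958 layout (time slots 0-3 sync preamble, 4-7 auxiliary, 8-27 audio, then validity, user, channel status and parity bits); none of those widths come from this article, so treat them as assumptions. The real preamble is a special line-code pattern rather than ordinary data bits, so it is simply left as zeros here.

```python
def pack_subframe(sample_20bit, validity=0, user=0, channel_status=0, aux=0):
    """Pack one audio sample into a 32-bit integer (bit i = time slot i).

    Hypothetical helper for illustration; layout assumed from IEC 60958.
    """
    if not 0 <= sample_20bit < (1 << 20):
        raise ValueError("sample must fit in 20 bits")
    word = (aux & 0xF) << 4              # slots 4-7: auxiliary bits
    word |= sample_20bit << 8            # slots 8-27: the audio itself
    word |= (validity & 1) << 28         # slot 28: validity flag
    word |= (user & 1) << 29             # slot 29: user data bit
    word |= (channel_status & 1) << 30   # slot 30: the "spare" status bit
    parity = bin(word).count("1") & 1    # slot 31: makes the count of ones even
    word |= parity << 31
    return word


subframe = pack_subframe(0x12345, channel_status=1)
print(f"{subframe:032b}")
```

The channel status bit carries only one bit per subframe; a receiver collects it across many subframes to rebuild the full block of status information.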
AES/EBU
The “professional” digital interface, also with both hardware and software specifications. AES/EBU (actually there are two different standards, but they are functionally identical for our purposes here) requires balanced lines, a higher voltage level than S/PDIF, and professional cables (XLR or coaxial).
As of 2005, AES/EBU and S/PDIF are generally compatible, and we routinely convert one to the other using matching transformers.
Optical
Optical transmission refers to the transmission of digital data as pulses of light through fiber-optic cables. It is not a language or a format. At the present time, Alesis uses optical transmission for its proprietary ADAT format. No doubt another standard is on its way, as optical transmission has a bunch of advantages. These include absolute freedom from grounding hassles, incredible data density (would you believe they are working on cables that transmit 1 trillion bits/sec?! That’s enough for two and a half continuous days of 6-channel Surround Sound in 16-bit digital format transmitted per second!), low error rates, and very little signal loss over the first 20 miles or so of fiber-optic cable.
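The "two and a half days" figure can be checked with a few lines of arithmetic. This sketch assumes a 48 kHz sampling rate, which the text above does not state; at 44.1 kHz the answer comes out a little higher.

```python
def seconds_of_audio(link_bits_per_sec, channels=6, bits_per_sample=16,
                     sample_rate_hz=48_000):
    """How many seconds of audio fit into one second of link capacity."""
    audio_bits_per_sec = channels * bits_per_sample * sample_rate_hz
    return link_bits_per_sec / audio_bits_per_sec


SECONDS_PER_DAY = 86_400
days = seconds_of_audio(1e12) / SECONDS_PER_DAY
print(f"{days:.2f} days of audio per second of transmission")  # prints 2.51
```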
Proprietary
A number of manufacturers have their own private digital languages, including Tascam, Yamaha, and Alesis. In order for digital data stored in such formats to be transmitted to other digital devices, translators (or interface boxes) need to be used that convert the data from one format to the other. Such competing proprietary formats are a nuisance and bottleneck for us users, but often they permit manufacturers to do things they couldn’t otherwise do, so such formats are in general a necessary inconvenience.
Bits
What is a bit
A bit is a single piece of binary data. It can be a one or a zero. It is the fundamental element of digital data.
What is a byte
A byte is an 8-bit word. Personal computing jargon refers to data sizes in kilobytes and megabytes. So, when I noted that it took 768,000 bits of data to convey 1 second of 16-bit digital audio (one channel at 48,000 samples per second), that is the same as saying that it requires 96 kilobytes of storage (on a floppy disk, for instance). A Macintosh with 5 megabytes of RAM will store (in theory) up to 52 seconds of digital audio in that RAM (that is, 5 million times 8 divided by 768,000).
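The same sums, written out as a sketch. It assumes one channel at 48 kHz (which is what 768,000 bits per second of 16-bit audio implies) and decimal kilobytes and megabytes, as the arithmetic above does.

```python
BITS_PER_BYTE = 8

sample_rate_hz = 48_000   # assumed: 768,000 bits/sec divided by 16 bits
bits_per_sample = 16

bits_per_second = sample_rate_hz * bits_per_sample        # 768,000
kilobytes_per_second = bits_per_second / BITS_PER_BYTE / 1_000

ram_bytes = 5_000_000     # the 5-megabyte Macintosh
seconds_in_ram = ram_bytes * BITS_PER_BYTE / bits_per_second

print(bits_per_second, "bits per second")
print(kilobytes_per_second, "kilobytes per second")
print(f"{seconds_in_ram:.1f} seconds fit in RAM")
```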
What is a word
A digital word is a block of binary data defined by whatever language is being used. Words are usually 8, 16, 24 or 32 bits long. The word isn’t all pure data. It contains its own address, error detection/correction info, and other stuff in addition to the data.
Linear-bit PCM
Linear Pulse Code Modulation (PCM) is the format that we have generally adopted for digital audio. It is fairly intuitive, and it presents no big computational problems. It is literally the pulse code derived from the quantized mapping we described in this article. There are, interestingly, other schemes for gathering digital information.
Delta Modulation and Single-bit Pulse Code Modulation
Alternative schemes for handling digital data. “Delta” refers to change, and the system is a simple one-bit system where each bit indicates whether the wave got bigger or smaller during the sample period. DBX marketed a Delta Modulation recorder for a while, and some early digital products (most notably from a firm named DeltaLabs) used this technique. For delta modulation to work, the sampling rate has to be really fast (like 500 kHz), but because it is a one-bit system, memory needs are no worse than in pulse code modulation systems. Error correction is not nearly as much of a problem, and the anti-alias and anti-imaging filters are comparatively simple to implement because the sampling rate is so high. The problem is that with true delta modulation, the recording length is the word length, so that computation (i.e. signal processing) is a bitch, or “impractical”, as nerdophiles would say!
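Here is a minimal sketch of the one-bit idea, assuming the textbook form of delta modulation in which each bit says whether the input is above or below a running estimate, and the estimate then moves one fixed step in that direction. The step size, test tone and function names are illustrative choices, not taken from any particular product.

```python
import math


def delta_encode(samples, step):
    """Emit one bit per sample: 1 = step the estimate up, 0 = step it down."""
    bits, estimate = [], 0.0
    for x in samples:
        bit = 1 if x >= estimate else 0
        estimate += step if bit else -step
        bits.append(bit)
    return bits


def delta_decode(bits, step):
    """Rebuild the staircase approximation from the bit stream."""
    out, estimate = [], 0.0
    for bit in bits:
        estimate += step if bit else -step
        out.append(estimate)
    return out


# One cycle of a 1 kHz sine sampled at 500 kHz (500 samples).
rate_hz = 500_000
signal = [math.sin(2 * math.pi * 1_000 * n / rate_hz) for n in range(500)]

step = 0.02  # must exceed the largest sample-to-sample change (about 0.0126)
bits = delta_encode(signal, step)
decoded = delta_decode(bits, step)
max_error = max(abs(a - b) for a, b in zip(signal, decoded))
print(f"{len(bits)} bits, worst-case error {max_error:.4f}")
```

If the step is too small for the signal's slope, the staircase cannot keep up, which is why the sampling rate has to be so high.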
There are related schemes, including Sigma-Delta modulation and Delta Pulse-Code Modulation. These all involve manipulating the single bit to make it a little more audio-friendly and/or like pulse code.
As of 2005, Sony’s DSD format, the encoding used on SACD, is in general use. It is a single-bit system with a clock rate of approximately 2.8 megabits/second.
Extra bits
When digital signal processing involves complex calculations, errors (that’s distortion to you, Joey) creep in, due both to rounding and to the limited number of iterations in circular calculations. Also, digital summing of many channels yields a digital equivalent of the noise build-up that occurs in analog mixing. The result is that the effective, accurate number of bits is reduced. To compensate, many recording schemes add extra bits, so that we have 18- or 20-bit systems whose goal is to maintain at least 16-bit resolution after complex signal processing has occurred.
Dither
Ironically, digital distorts more at low amplitudes than at high ones. The solution, interestingly, is the addition of noise. Called dither, this random noise is added to the audio signal to randomize the value of the least significant bit, creating a gentle, smooth digital hiss that our personal psychoacoustic filters are willing to ignore most of the time. Without dither, a solo piano in symphony hall decays into a brittle, angular, unnatural “zipper” sound that tears at our psychoacoustic soul (well, mine, anyway).
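A small sketch shows why noise helps. It assumes a signal level of 0.3 of one least significant bit and triangular dither (two uniform random values added together), which is one common choice; the level, sample count and seed are arbitrary. Without dither the quiet signal vanishes entirely; with dither it survives as the average of a noisy stream.

```python
import math
import random


def quantize(x):
    """Round to the nearest whole step (1.0 = one least significant bit)."""
    return math.floor(x + 0.5)


def quantize_with_dither(x, rng):
    """Add triangular noise spanning +/- 1 step before rounding."""
    dither = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    return quantize(x + dither)


rng = random.Random(0)   # fixed seed so the run is repeatable
level = 0.3              # a signal smaller than one least significant bit
count = 20_000

plain = [quantize(level) for _ in range(count)]
dithered = [quantize_with_dither(level, rng) for _ in range(count)]

plain_mean = sum(plain) / count
dithered_mean = sum(dithered) / count
print("without dither:", plain_mean)
print("with dither:   ", round(dithered_mean, 3))
```

The undithered output is all zeros, so the signal is simply gone. The dithered output hops between neighbouring values, but its average sits close to the original 0.3.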
Sampling Rates
44.1 kHz
The standard CD rate, derived from video practice for compatibility reasons.
48 kHz
The professional standard rate.
32 kHz
The digital broadcast audio sampling rate.
Conversion of rates
Fairly elaborate algorithms are used to convert from one rate to another. If you simply play back 48 kHz material at 44.1 kHz, you’ll get a corresponding pitch shift.
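How big is that pitch shift? A quick sketch, assuming equal-tempered semitones (12 per octave); the function name is illustrative.

```python
import math


def pitch_shift_semitones(recorded_rate_hz, playback_rate_hz):
    """Pitch change, in semitones, from playing a recording at the wrong rate."""
    return 12 * math.log2(playback_rate_hz / recorded_rate_hz)


shift = pitch_shift_semitones(48_000, 44_100)
stretch = 48_000 / 44_100   # the recording also plays this much longer
print(f"{shift:.2f} semitones, {stretch:.3f} times as long")
```

That works out to roughly a semitone and a half flat, with the program running about 9% long.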
Oversampling
A math trick to help with conversion. Sampling at a much higher rate makes filter, DAC, and ADC designs simpler and cheaper. The oversampled data is averaged before digital storage, and/or filtered out as part of reconstruction. It works. You don’t need to worry about it.
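If you do want to worry about it a little, the averaging step can be sketched as below. This is the crudest possible version, plain block averaging; real converters use properly designed low-pass decimation filters, so take it as an illustration of the idea only. The function name is hypothetical.

```python
def decimate_by_averaging(samples, factor):
    """Reduce the sample rate by `factor`, averaging each block of samples."""
    usable = len(samples) - len(samples) % factor   # drop any partial block
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, usable, factor)]


# Four-times oversampled data with some sample-to-sample wobble.
oversampled = [0, 2, 0, 2, 4, 6, 4, 6]
stored = decimate_by_averaging(oversampled, 4)
print(stored)   # [1.0, 5.0]
```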
Errors
Ah, yes. Errors. You knew they had to be here somewhere! Well, they’re here, and digital has plenty of them, including during the so-called “perfect” recordings. In fact, it is generally held that without the really substantial error-correction processes we’ve developed, digital audio would not really be usable.
Ken Pohlmann reports in the Handbook for Sound Engineers that compact discs, in replaying digital information, are wrong on only one out of every one million bits. That sounds like an impressively low error rate (surpassed only by the proofreaders of this magazine!). But digital forgives few errors, and in fact this error rate is too high: at roughly three-quarters of a million bits per channel per second, think of it as about an error per channel per second! Errors are intolerable at two levels. First, an error in an audio bit is as noticeable as that bit is significant. That is, an error in the most significant bit or two has nothing to hide behind, and the audio quality is clearly diminished (through those cracks and pops we’ve all heard when listening to the CD that Fido sank his teeth into). Second, digital audio information is surrounded by critical synchronization bits and other critical information which, when missing or misleading, can cause playback to be interrupted or stopped entirely.
For continuous, pleasing playback of digital audio, these errors must first be detected and then either corrected or covered over.
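The "error per channel per second" estimate is easy to check. This sketch assumes the CD figures of 44.1 kHz and 16 bits and counts audio bits only, ignoring the extra synchronization and housekeeping data.

```python
def errors_per_second(bit_error_rate, sample_rate_hz, bits_per_sample,
                      channels=1):
    """Expected number of wrong bits per second of audio."""
    return bit_error_rate * sample_rate_hz * bits_per_sample * channels


per_channel = errors_per_second(1e-6, 44_100, 16)
per_hour_stereo = errors_per_second(1e-6, 44_100, 16, channels=2) * 3_600
print(f"{per_channel:.2f} errors per channel per second")
print(f"{per_hour_stereo:.0f} errors per hour of stereo")
```

That is about 0.7 errors per channel per second, or some five thousand per hour of stereo playback.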
The Nature of digital errors versus analog errors
Digital errors are usually either faulty recovery of individual ones and zeros, or the loss of a group of contiguous numbers due to tape dropout or similar calamity. Due to the extremely dense nature of digital storage, when we get a dropout, we lose a lot. In the analog realm, these things usually result in minor changes to the continuous wave-trace, producing minor non-linearities that may be audible but are usually not terribly annoying. In the digital realm, the errors turn out to be a lot worse, aesthetically speaking. If we recover a zero instead of a one, for instance, a value may come out as 00,000 instead of 10,000. This represents what the nerds call “a most annoying non-linearity” (actually, if we get the reverse, 00,000 becoming 10,000, and it happens in the data stream your bank uses to express your account balance, the effect is quite pleasant, particularly if their error detection is not quite up to snuff). If we lose a group of numbers, the result is a serious gap in the data stream.
The point is that digital errors are considerably more noxious and audible than their analog cousins. Therefore, they must be dealt with more carefully.
Detecting Errors
The first thing is, we gotta know when we have an error. This is what parity bits do. In the simplest version of this, a parity bit simply represents either odd or even, and is added to the data word as a calculated function of whether the sum of the bits is odd or even. If a single bit is recovered wrongly, the sum of the bits changes from odd to even (or the other way around), throwing it out of agreement with the parity bit. When such disagreement exists, the computer assumes that an error exists and goes on to do something about it. Note that two wrong bits in the same word cancel each other out and slip through, which is one reason the real schemes go further than this.
The actual parity bit procedures and other error detection methods are considerably more complex, sophisticated and robust than I have described. The principle is to include enough information about the data to be able to reliably determine whether or not errors have occurred in the data during storage or transmission.
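The simplest parity scheme described above fits in a few lines. This is a sketch of plain even parity on a 16-bit word, not the method any real recorder or CD player uses; the function names and the choice to store the parity bit just above the data are assumptions for illustration.

```python
def add_parity(word, width=16):
    """Append an even-parity bit above a `width`-bit data word."""
    if not 0 <= word < (1 << width):
        raise ValueError("word does not fit in the given width")
    parity = bin(word).count("1") % 2    # 1 if the data has an odd number of ones
    return (parity << width) | word


def looks_ok(coded_word):
    """True when the total number of ones (data plus parity) is even."""
    return bin(coded_word).count("1") % 2 == 0


coded = add_parity(0b1011000000000001)
one_bit_error = coded ^ 0b100          # flip a single bit
two_bit_error = coded ^ 0b110          # flip two bits

print(looks_ok(coded))          # True: data and parity agree
print(looks_ok(one_bit_error))  # False: the error is detected
print(looks_ok(two_bit_error))  # True: a double error goes unnoticed
```

The last line shows the weakness of a single parity bit: an even number of flipped bits leaves the odd/even count unchanged.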
Correcting Detected errors
Once you’ve found you’ve got an error, you’ve got to do something about it. Again, the math for this is daunting, and some of the operations used are really pretty elaborate, but what it comes down to is redundancy: you make spare copies of the most important bits of data (these are called Most Significant Bits, and are the ones and zeroes farthest to the left, such as the one in 10,000). In DASH digital recording, for instance, about 50% of the audio data is stored twice, as backup. This redundant data allows you to reconstruct faulty data quickly and accurately.
Concealing uncorrectable errors
Sometimes data is lost that can’t be recovered, so that the computer is faced with the problem of knowing it has a mistake and having no way to fix it. The next line of defense is to try to make up equivalent data. This is usually some sort of interpolation, where the computer looks at the good data on either side of the bad data and inserts something reasonable. This works fine for the occasional word loss, where you are going from 10,000 to 9,000 with something lost in between, so you simply interpolate 9,500 as a reasonable guess. It keeps working until you have a big data loss, in which case you get
The dreaded mute!
Think of it this way: you are up on stage performing and you simultaneously forget the words and have the power fail. There is simply no way you are going to continue. For the computer, when the loss is so catastrophic that there is no reasonable way to interpolate, what is it going to do? Play a sine wave? Noise? If it plays all ones or all zeroes, the speaker cones will all stick out into the room or suck back into the speaker, and a bad smell will come from the ports while thermal overload lights on the amps begin to light up. The point is, there is nothing to do but shut down until order is restored. For the computer, restoring order probably takes a good portion of a second.
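The two fallbacks, interpolate a small gap or mute a big one, can be sketched together. This assumes straight-line interpolation, signed samples where 0 means silence, and an arbitrary cutoff of four lost words; real players pick their own rules. Lost words are marked with None, and the function name is hypothetical.

```python
def conceal(samples, max_gap=4):
    """Fill short runs of lost words by interpolation; mute long runs."""
    out = list(samples)
    i, n = 0, len(out)
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < n and out[i] is None:   # find the end of the lost run
            i += 1
        gap = i - start
        has_good_neighbours = start > 0 and i < n
        if gap <= max_gap and has_good_neighbours:
            left, right = out[start - 1], out[i]
            for k in range(gap):
                # Step evenly from the last good value to the next good one.
                out[start + k] = left + (right - left) * (k + 1) / (gap + 1)
        else:
            for k in range(start, i):
                out[k] = 0                # the dreaded mute
    return out


print(conceal([10_000, None, 9_000]))               # [10000, 9500.0, 9000]
print(conceal([5, None, None, None, 7], max_gap=2)) # [5, 0, 0, 0, 7]
```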