Moulton Laboratories
the art and science of sound
Subjective Testing in Your Own Home Studio
Originally published in Recording, approx. August 2001
by Dave Moulton

So You Wanna Find Out If You Can Hear 24/96 Audio

The View From 2009: I’d forgotten about this article. It’s really quite good, and may be of great usefulness to any of you who are truly curious about how things sound. Enjoy.


There’s a lot of fluff and guff going around these days about the “audibility” of high-rez formats. At the latest AES convention, for instance, there were a BUNCH of sessions dealing with the question. Meanwhile, manufacturers are rolling out high-rez gear at an almost alarming rate, and the message (or is it myth?) on the street is simple: if you don’t have 24-bit/96 kHz audio, you simply can’t make a professional-quality recording anymore. 16-bit/44.1 kHz CD-format audio sucks, plain and simple!

At the same time, two serious questions continue to loom in front of us. Can you (or any of us) ACTUALLY HEAR the difference between these formats? If so, how big a difference is it? Rumors abound.

What I’d like to do in this article is to give you a couple of low-cost procedures for subjectively evaluating, for your own fun and your own profit in the privacy of your own studio, to what extent you can hear this stuff, or any other audio stuff you’re interested in. In fact, I’m going to help you establish your own personal subjective testing and measurement protocol. Hot damn!

As background, a couple of years back (August, 1998, I think) I wrote an article about “audibility.” In that article I discussed many of the issues surrounding that pesky topic. I won’t go there again this time, at least not directly, so you may wanna dig it out and refresh your memory. Similarly, last November I wrote an article about the blind listening evaluation of studio monitors, which you may also wish to look up. Both are relevant to this article.

What You Can Get Out Of These Tests

Subjective tests are reasonably controlled observations that you make of the equipment you wish to evaluate. The use of controls helps you reduce the amount of confusion, bias and general madness that seems to permeate most subjective evaluations and listening tests. Happily, it’s fairly easy and cheap to do the tests (if you don’t mind spending some time), and they should prove to be extremely useful to you as you continue to work on (a) your recordings, (b) your equipment wish list and (c) your budget. Trust me!

The Physical Test Protocol

Suppose there’s a 24-bit HD recorder you think might make a big improvement in your sound compared to your current 16-bit DAT recorder. You might even like to buy it, if in fact it would make such an improvement. How to find out? Well, first you borrow one.

Once you’ve got it in your studio, you’ve got to patch it up as a Device Under Test (DUT) that you can compare to a Reference system (Ref). Here’s the generalized signal flow:
  
Figure 1. Signal Flow for comparing a DUT to a Reference System.

Using this setup, you are going to make recordings in parallel on both recorders, using your (best) microphone, and hopefully some reasonably challenging material, including percussion sounds, acoustic guitar, trombone, female voice, etc. Because you are comparing a 24-bit HD recorder with a 16-bit recorder, you probably shouldn’t use a 16-bit CD as a test source, no matter how convenient it is. It’s also important to get the levels of the two recordings as close to identical as possible (say, within 0.25 dB).
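If your recordings end up as files, you can sanity-check the level match numerically rather than by ear. Here’s a minimal Python sketch; the two sine-wave arrays are just hypothetical stand-ins for samples pulled from your actual Ref and DUT recordings:

```python
import math

def rms_dbfs(samples):
    """RMS level of a list of float samples (range -1..1), in dBFS."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms)

# Stand-ins for the two recordings: one second of a 440 Hz tone,
# with the DUT copy 2% hotter than the Ref copy.
ref = [0.50 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]
dut = [0.51 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]

diff = abs(rms_dbfs(ref) - rms_dbfs(dut))
print(f"Level mismatch: {diff:.3f} dB")  # about 0.17 dB here
```

A 2% amplitude difference works out to roughly 0.17 dB, which is inside the 0.25 dB target; anything bigger and you should trim levels before testing.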

Note that “you” are part of the test system, and the test system also includes your best mic, preamp, and monitors as well as your playback room.

Once you have this all patched together (more about the Selector Switch in a moment), you will run a series of blind comparison tests. Below, I’ll describe three different ways you can do this. What is important in the setup is that the tests be blind, and that the tests be consistent.

The primary technical resource that you will need is:

Your Trusty Assistant (YTA)

A trusty assistant (YTA) is essential. He or she will lay out the test sequences, will train you for the listening task, will administer the blind test sequences (will in fact function as the Selector Switch), will record the data, and will help you score it. Equally important, YTA will put up with all of your kvetching and complaining while enabling you to remain “objective” in the face of severe temptation. A friend indeed!

Test Details

Before you start, YTA will need to make up a test sequence. Our first test will be a so-called ABX test, where YTA will play you A, B and then X for each trial. You will guess whether X is A or B. So, have YTA make up a test sequence of 25 trials (the number is important – don’t cheat here). Flipping a coin, YTA will determine for each trial what A is (DUT or Ref). Then, with 25 more coin flips, YTA will determine the identity of X (as either A or B). YTA of course writes this all down. Needless to say, you are out of the room, lounging by the pool or something else really productive, while YTA does all this work. Did I mention that YTA is a true friend, and that you will owe him/her big-time when this is over?
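If YTA would rather let a computer do the coin-flipping, the answer sheet above is easy to generate. A small Python sketch (the function name and sheet layout are my own, not part of any standard ABX tool):

```python
import random

def make_abx_sequence(n_trials=25, seed=None):
    """Coin-flip an ABX answer sheet: for each trial, decide which device
    is A (the other is B), then decide whether X repeats A or B.
    YTA keeps this sheet; the listener never sees it."""
    rng = random.Random(seed)
    sheet = []
    for trial in range(1, n_trials + 1):
        a = rng.choice(["DUT", "Ref"])      # first coin flip: what A is
        b = "Ref" if a == "DUT" else "DUT"  # B is the other device
        x = rng.choice(["A", "B"])          # second coin flip: identity of X
        sheet.append({"trial": trial, "A": a, "B": b, "X": x})
    return sheet

for row in make_abx_sequence(seed=1)[:3]:
    print(row)
```

Passing a seed makes the sequence reproducible, which is handy if you ever want to re-run the exact same test later.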

Anyway, once the paperwork is ready, you’ll come back in the room, sit down in your favorite listening position, and let YTA informally play you some examples of both Ref and DUT. Listen to each until you are pretty clear on how the test set-up works. Always leave a couple of seconds between listening to Ref and DUT (interestingly, instantaneous switching can skew the results). Then, put on a blindfold (I told you the tests were blind, didn’t I?) and let YTA run the test. YTA will start both recordings at the same time, so they will be roughly in sync (they should definitely be within a quarter of a second). Switching can be as simple as using a patch cord to patch into the output jack of either piece of gear, or it can be done with muting buttons on a console, or even a silly little homemade double-pole double-throw switch. Whatever.

YTA announces to you “Example 1, A” and selects A for Example 1. After 10 seconds or so, YTA disconnects A, waits for a second or two, says “B,” and then selects the other piece of gear. After letting it play for 10 seconds or so (YTA should actually be looking at the second hand on a watch in order to stay honest about this – the tendency to rush examples gets overwhelming after a while), YTA disconnects B, announces “X” and connects the monitor to whichever piece is X for Example 1. After letting it run for 10 seconds, YTA disconnects it and you guess “X was A” or whatever you think. Have YTA avoid asking stupid things like “Is that your final answer?” (it’s cute once, but it really gets old in a hurry). YTA MUST WRITE YOUR ANSWER DOWN. He or she shouldn’t write “correct” or “incorrect” and he or she shouldn’t give ANY indication of whether you were right or wrong. No twitches, sighs, giggles, shrugs or anything. Deadpan does it!

Differentiation Tests

What we’ve just been describing is a differentiation test, which seeks to determine the extent to which we can distinguish between the DUT and the Ref. This speaks to the question of “audibility.”

By itself, “audibility” is an important issue. However, audibility by itself isn’t enough to satisfy our needs. There are several other tests we should run as well.
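One reason 25 trials is a sensible number: once YTA tallies your answers, you can check them against pure guessing with an exact binomial test. A sketch in Python (with 25 trials, 18 or more correct works out to p < 0.05 one-tailed, the conventional cutoff for “yes, I really heard a difference”):

```python
from math import comb

def abx_p_value(correct, trials=25):
    """One-tailed exact binomial p-value: the probability of getting at
    least this many right by pure guessing (p = 0.5 per trial)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# 13/25 is a coin flip; 18/25 is where the result becomes hard
# to explain away as guessing.
for correct in (13, 15, 18, 20):
    print(f"{correct}/25 correct: p = {abx_p_value(correct):.4f}")
```

Scoring, say, 15 of 25 feels like you were hearing something, but it happens by chance more than a quarter of the time; that is exactly the kind of self-deception the blind protocol is there to catch.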

Preference Tests

A Preference Test is similar to a Differentiation Test, except that it asks which of A and B we prefer. In this test, YTA does 25 coin flips to assign Ref and DUT to A and B for 25 trials, plays you (blindfolded) A and then B and asks you which of A and B you preferred in each case. In this test, it is better if the program played back by both A and B is identical. To do this, you have to write index points on both the Ref and DUT recordings quite closely together (within 0.1 second or so), so that you can cue them each to the same start points (so you can’t tell the difference by the edit points). This will take some production finesse.
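Preference tallies can be checked the same way as ABX scores, except the test is two-tailed: before looking at the data you had no idea which device you’d favor. A hedged sketch (this is a plain sign test; the function name is mine):

```python
from math import comb

def preference_significance(prefer_dut, trials=25):
    """Two-tailed sign test: could a preference split this lopsided
    arise if you actually had no preference (a fair coin)?"""
    k = max(prefer_dut, trials - prefer_dut)
    one_tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return min(1.0, 2 * one_tail)

print(f"18/25 for DUT: p = {preference_significance(18):.3f}")  # a real preference, probably
print(f"14/25 for DUT: p = {preference_significance(14):.3f}")  # indistinguishable from coin flips
```

In other words, a 14-to-11 split means nothing; roughly 18-to-7 or better is where you can start trusting that the preference is real.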

Quality Tests

A Quality Test is similar to a Preference Test except that it seeks to assign a given level of quality to each DUT. In this test, you rate the Ref and the DUT on a scale (typically a 5-point scale) ranging from something like Awful Reproduction = 1, through Poor = 2, Fair = 3, and Good = 4, to Superb Reproduction = 5. The reason for doing this test is that you can then begin to compare a range of DUTs, and can even apply your ratings over time, so that you can even have somewhat meaningful comparative scores for DUTs that were never directly compared with each other. When you do this test, you score both A and B using this rating scale. You don’t try to tell them apart or figure out which you prefer. You just rate them. Again, it is good to use the same program segment for both parts of each trial.
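Averaging those 1-to-5 ratings over many trials gives each device a single comparable number (a mean opinion score, in codec-testing jargon). A minimal sketch; the device names and rating values below are made up for illustration:

```python
def mean_opinion_score(ratings):
    """Average a set of 1-5 quality ratings into one comparable score."""
    return sum(ratings) / len(ratings)

# Hypothetical ratings collected over eight trials for each device.
scores = {
    "Ref (16-bit DAT)": [3, 4, 3, 4, 4, 3, 4, 3],
    "DUT (24-bit HD)":  [4, 4, 5, 4, 3, 4, 5, 4],
}
for device, ratings in scores.items():
    print(f"{device}: MOS = {mean_opinion_score(ratings):.2f}")
```

Because every device is scored on the same absolute scale, you can keep a running log and compare this month’s DUT against one you tried last year, even though they were never in the room together.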

Another variant of this is to make the live sound that you are recording the Ref and then to score A and B on a scale of “impairment,” where 5 = No Audible Difference from Ref, 4 = Audible But Not Annoying Difference, 3 = Slightly Annoying Difference, 2 = Distinctly Annoying Difference, and 1 = Extremely Annoying Difference. Incidentally, this is the protocol we’ve been using, with CDs as the Ref, to evaluate codecs like MP3 and Dolby Digital. Depending on the bit rate at which MP3 is running, the nature of the program material and the expertise of the listeners, MP3 will score somewhere between 2 and 4.
