Whadda Ya Mean, It’s Indistinguishable From Chance?
Dave Moulton
June 2000
A pair of articles exploring the difference between knowing something for sure, and dumb luck.
There is a lot of confusion about the probabilistic implications of doing informal subjective tests to find out “if you can hear it.” This pair of columns attempts to give you a little sense of what “indistinguishable from chance” really means, as well as what the “95% confidence level” means. Hopefully, they will make things easier for you whenever you are tempted to find out on your own what it is you really can hear.
Alert readers may recall . . .
We’ve been discussing issues pertaining to high-resolution audio, and the related questions of how we measure the perception of such audio. One central feature of such measurement is the so-called “blind” test. We do this to reassure ourselves that (a) we are not imagining things, and (b) our perceptions are reasonably free of non-auditory influences or biases.
In blind testing, we listeners listen to various audio examples with small differences, whose identities are concealed from us. In the most basic subjective testing format, we simply try to distinguish between the examples as a function of their difference.
In this format, the test design usually calls for us to listen to Version A, Version B and Version X of an audio example. A and B will be different in some respect (say, a 6 dB difference in amplitude), and X will be either A or B. It is our task to guess whether X is A or B. This is called, in its various guises, an ABX test.
Now, the theory is, if we can always successfully identify X as either A or B, then we must be able to “hear” that A and B are different, and to identify that difference when it is present in an unknown signal. So, to score these tests in a traditional statistical way, we simply compare the number of correct identifications to the total number of trials. If all of the identifications are correct (100%), we assume that the resolution difference is audible. If they aren’t all correct, we’ve got to scratch our heads a little, and work through what it means. This leads us into a dismal swamp called probabilities. This is what I want to talk about in this month’s column and next month as well.
If we test listeners in the ABX test were to just guess the identity of X without listening to the signals (by rolling dice, for instance), in each trial we would have a 50% chance of being right.
This means that, if I listen to A, then B, then X, and guess that X = A and I’m right, then it may be because I heard a difference or it may be that I was simply lucky. We have no way of telling, from that single example, which happened. The probability that we are right by chance is 50%. The probability that we are right by perception is unknown. This means that our correct answer is something called “indistinguishable from chance.”
So even if the difference is perfectly obvious to us, we can’t prove from a single trial that we truly perceived it. Our ability to correctly identify it cannot be distinguished from the probability (50%) that we only got it right through dumb luck. This remains true regardless of the differences between the signals. It is as true for signals that are 20 dB different as it is for signals that are 0.1 dB different. We may find the former “easy to hear” and the latter “hard to hear,” but, with a single trial, our ability to correctly identify X remains, sadly enough, indistinguishable from chance.
The solution to this appears obvious: we do a bunch of trials. As the number of trials increases, the probability of a 100% success rate in identification by chance falls from 50% to some comparatively small percentage. If we have two trials, the probability of 100% correct answers by chance is 25% (one in four). With three, it is 12.5% (one in eight). So, we tend to casually run a bunch of trials, and if we get ’em all or mostly right we say, “See? It isn’t chance. I can nail these suckers anytime, anywhere!”
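The halving described above is easy to check for yourself. Here is a minimal sketch in Python, assuming each guess is an independent 50/50 coin flip; the function name and the list of trial counts are just illustrative choices of mine.

```python
from fractions import Fraction


def chance_of_perfect_score(trials: int) -> Fraction:
    """Probability of getting every ABX trial right by pure guessing.

    Assumes each guess is independent with a 50% chance of being correct.
    """
    return Fraction(1, 2) ** trials


# Print the chance of a perfect score for a few (arbitrary) trial counts.
for n in (1, 2, 3, 5, 10):
    p = chance_of_perfect_score(n)
    print(f"{n:2d} trials: {p} = {float(p):.4%}")
```

Each added trial cuts the odds of a lucky perfect score in half, which is why a run of straight correct answers gets more convincing the longer it goes on.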
If we do 10,000 trials and just flip a coin to determine our choice, without listening, we should be right about 50% of the time, and it turns out just about that way. In fact, we can reasonably predict that for 10,000 pure guesses, each with a 50% chance of being right, our real score is going to fall between 49% and 51% (roughly 95 times out of 100).
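If you would rather check the 49% to 51% claim than take it on faith, here is a rough sketch that counts up the exact binomial odds. It assumes independent 50/50 guesses; the function name and its percentage arguments are my own inventions for illustration. The counting is done in whole numbers until the final division, because the individual probabilities involved are far too small for ordinary floating point.

```python
from math import comb


def chance_score_within(trials: int, low_pct: int, high_pct: int) -> float:
    """Probability that pure 50/50 guessing scores between low_pct and
    high_pct correct (inclusive), using the exact binomial distribution."""
    low = (trials * low_pct + 99) // 100   # smallest count that is >= low_pct
    high = trials * high_pct // 100        # largest count that is <= high_pct
    # Count the equally likely guess sequences that land in the range.
    favorable = sum(comb(trials, k) for k in range(low, high + 1))
    # Integer arithmetic up to here; Python divides big integers accurately.
    return favorable / 2 ** trials


print(f"{chance_score_within(10_000, 49, 51):.4f}")
```

The result comes out a little above 0.95, so a coin-flipper's score will wander outside the 49% to 51% band only about one time in twenty.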
Meanwhile, if we can really hear that 6 dB difference, in 10,000 trials we should be able to pick it out at least 99% of the time (you’d think!). Not much problem separating “audible” from “indistinguishable from chance.”
However, it’s not really that easy. Because we are human, we make mistakes. Therefore, we grudgingly accept the idea that we can’t expect ourselves to correctly identify X in EVERY example. We tolerate the occasional wrong answer as “brain fade,” and don’t worry about it. However, as differences in test material get smaller, we make more “mistakes.” When we get down to a 1 dB difference, for instance, I have found that expert listeners will get the answer right about 70% of the time. So, between the twin evils of dumb luck and stupid mistakes, we will experience considerable variance in test scores. In small sets of trials, the result is that there is often considerable overlap between “definitely audible” and “indistinguishable from chance.”
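To see that overlap in numbers, here is a sketch comparing a pure guesser with a listener who gets it right 70% of the time. The assumptions are mine: every trial is independent, the listener's hit rate is a fixed 70%, and the test is 10 trials long. Real listeners are surely messier than that.

```python
from math import comb


def score_distribution(trials: int, p_correct: float) -> list[float]:
    """Binomial probability of each possible number of correct answers,
    from 0 up to `trials`, given a fixed per-trial chance of being right."""
    return [
        comb(trials, k) * p_correct**k * (1 - p_correct) ** (trials - k)
        for k in range(trials + 1)
    ]


TRIALS = 10  # an arbitrary "small informal test" size
guesser = score_distribution(TRIALS, 0.5)   # dumb luck only
listener = score_distribution(TRIALS, 0.7)  # assumed 70% hit rate

print("correct  guesser  70% listener")
for k in range(TRIALS + 1):
    print(f"{k:7d}  {guesser[k]:7.3f}  {listener[k]:12.3f}")
```

Scores of 6, 7 or 8 out of 10 turn up with non-trivial probability in both columns, which is exactly the overlap that makes short tests so hard to interpret.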
In a small number of trials, there is a real possibility that a string of lucky guesses could lead to a score of 70% or more. (With 10 trials, pure guessing gets 7 or more right about one time in six.) When this happens, our results, for a 1 dB difference, are “indistinguishable from chance.” Drat!
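How big is that possibility? Here is a small sketch that works out the odds of a pure guesser scoring 70% or better for several test lengths. It assumes independent 50/50 guesses; the function name and the particular trial counts are illustrative choices of mine, not part of any standard ABX procedure.

```python
from fractions import Fraction
from math import comb


def chance_of_at_least(trials: int, percent: int) -> Fraction:
    """Exact probability that 50/50 guessing scores `percent` or better."""
    # Smallest whole number of correct answers that reaches the percentage.
    min_correct = (trials * percent + 99) // 100
    favorable = sum(comb(trials, k) for k in range(min_correct, trials + 1))
    return Fraction(favorable, 2 ** trials)


# Chance of a lucky 70%-or-better score for a few (arbitrary) test lengths.
for n in (10, 20, 50, 100):
    print(f"{n:3d} trials: {float(chance_of_at_least(n, 70)):.5f}")
```

The odds of a lucky 70% shrink quickly as the test gets longer, which is the whole argument for running more trials than feels necessary.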
As I told you, probabilities live in a dismal swamp indeed. In order to get to a point where we can say that the possibility of our score just being due to dumb luck is really very small, we have to use the same probabilities that are making us crazy. We have to answer the question: just how probable are improbable events, anyway? Sheesh!
Next month I’ll tell you a little bit about what you can expect when you conduct a small informal blind listening test. Thanks for listening.