Perceptual Expectations, Immersive Sound Environments, and Recording

Ron Pellegrino, December 1997

Perceptual Expectations

At the early stages of any new enterprise involving perception (such as designing immersive sound environments), out of fundamental ignorance we tend to be too naive and weighed down with wishful thinking about what it takes to create a convincing product. One source of that naiveté is that engineers approach perception as a stationary target when in fact it's a moving and evolving target. The more experience a culture has in seeing and hearing in the real and virtual worlds, the higher its expectations for the next iterations of seeing and hearing. Furthermore, engineers tend not to factor in the reality that perceptual quotients, like intelligence quotients, are scattered all over the plot. Just who are we trying to convince...or fool? Who is the target audience for immersive sound environments?

Lately I've been studying the sound and imagery in major Hollywood films of the 1960s and earlier. Check them out and notice how often exotic and not-so-exotic locations were actually film projected on a screen at the back of a sound stage; conceptually that technique is right out of vaudeville. In the 1960s I, like everyone else, accepted it as a standard technique; in the 1990s it's a knee-slapper. Today pre-1970s audio "effects" are also good for laughs. The fact is that the conceptual and physical tools of the pre-1970s audio world were nascent at best. Pre-1970s acoustics was based on the work of Helmholtz (1821-1894), and psychoacoustics was still in its infancy. Not much gear was available for sound effects either; pre-1970s reverberation units consisted of collections of springs, a metal plate, or a speaker in a room. In the late 1990s those of us in the sound field find ourselves the beneficiaries of three decades of breakneck research, development, and production integrating psychophysics and affordable technology. Though we are far from acoustically omniscient, every day we are less acoustically naive.

My point is that the more naive one is about perception (as perceiver and presenter), the less one expects in the way of convincing perceptual information and cues. As we learn more about the requirements for convincing experiences (mainly by actually having convincing experiences and being conscious of them), questions arise: how little information is necessary to convince the lower 45%, how much is required to convince the upper 5%, and what will work for the 45-95 percentile?

Immersive Sound Environments

The question boils down to how much auditory information is good enough for a convincing virtual immersive sound environment. If the sound environment is imaginary, anything goes as long as it's appropriate for the target subculture. If a virtual sound environment simulates an actual sound environment for the purpose of teaching people how to operate in that environment, it should be very accurate, or it risks misinforming the students and possibly putting their lives in jeopardy.

What we need are good, convincing models for immersive sound environments. We need realistic physical models for learning and play, and we need inspirational imaginary models for aesthetic stimulation. Naive people, people with limited auditory consciousness, are apt to accept even the weakest of virtual immersive sound environments. Without experiential references they will have no idea of what they're missing. On the other hand, people with wider auditory experience and a fuller appreciation of the depth and range of auditory possibilities will have far greater expectations and thus will be more difficult to satisfy. The hitch is that critical, analytical, and creative listening is not part of a general education, so only a small percentage of people have high auditory expectations. In fact, given the din and rumble of the modern world, turning a deaf ear is a common defense mechanism. Music studies may be the only field where people are encouraged to listen, and even there the encouragement is generally tepid.

However, our culture does set apart concerts and recitals as opportunities for creative listening. We can expect the best of those occasions to provide excellent physical models for immersive sound environments. For example, Catherine Malfitano, a multifaceted soprano equally adept at singing the raunch of Kurt Weill, the sheen of Samuel Barber, and the melodrama of Puccini, gave a recital late in 1997 in San Francisco at Herbst Theatre, a near-perfect model of a recital hall. As I sat there in a great seat (5th row center) listening to this world-class singer, one of my thought streams was channeled by recent discussion threads on "bottled sound" from the World Forum for Acoustic Ecology and on "immersive sound" from ICAD (International Community for Auditory Display). Malfitano was a powerful, dramatic musical presence, making eye contact, and thus direct voice-beam-to-ears contact, with people all over the theatre; she sprayed that focused voice-beam from front to back and side to side as she moved around the stage. No recording setup could have captured or replicated the pitch, volume, and tone color effects that resulted from that spraying.

Malfitano's recital was truly a multimedia experience; it was performance art. Facial and body expressions were writ large. She had the shaman's touch for transporting the audience to a higher spiritual realm. Her sound only made sense in the context of that total experience. The sound of the voice heard by the audience (most noticeably its loudness and tone color, but its pitch inflections as well) changed as the performer moved through the space according to the dramatic requirements of the text she was singing. Those loudness and tone color changes made complete sense in the context of her performance but would sound very strange and "wrong" in any recording of the event. In fact, any attempt to make a recording would itself have drastically changed the nature of that experience. In no way could that experience be "bottled," even with multiple mics and video cameras. The sound was so much a part of the total experience that removing it from its context would cause sensory confusion and leave only a pale, diminished sense of the experience.

Any pickup device (mic or camera) is just a window on the experience it's monitoring. If the location of the window is fixed (as mics and cameras often are), performers usually feel compelled to fix themselves in front of it; relating to a fixed pickup tends to disarm all but the most schooled performers.

If the window (a mic) is fixed on the sound source itself (as with clip-on or headset mics), it naturally distorts the sound image relative to the source, and relative to the listener, whenever the source moves in the space. On the other hand, if the window (mic or camera) is adjustable and movable, it can become a musical instrument in its own right and lead to the creation of composed, flowing points of view; some composers and performance artists understand this approach, while few audio engineers can even begin to grasp the notion.

Most recording grows out of an acquisitive and materialistic urge, the desire to grasp, capture, and possess experiences to serve as trophies or stock in trade. The life of the sonic experience is traded away for one or another form of currency. It all amounts to a good example of "be there or be square." Being there gives one access to the full web of experiential dimensions, which includes sound. Being square leaves one with a window of fixed and distorted perspective: a recording with a dead point of view.

