
Streaming media, especially audio, has evolved into one of the best features of the broadband Internet. Technologies such as RealNetworks' RealAudio, Apple's QuickTime and Microsoft's Windows Media now allow virtually anyone to operate a global media outlet for less than the cost of a short vacation in New York City.

Although video over the Internet has improved dramatically over the years, audio remains the sweet spot in streaming media. Listening to exotic radio stations around the world is a lot of fun, but streamed voice-based programming is one of the few ways the little guy can break through the corporate gatekeepers of media.

Yet the dramatic improvement in the delivery quality of 'Net audio does not necessarily mean the voices coming from your personal computer are sonically pure. Too many Web producers haphazardly encode any sound sent their way without considering how to optimize it for Internet delivery.

RealNetworks, a pioneer in streaming media, offers an excellent free online tutorial on how producers can maximize the quality of streaming audio and video. It's worth a review. For this column, we'll look at some ways to improve spoken word programming before it is encoded for streaming.

AUDIO 101

Obviously, all the basic audio rules apply. Good microphone technique is critically important, as is correctly setting levels and using high-quality recording equipment. What's often overlooked in producing streaming audio, however, is pre-processing.

The recommended pre-processing of voice files for streaming media falls into two main categories: gain control, which includes noise gating, compression and limiting; and equalization. All of these functions can be accomplished with audio editing applications and plug-ins on the computer desktop.

In my experience, however, some software-based pre-processing tools can be cumbersome. Basic applications often can play only short loops of your program through the computer for evaluation and tweaking. Because sound applications place heavy demands on a computer's processor and are best used with a second hard drive, they can take over the machine and prevent you from doing other routine jobs with your PC.

It seemed to me that, at least for this multi-staged task, a mix of hardware components and software might be in order. Thus, my attention turned to voice processors: standalone devices that usually combine four or more functions in a single box. Though not specifically designed for pre-processing of streaming audio files, many voice processors have the right combination of features and cost to fit the job.

We worked with a Symetrix Model 528E, an analog voice processor ($749 list; about $600 street price), for the pre-processing of RealAudio files. It should be noted that the same pre-processing techniques apply regardless of whether you stream using Real, Apple or Microsoft technology.

The 528E combines a mic preamp, de-esser, compressor/limiter, downward expander and parametric EQ. We fed the 528E a signal from a MiniDisc player (bypassing the mic preamp) and sent the unbalanced output to the audio input jack on an Apple PowerBook computer. The balanced output of the 528E fed a headphone amp for critical preview monitoring of the voice processor's setup.

The Symetrix box did its job very well and gave us a level of convenience and control beyond most of the software-based pre-processing solutions we'd previously tried. Symetrix's superb installation manual guided us through the device and gave us the starting points for each function. We simply tweaked the controls until each segment sounded best and then recorded the audio to the computer's hard drive. It was that easy.

The first stop on the 528E is the de-esser. It selectively removes the high frequencies from the input signal when sibilant sounds are present and they exceed the threshold level. It's not an essential pre-processing function for streaming media, but can be very useful when you need it.

BYE TO BACKGROUND

Next is downward expansion, a function that reduces the background noise that becomes audible during pauses in the audio. We simply adjusted the threshold control so that gating or expansion occurs when there is no desired audio, but did not set it so high that the beginnings of the words we want to hear are chopped off.
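For readers who think better in numbers, here is a rough Python sketch of what a downward expander does to the signal level. It models only the static gain curve, not the attack and release behavior of a real unit, and it is not a description of the 528E's circuitry. The function name, the -45 dB threshold and the 2:1 ratio are illustrative assumptions, not settings from the Symetrix manual.

```python
def expander_gain_db(level_db, threshold_db=-45.0, ratio=2.0):
    """Gain change (in dB) applied at a given input level.

    threshold_db and ratio are assumed example values, not 528E settings.
    """
    if level_db >= threshold_db:
        # Desired audio sits above the threshold and passes unchanged.
        return 0.0
    # Below the threshold, each 1 dB of shortfall is stretched to
    # `ratio` dB, so quiet background noise is pushed down further.
    return (level_db - threshold_db) * (ratio - 1.0)


# Speech at -20 dB is untouched; room noise at -55 dB is turned down.
for level in (-20.0, -45.0, -55.0, -65.0):
    print(level, expander_gain_db(level))
```

Raising the threshold in this model makes more of the quiet material fall into the reduced region, which is why a threshold set too high starts to clip the soft beginnings of words.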

As for compression, we optimized for RealAudio's medium bitrate codecs, still the most common configuration on home computers. A ratio between 2:1 and 4:1 is generally about right for speech. For high bitrate codecs (DSL or cable modem), use very mild compression, if any.
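To see what a 2:1 or 4:1 ratio means in practice, the following minimal Python sketch computes the output level of an idealized hard-knee compressor. It ignores attack, release and make-up gain, and the -20 dB threshold and the function name are assumptions chosen purely for illustration.

```python
def compressor_output_db(level_db, threshold_db=-20.0, ratio=3.0):
    """Output level (in dB) of an idealized hard-knee compressor.

    threshold_db is an assumed example value; ratio is the N in N:1.
    """
    if level_db <= threshold_db:
        # Material below the threshold is left alone.
        return level_db
    # Above the threshold, every `ratio` dB of input yields 1 dB of output.
    return threshold_db + (level_db - threshold_db) / ratio


# A peak 12 dB over the threshold, at the ratios suggested for speech.
for ratio in (2.0, 3.0, 4.0):
    print(ratio, compressor_output_db(-8.0, ratio=ratio))
```

In this model a peak 12 dB over the threshold ends up 6 dB over at 2:1 and only 3 dB over at 4:1, so the higher ratio evens out a voice more aggressively.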

EQ settings also depend on the codec. Since RealAudio's low bitrate codecs discard much of the high end, it's important to boost the midrange frequencies in the 2.5 kHz area to make speech more intelligible. For medium bitrate codecs, we found a modest boost in the mid frequencies worked best. For high bitrate codecs, properly mastered programs should need no additional EQ. There are no set rules, however. It's best done by ear.
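If you would rather try a midrange boost in software, one common way to build a parametric band is the peaking filter from Robert Bristow-Johnson's widely used Audio EQ Cookbook. The Python sketch below computes those coefficients and checks the resulting response. The 3 dB boost, the Q of 1.0 and the 44.1 kHz sample rate are assumed starting points, not values we measured on the 528E, and the function names are ours.

```python
import cmath
import math


def peaking_eq_coeffs(center_hz, gain_db, q, sample_rate):
    """Biquad coefficients for a peaking EQ band (Audio EQ Cookbook form)."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin]
    a = [1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin]
    # Normalize so the leading denominator coefficient is 1.
    return [x / a[0] for x in b], [x / a[0] for x in a]


def response_db(b, a, freq_hz, sample_rate):
    """Magnitude response of the biquad at one frequency, in dB."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20.0 * math.log10(abs(num / den))


# Assumed example: a 3 dB boost centered on 2.5 kHz at a 44.1 kHz rate.
b, a = peaking_eq_coeffs(2500.0, 3.0, 1.0, 44100.0)
for freq in (100.0, 2500.0, 10000.0):
    print(freq, round(response_db(b, a, freq, 44100.0), 2))
```

The boost reaches its full value at the center frequency and tapers toward zero on either side. A lower Q widens the band, which is usually the gentler choice for speech, but as with the hardware, the final setting is best judged by ear.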

The only remaining pre-processing we did before encoding was to "normalize" the sound file. This is a function in which the computer's processor determines how much it can increase the loudness of the file without causing distortion. We did this on the PowerBook, since the voice processor has no normalize capability. Normalization (to 95 percent in the case of RealAudio) is the last step of the process prior to encoding.
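Peak normalization is simple enough to sketch in a few lines of Python. This version assumes the audio is already loaded as floating-point samples between -1.0 and 1.0, and it reads "95 percent" as 95 percent of full-scale peak level. That reading is our assumption, so check how your own editor defines the figure. The function name is hypothetical.

```python
def normalize(samples, target_peak=0.95):
    """Scale samples so the loudest one hits target_peak (1.0 = full scale)."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        # Pure silence: nothing to scale, and avoids dividing by zero.
        return list(samples)
    scale = target_peak / peak
    return [s * scale for s in samples]


# A quiet recording whose loudest sample is only half of full scale.
quiet = [0.1, -0.5, 0.25, 0.0]
print(normalize(quiet))
```

Because every sample is multiplied by the same factor, normalization raises the noise floor along with the voice, which is one more reason to handle expansion and compression first.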

Pre-processing can improve the sound quality of streaming audio. It's worth taking the time to do it right. A voice processor, like the excellent Symetrix 528E, is not essential with today's computer applications, but we found it a very convenient device that deserves serious consideration in the Web producer's studio.

To read RealNetworks' Production Guide, visit here.

Symetrix's Web site is at http://www.symetrixaudio.com

Frank Beacham is an independent writer based in New York.