
Audio editing

Audio has always been the poor relation to video in television production, but to quote David Scally, past AES president, “Video without audio is just surveillance.” As in much of television production, the move away from tape to file-based workflows opens the way to more flexible and collaborative working. Editing audio for television comprises several processes, including ingest and track laying, assembly and editing, and finally the mix. At this point, the audio can be combined with the finished video edit in a process called layback. The term is a reference to videotape and is becoming an anachronism, as the process is now more a matter of wrapping files.

Audio workflows have partly been defined by the constraints of hardware platforms. From the shoot — where audio is captured on a field recorder synced to the camera — through much of the workflow, audio follows a different path. Video shot selection and rough cut, the offline stage, uses a quick mix of location sound, but the two are not finally brought together until finishing.

Now that video and audio content are handled as files, the separate workflows are starting to meld. However, the special needs of each medium still remain, and those needs dictate separation of processes. Just as color grading needs a suite with high-quality calibrated picture monitors and controlled ambient lighting, audio mixing demands monitor-grade loudspeakers and an acoustically treated room. Video and audio editing can be done on a laptop, but any task that requires critical viewing or listening is still going to need a dedicated room in the post house along with operators with a special skill for that discipline.

The classic workflow for television has been offline, online and then finishing. Once “picture lock” is reached, the cut can be handed off to sound dubbing as an EDL or AAF. The audio can be conformed to the video and mixed down to the deliverable, surround or stereo as the contract dictates. The finished sound tracks can then be handed back to the video editor for final assembly in the layback.

The soundtrack comprises many elements, including dialogue, scene ambience, effects, ADR and music, which may or may not be present depending on the genre. The minimum is to be found in news, where the reporter’s voice-over is mixed over the location sound. At the other end is the soundtrack for television drama, which can be treated like a movie production with separate editors for dialogue, effects and music, plus the sound mixer (also called re-record or dubbing mixer). As with offline and online video editing, the skills are different. Sound editing, like the offline, is all about assembly, then setting the pace and telling the story. The counterpart of online and finishing is the mix, which sets the feel of the soundtrack.

The divisions in workflow are blurring. The ability to exchange files without the need to lay off to tape means that there is back and forth between video and audio processes as producers try to compress time scales by working in parallel in what was once a serial workflow. Many post houses use network storage shared by audio and video. Files can be used by either department without the need to copy them to local workstation storage. In rapid-turnaround production, this can gain precious time. When changes are made after picture lock, the editor/mixer must edit the timeline to reflect the video changes. This can be a manual operation, but software tools come to the rescue for more complex revisions to the EDL.
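As a rough illustration of what such a re-conform tool has to work out, the sketch below compares two versions of a cut and reports which events have moved, been removed or been added. It is a minimal sketch under stated assumptions: each cut is reduced to a hypothetical mapping of event name to record-in position in frames, whereas a real EDL or AAF also carries timecode, source reels, durations and transitions. The function and event names are invented for the example.

```python
def conform_changes(old_cut, new_cut):
    """Compare two simplified cuts (event name -> record-in frame).

    Returns (moved, removed, added), where moved maps each event that
    changed position to its offset in frames (negative = earlier).
    """
    moved = {}
    for event_id, old_in in old_cut.items():
        if event_id in new_cut and new_cut[event_id] != old_in:
            moved[event_id] = new_cut[event_id] - old_in
    removed = sorted(set(old_cut) - set(new_cut))
    added = sorted(set(new_cut) - set(old_cut))
    return moved, removed, added


# Hypothetical example: scene 2 is dropped after picture lock, so the
# later scenes slide 250 frames earlier, and a new scene 5 is appended.
old_cut = {"sc1": 0, "sc2": 250, "sc3": 600, "sc4": 900}
new_cut = {"sc1": 0, "sc3": 350, "sc4": 650, "sc5": 1000}

moved, removed, added = conform_changes(old_cut, new_cut)
print("moved:", moved)      # audio regions to slide, with their offsets
print("removed:", removed)  # audio regions to cut from the timeline
print("added:", added)      # new picture that still needs sound
```

The offsets tell the editor/mixer how far the audio regions for each surviving event need to slide to stay in sync with the revised picture.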

The last stage in program post production used to be the layback of audio onto the edited video. But now that so many programs have surround audio, it makes more sense for the client to see the final assembly in the audio suite, complete with full surround monitoring, rather than in the edit bay, which is often much smaller. Modern popular sound editing software now includes a video engine, so all the content files can be handled in the audio room.

Hybrid control surfaces

Audio used to have two distinct processes, editing and mixing, but those too are merging. The automated console has been around for decades, but now the digital audio workstation (DAW), the editor, can be linked intimately with a mixer control surface. Editing and mixing can happily coexist if the workflow demands it.

One popular configuration is the hybrid. Modern DAWs have merged with the mixer control surface so that editing and mixing can be seamlessly combined or left separate if desired. The control surface has its own DSP, so it can mix independently of the DAW. This allows, for example, several DAW sessions to be mixed (dialogue, effects, Foley, etc.) with live sources, which could be music, ADR (automated dialogue replacement) or narration. This removes many of the constraints of older workflows; little problems with the edit can be quickly fixed during the mix.

This hybrid operation is aided by industry-accepted standards such as the EUCON control protocol. The DAW can run on anything from a laptop upwards, using the host CPU with the option of DSP acceleration on external cards.

The DAW has also freed the mixer from bouncing down tracks, once an essential process when the mixer ran out of tape tracks on a multitrack recorder. Of course, it may well be convenient to create a submix or stem of some tracks. As an example, if the dialogue tracks are combined as a stem, then it can be easily removed for a music and effects (M&E) deliverable.
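The stem idea can be shown with a few lines of code. The following is a minimal sketch, assuming each stem is a short list of mono samples of equal length. The stem names and sample values are made up for illustration, and a real mix would also apply fader levels, panning and processing per stem.

```python
def mix_stems(stems, exclude=()):
    """Sum equal-length mono stems sample by sample.

    stems   -- dict mapping stem name to a list of float samples
    exclude -- stem names to leave out of the sum
    """
    chosen = [samples for name, samples in stems.items() if name not in exclude]
    if not chosen:
        return []
    length = len(chosen[0])
    if any(len(samples) != length for samples in chosen):
        raise ValueError("all stems must have the same length")
    return [sum(frame) for frame in zip(*chosen)]


# Hypothetical stems, four samples each.
stems = {
    "dialogue": [0.5, 0.25, 0.0, -0.25],
    "music":    [0.125, 0.125, 0.125, 0.125],
    "effects":  [0.0, 0.25, -0.25, 0.0],
}

full_mix = mix_stems(stems)                        # everything
m_and_e = mix_stems(stems, exclude=("dialogue",))  # music and effects only
```

Because the dialogue lives in its own stem, the M&E version is the same sum with one input left out, rather than a separate mix built from scratch.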


No audio mix down would be complete without the application of the sound designer’s favorite effects. Although software plug-ins have largely replaced the outboard effects racks of analog consoles, the concept of adding that special sauce beyond the EQ and dynamics offered by the console manufacturer remains very much part of sound design.

Audio plug-ins use a number of proprietary interfaces that have become de facto standards. These include Virtual Studio Technology (VST) from Steinberg and the new Avid Audio eXtension (AAX), which replaces the earlier Avid formats of Real-Time AudioSuite (RTAS), AudioSuite and TDM.

Apple may have turned away from it, but skeuomorphism is still very much alive, and even expected, for user interfaces in the world of audio plug-ins. The “tube” look seems to be favored for interface design, drawing heavily on hardware from the 1950s and earlier.


Multiplatform delivery has brought its own set of issues for the mixer. The contract may call for 7.1 or 5.1 delivery, but also require a stereo mix. Then there are the issues around metadata, dialnorm and dynamics. The sound mixer may be creating a surround mix deliverable, but it must also sound good for the stereo listener. The sound suite will be equipped with set-top box (STB) emulation so that the stereo fold-down can be monitored for compatibility issues. For example, dialogue on the center channel should not be lost in the stereo down mix.
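To make the fold-down concrete, here is a minimal sketch of a 5.1 to stereo down mix for a single sample frame. It assumes the commonly used form in which center and surrounds are each added to left and right at -3dB and the LFE channel is left out. In practice the actual levels come from the delivery metadata and the decoder, so the defaults here are assumptions, and the function names are invented for the example.

```python
def db_to_gain(db):
    """Convert a level in dB to a linear gain factor."""
    return 10.0 ** (db / 20.0)


def fold_down(frame, center_db=-3.0, surround_db=-3.0):
    """Fold one 5.1 sample frame down to stereo (Lo, Ro).

    frame -- dict with keys L, R, C, LFE, Ls, Rs (float samples).
    The LFE channel is ignored in this sketch.
    """
    c = db_to_gain(center_db)
    s = db_to_gain(surround_db)
    lo = frame["L"] + c * frame["C"] + s * frame["Ls"]
    ro = frame["R"] + c * frame["C"] + s * frame["Rs"]
    return lo, ro


# Dialogue on the center channel only: it should appear equally in
# both stereo outputs, about 3dB down, rather than disappearing.
dialogue_only = {"L": 0.0, "R": 0.0, "C": 0.5, "LFE": 0.0, "Ls": 0.0, "Rs": 0.0}
lo, ro = fold_down(dialogue_only)
print(lo, ro)
```

This is the kind of check the STB emulation performs by ear: center-channel dialogue ends up in both left and right of the stereo output, and the mixer listens for anything that is masked or unbalanced as a result.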

For delivery to the listener in the Dolby Digital format, there is a whole host of metadata parameters to be configured, summarized as down mix, dialnorm and dynamics. Program commissioners should specify a set of metadata values for the deliverable, including dialnorm, stereo down mix and center down mix.
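Dialnorm in particular lends itself to a small worked example. In Dolby Digital it is carried as a value from 1 to 31, indicating a dialogue level of -1 to -31dB relative to full scale, and the decoder attenuates the program by the difference from 31 so that dialogue is reproduced at a consistent level. The sketch below shows that arithmetic along with a simple check of delivered metadata against a commissioner's specification. The field names and the values in the specification are hypothetical, not taken from any real delivery document.

```python
def dialnorm_attenuation_db(dialnorm):
    """Attenuation in dB a decoder applies for a given dialnorm value."""
    if not 1 <= dialnorm <= 31:
        raise ValueError("dialnorm must be between 1 and 31")
    return 31 - dialnorm


def metadata_mismatches(spec, delivered):
    """Return {field: (required, actual)} for every field that differs."""
    return {
        field: (required, delivered.get(field))
        for field, required in spec.items()
        if delivered.get(field) != required
    }


# Hypothetical commissioner specification and a deliverable to check.
spec = {"dialnorm": 24, "center_downmix_db": -3.0, "surround_downmix_db": -3.0}
delivered = {"dialnorm": 27, "center_downmix_db": -3.0, "surround_downmix_db": -3.0}

print(dialnorm_attenuation_db(spec["dialnorm"]))  # dB of attenuation at the decoder
print(metadata_mismatches(spec, delivered))       # fields that need correcting
```

A program delivered with the wrong dialnorm value will play back louder or quieter than its neighbors, which is why commissioners specify the value rather than leave it to chance.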

In the past, each final mix (surround/stereo, full/M&E) was laid back in a real-time pass to an HDCAM SR deck. With file delivery in an MXF format, each mix or stem can be rendered in the background faster than real time. Even with tape, several versions can be prepared then laid back in a single pass along with the video, saving time and head hours.

Long live the fader

Current laptops and workstations, plus 64-bit operating systems, provide powerful platforms for the latest DAWs. Married to control surfaces, the software and hardware deliver the means to edit and mix audio in more flexible workflows, whether editing a simple documentary or a prime time drama. The control surface remains at the heart of the user interface, with little sign of mouse and touchscreen replacing faders and knobs.

Today’s audio production platforms provide post houses with the tools to support fast-turnaround file-based workflows in the face of the increasing demands of formats like surround sound, and the need to control costs.

David Austerberry is editor of Broadcast Engineering World edition. He would like to acknowledge John Johnson of Scrub, a division of HHB, for assistance in preparing this article.