Visual audio-signal monitoring

Today, fast perception and accurate assessment of audio, using visual representation methods, are indispensable in production, post production and broadcast. Suboptimal monitoring conditions, stress and aural fatigue are just a few of a large number of reasons why relying just on one's ears is not enough when it comes to quality control.

This is particularly true for complex 5.1 surround mixing. In the pro audio market, there is an extensive range of specialized tools for any type of audio-signal monitoring — from simple peak meters to highly sophisticated surround analyzers. Ideally, those units allow for quick and intuitive interpretation of what is being viewed and fast reaction as necessary; however, this requires an understanding of at least the most critical technical interrelations and the basics, as well as a reasonable configuration of the available instruments.

Level

In professional audio, the type of visual display required most often is the level meter used for examining signal levels; today, this component is found on every mixing console and on countless peripherals and recording devices. A level meter is needed, for example, for visualizing clipping on recorders or transmission lines, or in signal processing. At the same time, level meters reveal levels that are too low, which make it difficult to use the full dynamic range between the noise floor and the clipping threshold. The signal level must not only be adapted to the technical conditions of the transmission network. At the latest when programs are exchanged with other studios or broadcasters over links or recording media, adhering to agreed standard levels, and being aware that those standards differ internationally, becomes critical. Here, too, level meters are crucial.

In this context, it can be assumed that we mostly deal with digital audio today. The first stumbling block, in particular for a professional user who does not exclusively (or even mainly) handle audio, is the actual recording to tape, hard disk or solid-state medium. All audio devices used in practice have some kind of meter, be it a pointer instrument, a bar graph or a GUI element on a PC. Unfortunately, only a few of these meters meet professional demands and standards and are therefore capable of delivering comparable and reliable results. This is true even for digital audio, although it has a clearly and uniquely defined unit: dBFS (decibels relative to full scale). In fact, the task at hand is quite simple: A standardized digital peak program meter (PPM) scaled in dBFS is needed to meet professional requirements.
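To make the dBFS unit concrete, here is a minimal Python sketch of a sample-peak reading. It assumes floating-point samples normalized so that full scale is ±1.0; the function name and the -120dB floor for digital silence are illustrative choices, not part of any standard.

```python
import math

def sample_peak_dbfs(samples, floor_db=-120.0):
    """Return the highest absolute sample value in dBFS.

    Assumes samples are floats with full scale at +/-1.0.
    floor_db is an arbitrary value reported for digital silence.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak <= 0.0:
        return floor_db
    return max(20.0 * math.log10(peak), floor_db)

# A signal that peaks at half of full scale reads about 6 dB below 0 dBFS.
print(round(sample_peak_dbfs([0.0, 0.25, -0.5, 0.1]), 2))  # -6.02
```

A full-scale sample reads 0dBFS, and every halving of the amplitude lowers the reading by roughly 6dB.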

Now, it would be reasonable from a technical viewpoint to define the scaling of such a PPM instrument for digital audio in a way that the zero corresponds to the maximum level of 0dBFS. (For reasons of simplicity, we assume here that levels above 0dBFS do not exist.) On the other hand, there are a number of reasons for building headroom into the scale instead of using a fixed zero (i.e. 0dBFS = 0dB on the scale). In practice, a scale with visible headroom makes things simpler and safer.

For example, when the level of any commercially available pop-music CD is examined in the digital domain, it remains almost constantly near the digital full-scale level. Modern CDs are purposefully mastered in this way to achieve the highest possible loudness and thus the highest possible attention for the program in question. Unfortunately, many producers try to reach a similar level near the full-scale threshold as early as the recording phase. Let's consider a sample scenario: An audio engineer prepares for an interview. Right before the recording starts, he quickly checks the recording level and adjusts it with that goal in mind. It is obvious what will happen: The first loud clearing of the throat will result in considerable clipping. This is a serious problem. Unlike analog recording media, digital systems do not have a smooth transition to clipping. Even an upstream limiter will normally not prevent excessive levels, which usually result in extremely unpleasant distortion. Recorded digital clipping can be fixed afterward only with disproportionate post-processing effort, if at all. Therefore, appropriate headroom is a must, in particular with digital systems. Moreover, this approach has virtually no drawbacks because modern devices usually have an extensive dynamic range. Even recordings made at a level far below the allowable maximum are not at risk of getting too close to the noise floor. With modern digital systems, raising low recording levels at the post-processing stage is child's play.

Each production facility or standardization body independently defines a level range as headroom below full-scale level. For example, the EBU recommends 9dB of headroom; this means that when using a digital scale with 0dBFS at the top, the headroom range starts at -9dBFS. To visualize that, the maximum level of the signal should be set at or around -9dBFS, and it makes sense to specify a color or brightness change above that level.

Many devices implement an analog-style scale graduated in decibels rather than in dBFS. In our example, the 0dB position would be at -9dBFS, while the top of the scale would be at +9dB. Of course, this would still correspond to 0dBFS; after all, only the scaling has changed. This approach visualizes the desired maximum recording level even better. (See Figure 1.)
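The relationship between the two scales is plain arithmetic. The following Python sketch uses the 9dB headroom example from the text; the function names are hypothetical, and other facilities will use other headroom values.

```python
HEADROOM_DB = 9.0  # the 9 dB example used in the text; an assumption, not a universal value

def dbfs_to_scale_db(level_dbfs, headroom_db=HEADROOM_DB):
    """Convert a dBFS reading to a scale whose 0 dB mark sits headroom_db below full scale."""
    return level_dbfs + headroom_db

def scale_db_to_dbfs(level_db, headroom_db=HEADROOM_DB):
    """Convert a reading on the shifted scale back to dBFS."""
    return level_db - headroom_db

def in_headroom(level_dbfs, headroom_db=HEADROOM_DB):
    """True when the level lies inside the headroom range (candidate for a color change)."""
    return level_dbfs > -headroom_db

print(dbfs_to_scale_db(-9.0))  # 0.0, the desired maximum recording level
print(dbfs_to_scale_db(0.0))   # 9.0, the top of the scale, still digital full scale
```

Only the labels move; the signal and the clipping point stay exactly where they were.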

Many users in professional audio have become accustomed to the integration time of peak meters, which is typically 10ms. Therefore, sticking to that integration time when metering digital signals has become common practice. This ensures that the familiar viewing characteristics are retained, although different principles actually apply in the digital domain. To make sure, however, that no digital peaks are missed by the instrument, the display should also include a marker showing the level at sample precision without integration time.

Comparing the display values with and without integration time for different types of programs such as speech, music or test tones reveals differences that are considerable in some cases. For example, for speech recorded with proximity effect, the deviation can be more than 6dB. In practice, this means that peaks can actually reach levels at or above -3dBFS even if the recording level has been set with 9dB of headroom at an integration time of 10ms. Consequently, setting headroom on such a scale is realistic rather than exaggerated.
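The gap between an integrating meter and a sample-accurate marker can be illustrated with a rough Python sketch. It models the integration time as a simple one-pole attack time constant, which is a simplification and not the ballistics definition of any metering standard; the 1.5s release value and all names are assumptions made for the illustration.

```python
import math

def peak_readings(samples, sample_rate=48000, attack_ms=10.0, release_ms=1500.0):
    """Return (integrated_peak, sample_peak) as linear values, full scale = 1.0.

    The attack/release smoothing is a simplified stand-in for PPM ballistics.
    """
    attack = 1.0 - math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = 1.0 - math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    envelope = 0.0
    integrated_peak = 0.0
    sample_peak = 0.0
    for s in samples:
        rectified = abs(s)
        coeff = attack if rectified > envelope else release
        envelope += coeff * (rectified - envelope)   # smoothed (integrating) reading
        integrated_peak = max(integrated_peak, envelope)
        sample_peak = max(sample_peak, rectified)    # sample-accurate marker
    return integrated_peak, sample_peak

def to_db(value):
    return 20.0 * math.log10(value) if value > 0.0 else float("-inf")

# A 1 ms full-scale burst surrounded by silence (48 samples at 48 kHz).
burst = [0.0] * 480 + [1.0] * 48 + [0.0] * 480
integrated, true_peak = peak_readings(burst)
print(to_db(true_peak), round(to_db(integrated), 1))
```

In this simplified model, the sample-accurate marker reaches full scale while the integrated reading stays far below it, which is why a meter with integration time needs the additional peak marker.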

As already mentioned, while getting an appropriate signal with no clipping is the top priority at the recording stage, there is no risk of causing irreversible clipping when subsequently processing the material at the studio. It is relatively simple and straightforward to compress the program using an appropriate dynamics processor and then to raise the overall level statically or dynamically until the loudest passages are set to a level near 0dBFS. There are a huge number of appropriate hardware and software tools on the market, many of which include functions for loudness maximization.

When producing a live broadcast, post-processing is not an option; a “ready-to-use” signal must be delivered in real time. Engineers experienced in live recording know that setting up large headroom during the rehearsals is critical. When the band or orchestra starts its performance in front of the audience, levels are normally at least 3dB higher than during the sound check. Considering modern 24-bit digital recording and broadcast systems, there is definitely no longer any need to approach the digital full-scale level when doing live broadcasts or recordings. It is also worth pointing out that even today there are many A/D and D/A converters that produce artifacts even before reaching the full-scale level; at -3dBFS, this is no longer a problem.

Loudness

A reliable and standardized method of examining program loudness has become a critical element of modern radio and TV production and broadcast. Information obtained using that method allows for adapting program dynamics to different target groups and for effectively preventing abrupt loudness changes between different program formats. While the technically achievable audio quality in radio and TV broadcast is better than ever before, listeners rightly complain that they need to manually compensate annoying loudness changes using their remote controls more often than ever before — for example, when the program changes from a commercial break to a movie broadcast in 5.1 surround sound. In addition, accomplishing program dynamics that please both the motorist in his car and the demanding film enthusiast in his home-cinema environment seems hardly possible. Metadata and dynamic range control (DRC) may be helpful in dealing with this problem, but they are not extensively used at the moment. What is more, the necessary configuration steps on consumer-level surround receivers are often more complicated than setting up professional units.

Regardless of the preferred solution and the type of program, loudness measurement at various points of the production and distribution paths is clearly indispensable. The same applies to binding loudness-measurement standards. For some time, the ITU has been working on such standards (BS.1770/1771); however, for the time being, these are only recommendations. In addition, the parameters used in loudness metering are not yet defined as unambiguously as users and manufacturers would require. Thus, measurement results can be compared only if, in addition to the measured value itself, the measurement conditions are known, such as the reference level, time constants, thresholds and the applied weighting filter. Moreover, the fact that the terminology has been changed several times during the development of BS.1770/1771 leads to confusion. For example, the weighting filter, which was initially referred to as “RLB,” was renamed to “R2LB” when a pre-emphasis was added; later on, the original term “RLB” was restored although the pre-emphasis was retained. Meanwhile, the term “K” is used for that filter, which must not be confused with the K-metering designed by Bob Katz.

The unit of loudness measurement has changed, too. While loudness was initially scaled in dBLU (Loudness Units) from -21 to +9, the Advanced Television Systems Committee (ATSC) has now introduced “LKFS” (Loudness, K-weighted, relative to Full Scale), an alternative unit shown in Figure 2.
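For readers who want to see what a K-weighted measurement involves, the following Python sketch filters each channel with the two-stage K filter and sums the mean-square energies. The biquad coefficients are the 48kHz values published in ITU-R BS.1770 and are valid at that sample rate only. The sketch is ungated and measures the whole block at once, so its time behavior is an assumption of this example rather than a property of any particular meter; the function names are illustrative.

```python
import math

# Biquad coefficients (b0, b1, b2, a1, a2) as published in ITU-R BS.1770 for 48 kHz.
PRE_FILTER = (1.53512485958697, -2.69169618940638, 1.19839281085285,
              -1.69065929318241, 0.73248077421585)   # high-shelf "pre-emphasis"
RLB_FILTER = (1.0, -2.0, 1.0,
              -1.99004745483398, 0.99007225036621)   # RLB high-pass

def biquad(samples, coeffs):
    """Direct-form I second-order IIR filter."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def k_weighted_mean_square(samples):
    filtered = biquad(biquad(samples, PRE_FILTER), RLB_FILTER)
    return sum(v * v for v in filtered) / len(filtered)

def loudness_lkfs(channels, weights=None):
    """Ungated loudness of one block. channels: equal-length sample lists, LFE excluded.

    weights default to 1.0 per channel; BS.1770 assigns the surround channels
    a higher weight, which the caller can pass in.
    """
    if weights is None:
        weights = [1.0] * len(channels)
    total = sum(w * k_weighted_mean_square(ch) for ch, w in zip(channels, weights))
    return -0.691 + 10.0 * math.log10(total) if total > 0.0 else float("-inf")

# One second of a full-scale 997 Hz sine in a single channel at 48 kHz.
tone = [math.sin(2.0 * math.pi * 997.0 * n / 48000.0) for n in range(48000)]
reading = loudness_lkfs([tone])
print(round(reading, 2))
```

According to the recommendation's calibration example, a full-scale sine of about 1kHz in one front channel should read close to -3.01 LKFS, which makes a convenient sanity check for an implementation.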

As long as there is no common loudness-metering standard, the manufacturers of such tools keep the relevant parameters variable on their systems; as soon as a standard has been agreed upon, they will adjust those parameters accordingly by means of a firmware upgrade.

A loudness-measurement system covering all conceivable applications from recording and live production to post-processing to distribution and subsequent program analysis needs to include multiple different measuring tools. These tools must primarily differ in the time windows the measurements are based upon. A loudness meter for live broadcast needs to meet entirely different requirements than a solution for long-term loudness analysis of the different stations run by a broadcasting organization. With live productions, current loudness information is needed at any time, so measurements need to be made with a relatively short integration time. This results in a current “loudness image” over all channels of a mono, stereo or surround program that can be displayed, for example, as a bar graph.

Another useful piece of loudness information is delivered by a tendency indicator. By averaging over time, it allows the operator to identify the loudness tendency of a program during the last 20 or 30 seconds. This in turn makes it possible to visualize loudness uptrends or downtrends, which the engineer can then compensate for manually. For this purpose, a sliding (dynamic) time window is critical to ensure that averaging always occurs over an identical time range. In addition, it is desirable to exclude modulation pauses using an adjustable threshold so that the measurement is not distorted; otherwise, a very loud program followed by a pause would result in an entirely normal average.
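A sliding window with a pause threshold might be sketched in Python as follows. The sketch assumes one loudness reading per second and averages the dB values directly, which is a simplification (a real instrument would more likely average energies); the class name, window length and -50dB gate are hypothetical.

```python
from collections import deque

class TendencyIndicator:
    """Sliding-window loudness average that ignores readings below a gate."""

    def __init__(self, window_size=30, gate_db=-50.0):
        # The deque always covers the same time range: the last window_size readings.
        self.window = deque(maxlen=window_size)
        self.gate_db = gate_db

    def push(self, loudness_db):
        self.window.append(loudness_db)
        return self.average()

    def average(self):
        # Modulation pauses (readings below the gate) are excluded from the average.
        active = [v for v in self.window if v >= self.gate_db]
        return sum(active) / len(active) if active else None

# Ten seconds of loud program followed by ten seconds of near silence.
indicator = TendencyIndicator(window_size=20, gate_db=-50.0)
for value in [-20.0] * 10 + [-70.0] * 10:
    gated = indicator.push(value)
ungated = sum(indicator.window) / len(indicator.window)
print(gated, ungated)  # -20.0 -45.0
```

Without the gate, the pause pulls the average down to an apparently normal value; with the gate, the indicator still reports the loud passage.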

In addition, long-term measurements and recording of measured loudness values may be interesting for broadcast control, QC and subsequent program analysis. Such a tool may be used for examining the loudness history of the program over several hours or days and for either graphically representing the results over time or summarizing them as a numerical loudness average. It is questionable, however, whether measurements over a long time are reasonable at all.

Surround

When dealing with surround programs in production, post production, broadcast or mastering, intuitive audio visualization is critical. Due to the larger number of separate channels, the risk of errors would otherwise increase significantly, up to and including the complete failure of channels. In addition, the phase relationships of the channels, which are essential for a high-quality surround signal, need to be constantly monitored.

Several different surround-signal visualization methods are available on the market today. These include, for example, the Jelly-Fish and StarFish by DK-Technologies and the Pinguin Surround Meter. (See Figures 3 and 4.)

The RTW Surround Sound Analyzer with the typical “house display” developed by graduate engineer Thomas Lischker visualizes phase and loudness relations between channels at a glance. (See Figure 5.) The base is a calibrated vector representation of the sound-pressure levels (SPL) of the individual channels where the end points are interlinked by lines. The area enclosed by the lines becomes a measure of the overall volume, and the distribution of the area over the four quadrants defines the sonic-image balance.

When the house display shows a square, the four L, R, LS and RS channels share the same sound-pressure level. If the side faces are straight, without any bends, the individual channels do not correlate, as is the case, for example, with a recording of a cheering audience. Boundary lines bent outward indicate a positive correlation of the two channels in question, while lines bent inward show a negative correlation. (See Figure 6.) This way, an inverted phase on one channel can be clearly identified. One of the vectors being shorter than the others implies a missing channel or a level that is too low. The yellow vector of the center channel is linked to the L and R front channels by separate yellow lines. This makes it possible to quickly recognize the relation between the phantom source formed by L and R and the center channel. If the yellow triangle stands out considerably from the overall image in the upper part of the SSA view, the center channel is predominant compared with the others. Depending on the program material, such predominance might or might not be desirable. The positions of phantom sources existing between the other channel pairs are easily grasped, too. The dominance vector, which shows the aural focus of the overall sound source, is marked by a white X.

The weighting algorithms used for accurate surround representation are critical. On the one hand, the display must visualize the phase relations between the individual channels at appropriately high speed; on the other hand, integration times should be long enough to precisely display the SPL of each channel and its aural effect. For that purpose, the RTW unit employs a weighted RMS detector. Integration times that are too long would make the display sluggish. When the monitoring and display systems are calibrated to a specific SPL value (for example, 78dBA SPL), red markers on the SSA screen indicate for each channel when that predefined SPL is reached.

AES3 status data and interface parameters

Status data within the AES3 signal includes information on the sample rate used, the signal status (professional or consumer) and various user data. A critical aspect of status data is that the nominal data does not necessarily match the actual physical conditions. Typically, the data is generated by the sending device, either automatically or from relevant user settings; that is, it is not the result, for example, of an actual sample-rate measurement performed by the analyzer. Therefore, discrepancies between the actual sample rate of a digital signal and the information provided by the status data are among the most frequent error sources in the digital domain and might even prevent successful transmission. For example, according to the supplied status data, a sending device outputs a 48kHz signal; however, the received signal has a sample rate of 44.1kHz. This inconsistency may result in the receiver not processing the incoming signal. Therefore, convenient error-checking capabilities are critical. In addition to reading the AES3 status data, an analyzer must be able to identify the actual physical characteristics of the signal.
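The comparison between the declared and the measured sample rate can be sketched in a few lines of Python. How the declared rate is read from the status data and how frames are counted against a reference clock is outside the scope of this sketch; the function names and the 1000ppm tolerance are assumptions made for illustration.

```python
def measure_sample_rate(frame_count, elapsed_seconds):
    """Estimate the physical sample rate from frames counted over a known interval."""
    return frame_count / elapsed_seconds

def check_sample_rate(declared_hz, measured_hz, tolerance_ppm=1000.0):
    """Return (ok, deviation_ppm). tolerance_ppm is an arbitrary illustrative limit."""
    deviation_ppm = abs(measured_hz - declared_hz) / declared_hz * 1e6
    return deviation_ppm <= tolerance_ppm, deviation_ppm

# The example from the text: status data claims 48 kHz, the wire carries 44.1 kHz.
measured = measure_sample_rate(frame_count=441000, elapsed_seconds=10.0)
ok, deviation = check_sample_rate(48000.0, measured)
print(ok, round(deviation))  # False 81250
```

A deviation of this size is a configuration error rather than clock drift, and flagging it early explains why a receiver refuses to lock.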

Those characteristics include not only the sample rate and the carrier voltage but also the clock synchronicity of multiple data streams. In professional audio, it is taken for granted that all available digital signals are in sync regarding their phases and clocks; however, often enough this is not true, for example when the digital signal of a free-running, unsynchronized player (DVD player, satellite receiver) is used. This results in sporadic clicks. Latencies are another issue. For instance, when transmitting a surround as well as a stereo signal from sporting events to the broadcasting center, the two signals are often separately encoded and transferred over different links. A professional analyzer that also monitors the incoming digital signal makes it easy to identify latencies or asynchronicity. When troubleshooting an audio setup with numerous external sources, checking these parameters should be at the top of the list.

The status data also includes information about whether the two received channels carry a single stereo signal or two separate mono signals and whether linear PCM audio or other data types (e.g. encoded surround signals in Dolby AC-3 or Dolby E formats) are transmitted. Data of those types must be run through an appropriate decoder before further processing and/or D/A conversion. Already today, several routers are capable of transparently forwarding not only linear PCM audio but also encoded streams of those types over AES3 interfaces. Routing such a data stream directly to a D/A converter without passing it through an upstream decoder would result in high-level noise, which is extremely unpleasant to the human ear. This is effectively prevented by evaluating the information about the type and contents of the transmitted information included in the status data. In this case, a properly configured D/A converter will just mute the channel. Specific surround-enabled audio-analyzer systems could be enhanced with an integrated Dolby decoder allowing for signal analysis and post-processing of the individual channels without requiring an external decoder.

ID signals

When dealing with surround signals, experience shows that channels are often swapped by mistake. This is particularly true when a signal uses multiple separate transmission channels. A number of broadcasters and organizations have developed methods for uniquely identifying channels. Black and Lane's Ident Tones for Surround is probably the best known; other variants include EBU 3304 for surround, as well as GLITS and EBU 3304 for stereo signals. In addition to the actual identification, detection of level and latency mismatch between channels by the receiver is critical for troubleshooting. Latencies can occur, in particular, with surround-signal transmission.
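One common way to detect a latency mismatch between two channels that carry related material is cross-correlation. The following brute-force Python sketch is a generic illustration, not the method used by any of the ident-tone schemes named above; it only searches for non-negative lags, and the function name is hypothetical.

```python
import random

def estimate_delay(reference, delayed, max_lag):
    """Return the lag (in samples, 0..max_lag) at which `delayed` best matches `reference`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(reference), len(delayed) - lag)
        # Cross-correlation at this lag: large when the two signals line up.
        score = sum(reference[i] * delayed[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Test material: noise on one channel, the same noise 37 samples late on the other.
rng = random.Random(1)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
delayed = [0.0] * 37 + reference
lag = estimate_delay(reference, delayed, max_lag=100)
print(lag, "samples =", round(lag / 48000.0 * 1000.0, 3), "ms at 48 kHz")
```

Broadband material gives a sharp correlation peak; a pure sine tone would be ambiguous because it matches itself at every full period.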

Correlation meter

Phase relations between the two channels of a stereo signal, and thus its mono compatibility, are still key parameters for assessing audio signals: The kitchen radio plays mono, and the same is true for a number of TV programs. Correlation meters are often used for quick and continuous monitoring of phase shifts between channels. Those units are useful for discovering reversed polarity or time-delay errors, as well as for optimizing microphone placement.

High-quality correlation meters operate over a wide range independently of the level, i.e. the level does not affect the display values. A commonly accepted design of correlation meters is a horizontal or vertical bar graph where the positive range is often green and the negative range red.

The correlation scale ranges from -1 to +1; the zero point is at the center of the scale. “Correlation” refers to the degree of correspondence of two audio signals. Entirely identical signals (for example, a mono signal on both stereo channels) have a correlation of +1; completely unrelated signals have a correlation of 0. The same value is displayed when a channel fails. The correlation meter also allows conclusions to be drawn about the “width” of a stereo signal. A displayed value of +1 indicates a mono signal located in the stereo center; a value of 0, on the other hand, indicates a signal reproduced at the sides only, with no sound heard from the stereo center. Stereo mixes normally have a correlation between 0.3 and 0.7.

A stereo signal with a negative correlation is normally considered technically defective. When the two channels of the stereo signal are identical but their phases are reversed by 180 degrees due to polarity reversal, the correlation meter shows a value of -1. A value between 0 and -1 results from a stereo mix that contains phase-modulated components as generated by effect units, delay units and electronic sound generators. Downmixing such signals to mono causes drastic sonic changes due to phase cancellation.
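The values described above follow directly from the normalized correlation of the two channels. This Python sketch computes it over one block of samples; the block-based approach and the function name are assumptions of the example, since a hardware meter would use continuous time-weighted averaging.

```python
import math

def correlation(left, right):
    """Normalized correlation of two equal-length sample blocks, in the range -1..+1.

    Dividing by the channel energies makes the result independent of level.
    A silent (failed) channel yields 0, as described in the text.
    """
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den > 0.0 else 0.0

tone = [math.sin(2.0 * math.pi * n / 48.0) for n in range(480)]
quadrature = [math.cos(2.0 * math.pi * n / 48.0) for n in range(480)]
inverted = [-s for s in tone]
quiet = [0.1 * s for s in tone]
silent = [0.0] * 480

print(round(correlation(tone, tone), 3))        # 1.0, mono on both channels
print(round(correlation(tone, inverted), 3))    # -1.0, polarity reversed
print(round(correlation(tone, quiet), 3))       # 1.0, level does not matter
print(correlation(tone, silent))                # 0.0, one channel has failed
```

The pair `tone` and `quadrature` (the same frequency shifted by 90 degrees) comes out at approximately 0, the reading for unrelated channels.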

Stereo-image display

Stereo-image displays (which are also referred to as goniometers or audio vectorscopes) provide comprehensive information about phase relations, intensity, stereo width and directions; however, the user requires specific basic knowledge for correctly interpreting the screen display. This is because these devices are not as intuitive and easy to understand as the simple ±1 bar graph of a correlation meter.

Stereo-image displays originally were modified oscillographs featuring a monochrome display tube. Today, these have been replaced by modern high-quality multicolor flat screens — for example, TFT displays. While those devices cannot replace acoustic checks of a production, they are still very useful for supporting the user in assessing the balance of a stereo mix. Stereo-image displays show the phase relations of signals contained in the mix in real time and allow for discovering errors caused by polarity reversal or clipping. Current devices are even capable of displaying the phase relations and levels of the input signals at the same time, which is quite convenient.

Since the usable display range of those units is relatively small, a vectorscope must include an automatic gain control (AGC) circuit to keep the signal level within an appropriate range regardless of the actual input level. On the other hand, this means that the instrument constantly readjusts the processed level. Therefore, it is not suitable for assessing the absolute level or even the loudness of a signal; it deals only with level and phase relations between the left and right channels.

Experienced users immediately notice whether a stereo signal has an appropriate stereo width, is shifted from the center or contains out-of-phase components. In general, a wide presentation hints at many out-of-phase portions, while circular images suggest a large stereo width and an appropriate phase. The position of that “ball” quickly shows tendencies toward a side. A mono signal results in a line; its direction on the display indicates the signal position within the stereo panorama.
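The shapes described above result from plotting each left/right sample pair on axes rotated by 45 degrees. The following Python sketch uses one common convention (vertical axis for the in-phase sum, horizontal axis for the difference, left-heavy signals leaning left); actual instruments may orient the axes differently. The block-wise normalization is a crude stand-in for the AGC mentioned earlier, and all names are illustrative.

```python
import math

def goniometer_point(left, right):
    """Map one L/R sample pair to display coordinates (x, y)."""
    x = (right - left) / math.sqrt(2.0)   # difference (side) component
    y = (left + right) / math.sqrt(2.0)   # sum (mid) component
    return x, y

def normalise(points):
    """Scale a block of points so the largest one just reaches radius 1."""
    largest = max((math.hypot(x, y) for x, y in points), default=0.0)
    if largest == 0.0:
        return list(points)
    return [(x / largest, y / largest) for x, y in points]

print(goniometer_point(0.5, 0.5))    # mono: x is 0, the trace is a vertical line
print(goniometer_point(0.5, -0.5))   # polarity reversed: y is 0, a horizontal line
print(goniometer_point(0.5, 0.0))    # left only: the line leans to the upper left
```

Because the points are rescaled to fit the screen, the picture conveys relations between the channels but no absolute level.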

Real-time analyzer (RTA)

Another important tool for visual audio QC is real-time spectral analysis. Typical applications of RTAs are found in the area of sound reinforcement; these include not only examinations of room and speaker-system characteristics but also the quick localization of sudden feedback using a peak-hold function on the analyzer display. An RTA can also serve users in production and mastering of music programs and continuity by allowing them to assess the spectral balance of the program and to adjust it as necessary using an EQ. In addition, an RTA helps locate interfering resonant frequencies that occur, for example, when recording sources in small speaker booths. Experience has shown that a real-time analysis based on individual third-octave band filters, as used in acoustic measurement, matches the characteristics of the human ear particularly well and is therefore capable of providing a meaningful representation of the spectrum. Of course, users working with a sample rate of 96kHz in production are particularly interested in the effective bandwidth, which is extended to 48kHz. None of us will aurally perceive 36kHz noise produced by the defective fan of an air-conditioning system; however, such spectral components (and their interference) may subsequently cause undesired artifacts in the mix. Therefore, in addition to the spectral representation of the audio range, a summing display of spectral components above the audible range up to half the sample rate would be desirable.
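The band layout of a third-octave display, together with the summed out-of-band reading suggested above, can be sketched in Python as follows. The sketch uses exact base-2 third-octave bands centered on 1kHz (nominal 20Hz to 20kHz), which is one of several possible definitions, and it expects a precomputed spectrum as (frequency, power) pairs; how that spectrum is obtained, and all function names, are assumptions of the example.

```python
def third_octave_bands(first_index=-17, last_index=13, reference_hz=1000.0):
    """Return (lower, centre, upper) for base-2 third-octave bands.

    Index 0 is the 1 kHz band; -17..13 covers nominal 20 Hz to 20 kHz (31 bands).
    """
    bands = []
    for n in range(first_index, last_index + 1):
        centre = reference_hz * 2.0 ** (n / 3.0)
        bands.append((centre * 2.0 ** (-1.0 / 6.0), centre, centre * 2.0 ** (1.0 / 6.0)))
    return bands

def rta_levels(spectrum, sample_rate=96000.0):
    """Sum spectral power into third-octave bands plus one ultrasonic total.

    spectrum: iterable of (frequency_hz, power) pairs.
    Returns (band_power_list, ultrasonic_power).
    """
    bands = third_octave_bands()
    band_power = [0.0] * len(bands)
    ultrasonic_power = 0.0
    top_edge = bands[-1][2]
    for freq, power in spectrum:
        if freq >= top_edge:
            if freq <= sample_rate / 2.0:
                ultrasonic_power += power   # everything above the audio bands, up to Nyquist
            continue
        for i, (lower, _, upper) in enumerate(bands):
            if lower <= freq < upper:
                band_power[i] += power
                break
    return band_power, ultrasonic_power

# A 1 kHz program component plus the 36 kHz fan noise from the text.
levels, ultrasonic = rta_levels([(1000.0, 1.0), (36000.0, 0.5)])
print(levels[17], ultrasonic)  # 1.0 0.5
```

The 36kHz component never appears in any audio band, so only the separate ultrasonic total reveals it.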

Conclusion

A main objective of any measurement is the comparability of the results. This is achieved through standardized measurement units and methods. Unfortunately, many established measurement standards are ambiguous in practical use; this is true for audio technology, too. The scope of a standard often remains subject to interpretation, which makes true comparability rather difficult. In this context, the critical point is that the same measurement standards must be applied everywhere within the immediate working area. All instruments must be calibrated accordingly. Never lose sight of the fact that setting up a meter or another analyzer tool will never change the audio signal being examined, only the personal “view” of it.

Michael Kahsnitz is head of engineering at RTW.