Monitoring with media fingerprinting

Lowering the operating costs of television playout has become something of a mantra across the television industry, and it has been especially evident in playout monitoring. Because monitoring has traditionally been labor-intensive and also prone to human error, it has been a natural focus for many streamlining and automation projects.

The widespread adoption of multiviewers and IP-based facility monitoring with signal probing has opened up new, more efficient workflows. A single operator can now manage and monitor complex systems, with a consequent large improvement in productivity.

However, cost-efficiency targets are getting tougher, and new approaches are being sought to take monitoring to the next level. Looking a few years ahead, a typical target cost per channel appears to be around $60,000 for channel ingest, preparation and monitoring. If current trends continue, meeting that target will require channels with almost fully automated monitoring. This move away from human eyes and ears for monitoring the key points of a playout channel is a big step for television, but there are pioneering technologies that offer real potential.

A key technology is media fingerprinting, which can be used with advanced electronic facility monitoring to detect and report playout problems that have been difficult to monitor effectively, such as lip-sync errors and content mismatching in multichannel environments. Before looking at how this new technology can be deployed in facilities, it's worth clarifying what is meant by media fingerprinting, as it is sometimes confused with technologies such as watermarking.

Media fingerprinting is a way of digitally identifying video and audio content so that it can be recognized, along with any playout defects, later in time or downstream in the facility. The difference between a fingerprint and a watermark is that a fingerprint is built from properties extracted from the content and is stored or transported separately. A fingerprint does not affect the original media content or its metadata. It's like a traditional fingerprint, which leaves no mark on the finger once the ink is wiped off. In contrast, a watermark must alter the source content permanently, and in a way that devices along the signal food chain will not change, because any such change would render the watermark useless. Hence, a watermark is more like a tattoo (though one you cannot really see or hear) than a fingerprint.

Since a fingerprint is an additional piece of data that must be stored or transported, it must be small. Other key media fingerprint attributes include resilience to typical playout processes, such as downconversion to SD or proc amp (processing amplifier) adjustments to video and audio. Furthermore, fingerprint processing must impose a very light load, both on purpose-built video devices that generate fingerprints and on PCs that search and analyze databases containing thousands of fingerprints.

Fingerprint generation

Naturally, the value and strength of a media fingerprint system are dependent on the underlying creation algorithm. Currently, there are multiple proprietary technologies used for the generation of fingerprints, and these are usually level- and/or motion-based. Some use luminance characteristics, while others use transitions, edges, peaks, frequency and color characteristics.
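To make the idea of a level- and motion-based fingerprint concrete, here is a minimal Python sketch. It is a hypothetical toy, not any vendor's proprietary algorithm: each frame is reduced to the mean luminance of a few blocks, and the fingerprint records only whether each block got brighter since the previous frame. The function names, the flat-list frame format and the block count are illustrative assumptions.

```python
# Hypothetical toy fingerprint: NOT a real vendor algorithm.
# A frame is a flat list of luma samples; it is split into equal blocks.

def block_means(luma, blocks):
    """Mean luminance of each of `blocks` equal slices (any remainder is ignored)."""
    size = len(luma) // blocks
    return [sum(luma[b * size:(b + 1) * size]) / size for b in range(blocks)]


def frame_symbol(prev_luma, curr_luma, blocks=4):
    """Encode one frame transition as a small integer: bit b is set if block b got brighter."""
    prev_means = block_means(prev_luma, blocks)
    curr_means = block_means(curr_luma, blocks)
    symbol = 0
    for b, (before, after) in enumerate(zip(prev_means, curr_means)):
        if after > before:
            symbol |= 1 << b
    return symbol


def video_fingerprint(frames, blocks=4):
    """One symbol per frame transition, so N frames give N - 1 symbols."""
    return [frame_symbol(a, b, blocks) for a, b in zip(frames, frames[1:])]


frames = [
    [10, 20, 30, 40, 50, 60, 70, 80],
    [12, 22, 28, 38, 50, 60, 75, 85],
    [11, 21, 35, 45, 40, 50, 75, 85],
]
# A proc amp-style adjustment: positive gain plus offset on every sample.
adjusted = [[0.9 * x + 16 for x in frame] for frame in frames]

print(video_fingerprint(frames))    # [9, 2]
print(video_fingerprint(adjusted))  # [9, 2]
```

Because only the direction of change is kept, a positive gain and offset applied to every sample leaves this toy fingerprint unchanged, which hints at why level-based fingerprints can tolerate routine video level adjustments. A real fingerprint carries far more information per frame than four bits.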

A typical media fingerprint generator creates separate fingerprint data for the video and for each individual audio channel to allow effective multichannel monitoring. (See Figure 1.) This data is multiplexed and can be streamed for storage or live comparison. Fingerprint generation requires minimal hardware and can be implemented on a simple monitoring-grade DA module or on more complex interface cards. The fingerprint data stream is very small, representing just 0.0004 percent of a 1080i60 HD signal, so the fingerprint data from many HD channels can easily be transferred over standard IP networks.
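As a rough check on what 0.0004 percent means in practice, the arithmetic below assumes the figure is relative to the nominal 1.485 Gbit/s serial rate of an HD-SDI (SMPTE 292) link. The article does not say which reference rate it used, and the 100-channel, 100 Mbit/s example is purely illustrative.

```python
# Assumption: "a 1080i60 HD signal" is taken to mean a nominal 1.485 Gbit/s
# HD-SDI link. (The 1/1.001 "59.94" variant differs by only 0.1 percent.)
HD_SDI_BITS_PER_SECOND = 1.485e9
FINGERPRINT_FRACTION = 0.0004 / 100  # 0.0004 percent, as quoted in the article


def fingerprint_bitrate(channels):
    """Aggregate fingerprint data rate in bit/s for `channels` HD channels."""
    return channels * HD_SDI_BITS_PER_SECOND * FINGERPRINT_FRACTION


def link_utilization(channels, link_bits_per_second):
    """Fraction of a network link consumed by the fingerprint streams."""
    return fingerprint_bitrate(channels) / link_bits_per_second


per_channel = fingerprint_bitrate(1)
print(f"per channel: {per_channel / 1e3:.2f} kbit/s")  # per channel: 5.94 kbit/s
print(f"100 channels on 100 Mbit/s: {link_utilization(100, 100e6):.2%}")  # 0.59%
```

Under these assumptions, each channel needs only about 6 kbit/s of fingerprint data, so even a large facility's fingerprint traffic is negligible next to ordinary IP network capacity.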

This type of media fingerprint is highly resilient to normal television playout processes: it is insensitive to typical video level adjustments, up/downconversion and video compression. Similarly, it is insensitive to audio bit-rate reduction (compression), to audio loudness control performed by gain or dynamic range adjustment, and to sample rate conversion.
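A toy audio fingerprint shows why a fixed gain change need not disturb matching. The sketch below is hypothetical, not a production audio fingerprint: it splits the samples into fixed-size blocks, computes each block's energy, and keeps one bit per block boundary recording whether the energy rose. A constant gain scales every block's energy by the same positive factor, so the bits do not change. Real loudness processing with time-varying dynamic range control is harder, and this toy makes no claim about it.

```python
# Hypothetical toy audio fingerprint: NOT a real vendor algorithm.

def block_energies(samples, block=4):
    """Sum of squared samples for each complete block of `block` samples."""
    return [
        sum(s * s for s in samples[start:start + block])
        for start in range(0, len(samples) - block + 1, block)
    ]


def audio_fingerprint(samples, block=4):
    """One bit per block boundary: 1 if energy rose relative to the previous block."""
    energies = block_energies(samples, block)
    return [1 if after > before else 0 for before, after in zip(energies, energies[1:])]


samples = [0.1, 0.2, 0.1, 0.0,
           0.5, 0.4, 0.3, 0.2,
           0.05, 0.05, 0.0, 0.0,
           0.6, 0.6, 0.6, 0.6]
quieter = [0.5 * s for s in samples]  # about 6 dB of gain reduction
louder = [2.0 * s for s in samples]   # about 6 dB of gain increase

print(audio_fingerprint(samples))  # [1, 0, 1]
print(audio_fingerprint(quieter))  # [1, 0, 1]
print(audio_fingerprint(louder))   # [1, 0, 1]
```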

Media fingerprints are weakened, but not disabled, by aspect ratio changes and by the insertion of small graphics. The video processes that can cause problems are standards conversion, such as going from 50Hz to 60Hz, and prolonged periods of frozen video. On the audio side, an obvious problem is when new audio content is mixed into the original content, such as voice-overs and stings.
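Prolonged frozen video is a problem because a fingerprint built from changes in the picture has nothing to record while the picture is static. In the hypothetical sketch below, a one-bit-per-frame "did it get brighter" fingerprint becomes a flat run of zeros on frozen video, and a simple guard labels such windows as indeterminate rather than letting a matcher report a spurious result. The guard and its transition-count threshold are illustrative assumptions.

```python
# Hypothetical sketch: NOT a real vendor algorithm.

def brightness_bits(frame_means):
    """1 if a frame's mean luminance rose versus the previous frame, else 0."""
    return [1 if after > before else 0
            for before, after in zip(frame_means, frame_means[1:])]


def classify_window(bits, min_transitions=2):
    """A window with too few bit changes carries little timing information."""
    transitions = sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    return "usable" if transitions >= min_transitions else "indeterminate"


moving = [50, 52, 49, 55, 53, 60]  # normal picture activity
frozen = [50, 50, 50, 50, 50, 50]  # frozen frame: mean luminance never changes

print(brightness_bits(moving), classify_window(brightness_bits(moving)))
# [1, 0, 1, 0, 1] usable
print(brightness_bits(frozen), classify_window(brightness_bits(frozen)))
# [0, 0, 0, 0, 0] indeterminate
```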

During fingerprint monitoring, a simple convolution engine looks for matching patterns between two media fingerprint streams taken at different points in the playout process. (See Figure 2.) At the simplest level, this comparison can be performed by a single interfacing module. However, the process is entirely scalable: multiple streams of fingerprint data can be correlated and analyzed on a standard PC-based platform to allow end-to-end deployment in a large facility. (See Figure 3.) Because the fingerprint data is so small, it can even be transferred over a WAN, allowing monitoring across multiple remote facilities.
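The sliding comparison at the heart of fingerprint monitoring can be sketched in a few lines. This hypothetical brute-force version is not the engine described in the article: it slides one fingerprint stream against the other and, for each trial lag, scores the fraction of overlapping symbols that agree (a simple form of cross-correlation). The best-scoring lag is taken as the delay between the two monitoring points, in fingerprint units. The `max_lag` and `min_overlap` parameters are illustrative assumptions.

```python
# Hypothetical sliding-match sketch: NOT the actual engine described in the article.

def best_offset(reference, downstream, max_lag, min_overlap=8):
    """Find the lag at which `downstream` best matches `reference`.

    A positive lag means downstream[i + lag] lines up with reference[i], i.e. the
    downstream point is delayed by `lag` fingerprint units. Returns (lag, score),
    where score is the fraction of overlapping symbols that agree, or (None, -1.0)
    if no lag has at least `min_overlap` overlapping symbols.
    """
    best_lag, best_score = None, -1.0
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(reference[i], downstream[i + lag])
                 for i in range(len(reference))
                 if 0 <= i + lag < len(downstream)]
        if len(pairs) < min_overlap:
            continue  # too little overlap to trust the score
        score = sum(1 for a, b in pairs if a == b) / len(pairs)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


# Example: a non-repeating symbol stream, observed three units later downstream.
reference = [(i * i) % 37 for i in range(30)]
downstream = [0, 0, 0] + reference  # three units of delay

print(best_offset(reference, downstream, max_lag=5))  # (3, 1.0)
```

A brute-force search like this costs one pass over the overlap per trial lag, which is cheap for streams as small as fingerprints; a production matcher would also need to handle partial matches and streams that arrive continuously.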

Once a match is found between the fingerprint streams, two timing measurements can be made. The first is the program-to-program delay, which is the normal delay introduced by the signal distribution path. The second is the difference between the video delay and the audio delay. If this difference is zero, there is no problem. If it is not zero, the audio has drifted relative to the video; this drift is what is known as a lip-sync error. This comparison process allows any video or audio differences between two or more signals to be identified quickly. The delay between two streams can be measured with a resolution of just ±1 ms.
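Once the video and each audio channel have been matched separately, the two measurements above reduce to simple arithmetic. In this hypothetical sketch, the inputs are delays already measured in milliseconds; the sign convention (positive means the audio arrives later than the video) and the choice to treat differences within the quoted ±1 ms resolution as zero are assumptions made for illustration.

```python
# Hypothetical sketch of the two timing measurements.
MEASUREMENT_RESOLUTION_MS = 1.0  # the article quotes a resolution of +/-1 ms


def lip_sync_errors(video_delay_ms, audio_delays_ms):
    """Return the audio-minus-video delay for each audio channel.

    video_delay_ms: program-to-program video delay between two monitoring points.
    audio_delays_ms: mapping of audio channel name -> delay between the same points.
    Positive results mean the audio lags the video; negative means it leads.
    Differences within the measurement resolution are reported as 0.0.
    """
    errors = {}
    for channel, audio_delay_ms in audio_delays_ms.items():
        error = audio_delay_ms - video_delay_ms
        errors[channel] = 0.0 if abs(error) <= MEASUREMENT_RESOLUTION_MS else error
    return errors


# Example: video is delayed 120 ms along the path; the audio channels vary.
print(lip_sync_errors(120.0, {"ch1": 120.5, "ch2": 160.0, "ch3": 95.0}))
# {'ch1': 0.0, 'ch2': 40.0, 'ch3': -25.0}
```

Here channel 1 is in sync, channel 2 lags the picture by 40 ms and channel 3 leads it by 25 ms. The 120 ms program-to-program delay itself is normal path delay, not an error.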

Fingerprint data streams and alarms can be analyzed by the latest generation of facility monitoring systems, and operators can be immediately alerted whenever problems emerge to promote rapid fault resolution.

Content verification and lip-sync detection

One of the key applications of fingerprinting for television playout is automated content verification while a channel is played out across its distribution path, from the server to the uplink, and back via the return feeds.

Fingerprint technology is well suited to addressing many typical playout issues, such as a backup channel not carrying exactly the same content as the primary channel. This kind of problem can be caught quickly and accurately by capturing and monitoring fingerprints along the primary and backup playout paths. For instance, with fingerprint detection at the server and at the backup changeover, the facility monitoring system can quickly identify any content errors. Similarly, fingerprint detection at the distribution encoder and the cable return can spot problems introduced at a cable operator. Even subtle differences in content can be identified, such as branding graphics that are missing on the main playout chain but present on the backup channel.
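A content check between the primary and backup paths can be sketched as a window-by-window comparison of fingerprints that have already been time-aligned (for example, by compensating for the measured path delay). This hypothetical Python version flags windows where too few symbols agree. The window length and the 80 percent agreement threshold are illustrative assumptions, chosen so that an isolated disagreement is tolerated while a sustained content difference is reported.

```python
# Hypothetical content-verification sketch: inputs are time-aligned fingerprints.

def mismatched_windows(main_fp, backup_fp, window=5, min_agreement=0.8):
    """Return the start index of each window where main and backup disagree too much."""
    length = min(len(main_fp), len(backup_fp))
    flagged = []
    for start in range(0, length - window + 1, window):
        main_win = main_fp[start:start + window]
        backup_win = backup_fp[start:start + window]
        agreement = sum(1 for a, b in zip(main_win, backup_win) if a == b) / window
        if agreement < min_agreement:
            flagged.append(start)
    return flagged


main_fp = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
backup_fp = [1, 2, 3, 4, 5, 6, 99, 8, 9, 10, 0, 0, 0, 14, 15]
# Window 5..9 has one stray symbol (80 percent agreement) and passes;
# window 10..14 differs for most of its length and is flagged.
print(mismatched_windows(main_fp, backup_fp))  # [10]
```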


Another key monitoring problem that fingerprinting can detect is lip-sync error. This has long been one of the tougher monitoring challenges in a facility, because it is not immediately evident on a multiviewer. Naturally, the challenge has grown as channel counts have increased. Operators of a multichannel playout facility cannot listen to the audio of every channel; it is just too confusing. Today, many operators rely on level meters to confirm that audio is present. Unfortunately, this approach cannot detect a lip-sync problem.

In contrast, fingerprinting can quickly identify lip-sync errors anywhere along the playout chain, on any individual audio channel. These errors can trigger alarms to instantly direct the attention of the operator to the source of the problem so that the necessary corrective measures can be taken. (See Figure 4.)
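The alarm step can be sketched as a threshold check on each measured lip-sync error. In this hypothetical Python example, each measurement names the monitoring point (probe) and audio channel it came from, so an alarm points the operator straight at the affected channel and location. The limits are placeholder values, not recommendations; a real facility would set them from its own tolerance policy.

```python
from dataclasses import dataclass


@dataclass
class LipSyncAlarm:
    probe: str       # monitoring point where the error was measured
    channel: str     # audio channel affected
    error_ms: float  # audio delay minus video delay (positive = audio lags)


def check_lip_sync(measurements, lead_limit_ms=20.0, lag_limit_ms=40.0):
    """Return an alarm for every measurement outside the (placeholder) limits.

    measurements: iterable of (probe, channel, error_ms) tuples.
    """
    alarms = []
    for probe, channel, error_ms in measurements:
        if error_ms > lag_limit_ms or error_ms < -lead_limit_ms:
            alarms.append(LipSyncAlarm(probe, channel, error_ms))
    return alarms


measurements = [
    ("uplink", "ch1", 5.0),
    ("uplink", "ch2", 55.0),
    ("cable return", "ch1", -30.0),
]
for alarm in check_lip_sync(measurements):
    print(f"LIP-SYNC ALARM at {alarm.probe}, {alarm.channel}: {alarm.error_ms:+.0f} ms")
# LIP-SYNC ALARM at uplink, ch2: +55 ms
# LIP-SYNC ALARM at cable return, ch1: -30 ms
```

Separate lead and lag limits are used because a facility may want different tolerances for audio that arrives early and audio that arrives late.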

Looking ahead

Media fingerprint-based error detection is already a practical proposition for more streamlined television monitoring when used alongside the latest generation of facility monitoring systems. However, the technology would really benefit from standardization and full interoperability in the way that SNMP monitoring has become a benchmark for reporting across all types of television and telco equipment. This would remove the need for additional equipment to capture fingerprints across a facility, as the fingerprints could be generated via the equipment that is already part of the standard infrastructure.

Much wider deployment of media fingerprinting across many vendors is still some way off, but there are encouraging signs toward standardization of video and audio fingerprint algorithms. The 22TV Lip Sync Ad-Hoc Group recently met with the SMPTE Technology Committees in Montreal, and the media fingerprint technology from multiple vendors was reviewed as part of an ongoing process of developing a new standard. It is anticipated that this process will ultimately yield new SMPTE standards that will take this formative technology to the next level of adoption.

Marco Lopez is senior vice president of infrastructure and monitoring at Miranda Technologies.
