Taking the cloud approach

SCTE 35 has been relied on for two decades by broadcasters and cable TV networks (“content providers”) to signal local avail breaks: the slots that content providers grant to cable system operators (and other multichannel video programming distributors, or MVPDs) for local advertising. In the United States, these breaks are usually exactly a minute long and occur twice an hour. Content providers usually schedule promos and other non-essential filler material in these slots, and cable operators insert their local commercials over it. Local commercial insertion is a mature business (and a very big one).

More recently, the use of SCTE 35 has expanded considerably, and it is now used to signal the start and end of programming and national advertising breaks. This enables virtual MVPDs (vMVPDs) to black out programming where rights are limited and to replace scheduled broadcast adverts that cannot be distributed over the Internet, either because the advertiser does not want them to be or because the rights do not allow it (adverts have rights too). SCTE 35 can also be useful for automating live-to-VOD (e.g. C3 VOD) and delivering enhanced nDVR capability.

However, there are a number of issues with this expanded use that are hampering the effectiveness of the SCTE 35 standard and, as a direct consequence, the ability of content providers to efficiently monetize the immense opportunities OTT streaming provides.


Given its importance, the SCTE 35 standard has been updated many times, always with good intentions. But the cumulative effect of those updates is a standard that is now ambiguous in many areas, with no real clarity as to which messages should be used to signal which events. The latest version allows for the signaling of more than 28 different types of events, including chapter start, chapter end, ad break start, ad break end and so on.

The main problem with this is that no two content providers use exactly the same SCTE 35 messages to trigger the same actions (an ad break, for example). Different programmers will signal the same point in a video signal with entirely different SCTE 35 messages. One programmer may signal the start of an ad break with “Provider Advertisement Start,” while another uses the exact same message to signal the start of an individual advert. There are many other areas of ambiguity too. As a result, it’s incredibly difficult for vMVPDs to properly interpret what the content provider is trying to signal. There is no agreed way of signaling events because so many parts of SCTE 35 are optional and, not wanting to be left out, each vMVPD has developed its own requirements. It is truly a mess.
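To make the ambiguity concrete, here is a minimal Python sketch of the kind of per-provider lookup a vMVPD ends up maintaining. The segmentation_type_id values are the ones SCTE 35 assigns to these messages; the provider names, the Event labels and the two profiles are hypothetical and only illustrate how one message can carry two different meanings.

```python
from enum import Enum
from typing import Optional


class Event(Enum):
    """Canonical events a vMVPD might act on (labels are illustrative)."""
    AD_BREAK_START = "ad_break_start"
    AD_BREAK_END = "ad_break_end"
    AD_START = "ad_start"
    AD_END = "ad_end"


# segmentation_type_id values as assigned by SCTE 35.
PROVIDER_AD_START = 0x30   # "Provider Advertisement Start"
PROVIDER_AD_END = 0x31     # "Provider Advertisement End"
PROVIDER_PO_START = 0x34   # "Provider Placement Opportunity Start"
PROVIDER_PO_END = 0x35     # "Provider Placement Opportunity End"

# Hypothetical interpretation tables: each provider means something
# different by the same message.
PROVIDER_PROFILES = {
    "network_a": {
        PROVIDER_AD_START: Event.AD_BREAK_START,
        PROVIDER_AD_END: Event.AD_BREAK_END,
    },
    "network_b": {
        PROVIDER_AD_START: Event.AD_START,
        PROVIDER_AD_END: Event.AD_END,
        PROVIDER_PO_START: Event.AD_BREAK_START,
        PROVIDER_PO_END: Event.AD_BREAK_END,
    },
}


def normalize(provider: str, segmentation_type_id: int) -> Optional[Event]:
    """Translate a provider's message into a canonical event, or None if unknown."""
    return PROVIDER_PROFILES.get(provider, {}).get(segmentation_type_id)


print(normalize("network_a", PROVIDER_AD_START))  # Event.AD_BREAK_START
print(normalize("network_b", PROVIDER_AD_START))  # Event.AD_START
```

Every new provider, and every change a provider makes, means another table like these on the receiving side.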

For one content provider to distribute to three vMVPDs, each of which will likely require a different “version” of SCTE 35, the only way to satisfy all three is to send a slightly different feed to each (exact same video and audio but different SCTE 35). Given that SCTE 35 uses bits per second and video uses megabits per second, this rapidly becomes massively wasteful in terms of bandwidth and encoder resources. It is the ultimate tail wagging the dog (or maybe even the flea on the tail wagging the dog).


Even if the content provider goes to the trouble of replicating its video and audio just to provide different SCTE 35 (most of the bigger ones don’t, because they have more leverage), there is no guarantee that SCTE 35 will survive transmission and processing. Transcoding and other processing often corrupt the timing of the messages by failing to take into account the decoding and encoding delay. As unbelievable as this may sound, it is absolutely occurring, even with established manufacturers’ software. Granted, sometimes this is due to misconfiguration by users who don’t know what a standard SCTE 35 message looks like (because there isn’t a “standard” message…). There are also few tools to inspect the timing relationship between the video and the SCTE 35 message, so it’s often left to the next process in the workflow to deal with.
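The timing problem can be illustrated with a short sketch. SCTE 35 splice times are expressed as 33-bit counts of a 90 kHz clock, so a processing stage that delays the video has to shift the signaled time by the same amount, with wraparound. The function names and the 1.5 second delay below are assumptions for illustration, and the sketch assumes the stage re-stamps video by a fixed, known offset.

```python
PTS_CLOCK_HZ = 90_000      # SCTE 35 / MPEG-2 presentation time base
PTS_MODULUS = 1 << 33      # PTS values are 33 bits wide and wrap around


def delay_to_ticks(delay_seconds: float) -> int:
    """Convert a processing delay in seconds to 90 kHz clock ticks."""
    return round(delay_seconds * PTS_CLOCK_HZ) % PTS_MODULUS


def shift_splice_time(pts_time: int, delay_seconds: float) -> int:
    """Move a signaled splice time by the delay the video itself incurred."""
    return (pts_time + delay_to_ticks(delay_seconds)) % PTS_MODULUS


# A break signaled 10 s into the stream, passing through a stage that
# (by assumption) delays the video by 1.5 s.
original = 10 * PTS_CLOCK_HZ
corrected = shift_splice_time(original, 1.5)
print(original, corrected)  # 900000 1035000
```

A stage that skips this step leaves the message pointing 1.5 seconds ahead of the frame it was meant to mark, which is exactly the kind of drift described above.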

When this is added to the ambiguity described in the last section, it is little wonder that vMVPDs have all but given up. Some have even disabled SCTE 35 altogether on some of their networks.


If each vMVPD were able to associate the frame boundaries in an incoming video stream with the playlist for that channel, it would be able to configure its own SCTE 35 messages and normalize them for its applications on every network. One way of doing this is to send the raw metadata (i.e. the timing of the frame boundaries for the network’s playlist) out-of-band to the vMVPD via a cloud-based micro-service. At the vMVPD, the raw metadata can then be used to construct SCTE 35 messages to the vMVPD’s own requirements (or whatever messaging the vMVPD desires).
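As a rough sketch of that last step, the Python below turns one raw playlist event into the fields of an SCTE 35 message according to the receiving vMVPD’s own profile. The RawEvent structure, the vMVPD names and the profile contents are all hypothetical; the output is a plain dictionary of field values, not an encoded SCTE 35 section.

```python
from dataclasses import dataclass

PTS_CLOCK_HZ = 90_000
PTS_MODULUS = 1 << 33


@dataclass(frozen=True)
class RawEvent:
    """One out-of-band playlist event (hypothetical structure)."""
    kind: str                # e.g. "ad_break_start"
    stream_seconds: float    # frame-boundary time within the received stream
    duration_seconds: float  # planned length of the event


# Hypothetical per-vMVPD preferences for signaling the same raw event.
VMVPD_PROFILES = {
    "vmvpd_x": {"ad_break_start": 0x34},  # Provider Placement Opportunity Start
    "vmvpd_y": {"ad_break_start": 0x30},  # Provider Advertisement Start
}


def build_message(event: RawEvent, vmvpd: str) -> dict:
    """Map a raw event to SCTE 35 field values for one vMVPD."""
    return {
        "segmentation_type_id": VMVPD_PROFILES[vmvpd][event.kind],
        "pts_time": round(event.stream_seconds * PTS_CLOCK_HZ) % PTS_MODULUS,
        "segmentation_duration": round(event.duration_seconds * PTS_CLOCK_HZ),
    }


break_start = RawEvent("ad_break_start", stream_seconds=10.0, duration_seconds=60.0)
print(build_message(break_start, "vmvpd_x"))
print(build_message(break_start, "vmvpd_y"))
```

The same raw event yields a different message for each distributor, while the content provider sends only one video feed.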


Crystal decided to take this cloud-based micro-service approach with its Metadata Cloud product, but in order to do so it needed a way of synchronizing the metadata to the video at the vMVPD. Crystal developed a temporal fingerprinting system that can be applied at source and then used to match the frames of video exactly at any receive location. Once the timing is recovered, it is a very straightforward matter to apply the metadata in whatever format is required. This means that a single broadcast feed can be decorated with multiple different SCTE 35 profiles downstream, at any time, even if the video is processed or delayed for any period of time. The out-of-band “raw” metadata and temporal fingerprints require less than 100 bits per second on average (under 0.4 GB a year at that rate), making cloud processing and delivery for large numbers of networks, and storage for months or even years, practical at extremely low cost (AWS charges cents per gigabyte).
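The article does not describe how Crystal’s fingerprinting works, so the following is only a generic illustration of the idea of recovering timing by matching fingerprints, not Crystal’s method. It assumes each frame has already been reduced to a single integer signature (a hypothetical coarse brightness value, say) and slides a short run of source signatures along the received ones to find the closest match.

```python
from typing import Optional, Sequence, Tuple


def best_offset(probe: Sequence[int],
                received: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Return (offset, distance) of the closest match of probe in received.

    Distance is the sum of absolute differences, so signatures that were
    slightly altered by processing can still match. Returns None if the
    received sequence is shorter than the probe.
    """
    n = len(probe)
    best = None
    for i in range(len(received) - n + 1):
        dist = sum(abs(a - b) for a, b in zip(probe, received[i:i + n]))
        if best is None or dist < best[1]:
            best = (i, dist)
    return best


# Source signatures for four consecutive frames, and what a receiver saw
# after processing nudged the values (all numbers are made up).
source_probe = [12, 40, 7, 93]
received = [5, 5, 13, 41, 6, 92, 50]

print(best_offset(source_probe, received))  # (2, 4)
```

Once the offset is known, an event tied to a given source frame can be applied to the corresponding received frame. A production system would need signatures that are far more robust and a much faster search than this brute-force loop.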


Sending metadata out-of-band means that SCTE 35 is no longer a constraint on what can be sent (it is now a subset of the metadata that can be delivered). This means that it is possible to send graphics (lower thirds, logos, tickers, etc.) as pure metadata and then apply them in whatever format makes sense for the device being used. For example, a ticker on the bottom of a news channel or business network may be viewable on a flat screen TV but would be hard to see on an iPhone. If a clean video feed with no graphics were delivered to the vMVPD (this is straightforward for a content provider to do: they just pick off the video before the graphics are overlaid), it would be possible for the vMVPD to customize those graphics for its users’ devices. The graphics could even become “clickable” and user-selectable; it’s HTML5 after all.

Another important benefit of sending metadata out-of-band is that the same metadata does not need to be broadcast to everyone (as is obviously the case with SCTE 35, which is specified to be in-band in the current standard). This has the obvious benefit of allowing customized/personalized metadata to be mixed in by the vMVPD. Think personalized “clickable” buttons on adverts.

The temporal fingerprinting has the added side benefit of measuring the transmission delay from source to the receive point. This enables a second screen device to have content synchronized precisely to the broadcast being received in each home. The transmission delay varies widely depending upon the network, MVPD and even TV (it can be anywhere from a second or so for over-the-air to a minute or more if you are watching on an internet-connected TV). Being able to send data about an advert to a second screen that is precisely synchronized to what a broadcast viewer is watching will be enormously beneficial when it comes to enhancing viewer engagement.


The metadata currently being sent via SCTE 35 is so important to the future of television that its problems cannot be ignored. By sending metadata regarding the video and audio in a linear television network out-of-band, the issues of ambiguity and corruption during delivery can effectively be eliminated. It also enables additional metadata (such as graphics) and interactivity to be added without changing the broadcast infrastructure, thereby providing the opportunity to truly unify broadcast and OTT delivery for the first time.

Alan Young is COO for Crystal.