Directions for Media Over IP

SANTA CLARA, CALIF.—A loose definition of “gestalt” is “the whole is greater than the sum of its parts.” This is the promise of Internet Protocol in the media facility. IP, coupled with Ethernet, creates a gestalt that is hard to duplicate with any other network technology. Why? It can link every node across networks near or far. It is the lingua franca of IT networking today. IP can carry all facility “data types” on the same link at the same time including files, real-time A/V essence flows, storage I/O, control dialogs, management data, API dialogs, Web traffic, time/sync and more.

So, even though SDI is not “broken,” IP and Ethernet unify the facility and become the master transport for any data anywhere; after all, look at what IP did for the Internet. Of course, special-purpose short-hop links such as HDMI, MHL, USB and similar will coexist with Ethernet.

Media over IP includes files and streams. File-based workflows are mature, while streaming A/V over IP is still a niche in 2015 and SDI remains king in the media facility. However, the signposts of progress for streamed media over IP are everywhere. Many NAB Show/IBC vendors offer a hybrid mix of IP and SDI products. Most systems use SDI-IP bridges today since native media-over-IP I/O is not common. Another indicator is the work by SMPTE, the Video Services Forum (VSF), the Advanced Media Workflow Association (AMWA) and the European Broadcasting Union (EBU). All these bodies are developing standards, best practices and reports for streamed media-over-IP. Since this is a work in progress, the following sections outline the current directions for streaming media-over-IP for professional applications.

There is general industry consensus that real-time media-over-IP for the professional/studio facility will use a “network stack” composed of (media)/RTP/UDP/IP/Ethernet. Breaking this down, the media component data (audio, video, metadata) is carried in Real-time Transport Protocol (RTP) packaging as specified in Internet Engineering Task Force (IETF) RFC 3550.

RTP is a simple way to package media along with timestamps and identity markers. In real time a receiver can easily unwrap the RTP container and free the media to be used. The media can be uncompressed or compressed.
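To make the idea of an RTP “container” concrete, here is a minimal Python sketch of wrapping and unwrapping a payload with the 12-byte fixed header defined in RFC 3550. The function names and the default payload type of 96 (a value from the dynamic range) are assumptions chosen for illustration, and the sketch deliberately does not handle header extensions or padding.

```python
import struct

RTP_VERSION = 2
# Fixed RTP header: flags, marker/payload type, sequence, timestamp, SSRC.
_HEADER = struct.Struct("!BBHII")  # 12 bytes, network byte order


def pack_rtp(payload: bytes, seq: int, timestamp: int, ssrc: int,
             payload_type: int = 96, marker: bool = False) -> bytes:
    """Prepend a fixed RTP header (no CSRC list, no extension, no padding)."""
    byte0 = RTP_VERSION << 6  # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = _HEADER.pack(byte0, byte1, seq & 0xFFFF,
                          timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def unpack_rtp(packet: bytes) -> dict:
    """Split a packet into header fields and payload."""
    byte0, byte1, seq, timestamp, ssrc = _HEADER.unpack_from(packet)
    if byte0 >> 6 != RTP_VERSION:
        raise ValueError("not an RTP version 2 packet")
    if byte0 & 0x30:
        # Padding (P) or extension (X) bit set: out of scope for this sketch.
        raise ValueError("padding/extension not handled in this sketch")
    csrc_count = byte0 & 0x0F
    offset = _HEADER.size + 4 * csrc_count  # skip any CSRC identifiers
    return {
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[offset:],
    }
```

The timestamp and sequence number are what let a receiver reorder packets and recover timing; the SSRC is the identity marker for the sending source.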

Fig. 1. MEFs, SEFs and GEFs in an IP media network

Next in the stack, the RTP data is carried by the UDP layer (RFC 768). There is no inherent error correction with UDP so the path needs to be provisioned with adequate bandwidth and no packet congestion. This is practical in a facility-controlled environment.

In turn, UDP is carried by the IP layer (RFC 791). Multicast IP addressing will be common, too, since it permits point-to-multipoint connections as SDI does. The IP layer will most often be carried by Ethernet links, but other choices are possible. The Ethernet standard is administered by the IEEE, not the IETF.
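As a rough illustration of the point-to-multipoint idea, the Python sketch below sends a datagram to an IPv4 multicast group over UDP using only the standard library. The helper names, the group address 239.1.1.1 and port 5004 in the usage note are assumptions for illustration, not values taken from any standard discussed here.

```python
import ipaddress
import socket
import struct


def is_multicast_group(address: str) -> bool:
    """True if the address is in the IP multicast range."""
    return ipaddress.ip_address(address).is_multicast


def open_multicast_sender(ttl: int = 1) -> socket.socket:
    """Create a UDP socket whose multicast packets survive `ttl` router hops."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                    struct.pack("B", ttl))
    return sock


def send_to_group(sock: socket.socket, group: str, port: int,
                  datagram: bytes) -> int:
    """Send one datagram to every receiver that has joined the group."""
    if not is_multicast_group(group):
        raise ValueError(f"{group} is not a multicast address")
    return sock.sendto(datagram, (group, port))
```

A sender would call something like `send_to_group(open_multicast_sender(), "239.1.1.1", 5004, packet)` once per packet; the network, not the sender, replicates the flow to each subscribed receiver.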

Ethernet has won the physical connectivity wars and data rates of 1, 10 and 100 Gbps are commonplace. The IEEE is also working to standardize 2.5, 5, 25, 50 and 400 Gbps links. This range of connectivity is beneficial for media systems designers since it enables a choice of rate to match the media content, and bridging to legacy SDI as needed.
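Matching a link rate to the media content is simple arithmetic, sketched below in Python. The calculation counts active picture data only and ignores blanking and RTP/UDP/IP/Ethernet header overhead; the 80 percent utilization ceiling (`headroom`) is an assumed design margin, not a figure from any standard.

```python
# Link rates named in this article, in Gbps (some were still being
# standardized at the time of writing).
ETHERNET_RATES_GBPS = (1, 2.5, 5, 10, 25, 50, 100, 400)


def active_video_gbps(width: int, height: int, fps: float,
                      bits_per_pixel: int) -> float:
    """Bit rate of the active picture only, in Gbps."""
    return width * height * fps * bits_per_pixel / 1e9


def smallest_link_gbps(stream_gbps: float, headroom: float = 0.8) -> float:
    """Smallest listed rate that carries the stream within the headroom."""
    for rate in ETHERNET_RATES_GBPS:
        if stream_gbps <= rate * headroom:
            return rate
    raise ValueError("stream exceeds the fastest listed link")
```

For example, 1080p60 video with 10-bit 4:2:2 sampling averages 20 bits per pixel, so `active_video_gbps(1920, 1080, 60, 20)` is about 2.49 Gbps. Under the assumed 80 percent ceiling that is too much for a 2.5 Gbps link, and `smallest_link_gbps` selects 5 Gbps.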

Bottom line, using the RTP/IP layered stack, it is possible to achieve the same quality of delivery of a stream as SDI, but in the IP ecosystem. SMPTE and others are developing standards to specify these details. The Video Services Forum has recently published Technical Recommendation TR-03, which specifies how to carry audio, video and metadata in aligned streams using RTP/IP.

With the RTP/IP stack as described above, there are three different ways to “package” the media as a stream. Let’s call the three ways “MEF,” “SEF” and “GEF” for purposes of this discussion (see Fig. 1 for the definitions of these acronyms). Each method is a different technique to carry media. All three can coexist in a system and have a respective pro/con list.

The MEF is a composite flow with multiple media types wrapped in one envelope (N in 1) and carried by a single RTP flow. For example, SDI is an MEF in this context, although it does not use RTP. SMPTE ST 2022-6 (SDI payload over RTP/IP) is also an MEF. The advantage of an MEF is that all the contained media is automatically aligned (lip sync) and travels together across a network as one flow. The topmost envelope in the figure wraps A+V into a single MEF.

An SEF is a single flow in one envelope (1 in 1) per media type: audio, video or metadata. SEFs may be time aligned or independent. Using this form, media streams can be routed, duplicated or dropped in the network as needed. With SDI (an MEF), if a user needs to access just the audio portion, a hardware “de-embedder” is needed to split out the flow. With an SEF, there is no need to embed or de-embed media. SEF flows are individual “free spirits.” A sender can transmit associated SEF_A plus SEF_V flows across the network knowing they can be consumed as a perfectly time-aligned pair. Both SEFs and MEFs may be duplicated as needed, just as is commonly done with an SDI router.
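One way a receiver might check that an SEF_A/SEF_V pair is aligned is to convert each flow's RTP timestamp to media time and compare, as in the Python sketch below. It assumes both senders stamp packets from a common reference clock, uses assumed clock rates of 90 kHz for video and 48 kHz for audio, and ignores 32-bit timestamp wraparound; the function names are hypothetical.

```python
from fractions import Fraction


def media_time(rtp_timestamp: int, clock_rate: int) -> Fraction:
    """Convert RTP timestamp ticks to seconds, exactly."""
    return Fraction(rtp_timestamp, clock_rate)


def av_skew_ms(video_ts: int, audio_ts: int,
               video_clock: int = 90_000, audio_clock: int = 48_000) -> float:
    """Audio time minus video time in milliseconds (positive: audio is later)."""
    skew = media_time(audio_ts, audio_clock) - media_time(video_ts, video_clock)
    return float(skew * 1000)
```

A skew of zero means the two packets describe the same instant, so the receiver can present them together without any embedding step.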

The third acronym, GEF, is a synthetic virtual combination of SEFs and/or MEFs inside the network. In the figure, the indicated GEF is the logical combination of SEF_A and SEF_V; it could be any combo of grouped flows. The idea is to logically combine flows in the network for routing and workflow purposes.

A controller would manage the virtual groups and present such to the user or connection management system. Using software-defined networking techniques enables the grouping and routing of GEFs as desired.
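The grouping idea can be modeled in a few lines of Python. This is only a sketch of the bookkeeping such a controller might do: the class and method names are hypothetical, and it does not touch the network or any software-defined networking API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flow:
    flow_id: str
    media_types: frozenset  # e.g. {"video"} or {"audio", "video"}

    def __post_init__(self):
        if not self.media_types:
            raise ValueError("a flow must carry at least one media type")

    @property
    def kind(self) -> str:
        # One media type per envelope is an SEF; several in one is an MEF.
        return "SEF" if len(self.media_types) == 1 else "MEF"


@dataclass
class GroupController:
    flows: dict = field(default_factory=dict)   # flow_id -> Flow
    groups: dict = field(default_factory=dict)  # group name -> flow ids

    def register(self, flow: Flow) -> None:
        self.flows[flow.flow_id] = flow

    def group(self, name: str, flow_ids: list) -> None:
        """Define a GEF as a logical set of already registered flows."""
        unknown = [fid for fid in flow_ids if fid not in self.flows]
        if unknown:
            raise KeyError(f"unregistered flows: {unknown}")
        self.groups[name] = tuple(flow_ids)

    def resolve(self, name: str) -> list:
        """Return the member flows so they can be routed together."""
        return [self.flows[fid] for fid in self.groups[name]]
```

Registering SEF_A and SEF_V and then calling `group("GEF_1", ["SEF_A", "SEF_V"])` gives the user one handle for the pair, while the underlying flows stay separate in the network.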

The concepts outlined in this article will become commonplace in the networked media facility. Other aspects of connection management, Quality of Service, service registration and discovery will be discussed in future articles.

Al Kovalick is the founder of Media Systems consulting in Silicon Valley. He is the author of “Video Systems in an IT Environment (2nd ed).” He is a frequent speaker at industry events and a SMPTE Fellow. For a complete bio and contact information,

Al Kovalick