Ethernet in the HD studio

The IEEE 802 working groups are in the process of completing significant work that will allow vendors to build a standards-based network with the appropriate quality of service (QoS) for HD production. This new set of standards, developed by the organization's Audio Video Bridging Task Group, provides three major enhancements for Ethernet-based video networks:

precise timing to support low-jitter media clocks and accurate synchronization of multiple streams;
a simple reservation protocol that allows an endpoint device to notify the various network elements in a path to reserve the resources necessary to support a particular stream; and
queuing and forwarding rules that ensure that such a stream will pass through the network within the delay specified by the reservation.

These enhancements don't require changes to the Ethernet lower layers and are compatible with all the other functions of a standard Ethernet switch (i.e. a device that follows the IEEE 802.1Q bridge specification). As a result, the rest of the Ethernet ecosystem is available to developers. In particular, the various high-speed physical layers (up to 10Gb/s in current standards, with even higher speeds in development), security features (encryption and authorization) and advanced management features (remote testing and configuration) can all be used.

This tutorial will be provided in two parts. The first part will outline the advantages of moving to a common network infrastructure based on new-generation Ethernet protocols and provide an introduction to those new protocols and capabilities. The second part, which will appear in the June issue, will describe how such a network can be used in a next-generation HD studio.

A historical perspective

Those who have experienced television's digital revolution recognize how much the format of the content structures the production environment. Adding a new format usually requires a new production infrastructure and the inherent investment in a new generation of equipment.

On the other hand, with IT and consumer electronics, there is a convergence of technologies where many sources of content and different formats coexist. In these spaces, the cost of connectivity is decreasing dramatically. Content producers should be able to take advantage of this convergence and avoid having to replace equipment each time a new standard is adopted. This is especially true when the replacement standard represents not just one new solution, but rather an entire flexible family of standards.

Unfortunately, broadcast production is one of the last industries to migrate to computer-based digital solutions. The reason is that, until recently, off-the-shelf network and computing systems were not powerful or fast enough to handle the demands of broadcast production. The result has been a plethora of proprietary solutions and format-centric point-to-point connections, switching systems and processors. These include cameras, servers, microphones, video routers, video switchers and audio mixers, as shown in Figure 1.

Let's look more closely at studio architecture. Today's audio and video sources are usually processed separately. The video signals are distributed using SD- or HD-SDI interfaces and coaxial cables. The signals pass through a video router, which brings some flexibility in terms of routing, and are finally dispatched to the video switcher, where the technical director can select and switch between the different sources.

The audio signals follow their own interconnects (e.g. AES interfaces and coaxial cables) to reach the audio mixer, where the multiple sources are mixed. It's only at the final stage, after resynchronization operations, that the audio and video signals are recombined.

Media production requires additional signals. These include equipment control lines and communication links, such as intercom and camera tally. In addition, audio and video signals need to be monitored throughout the production chain for QoS and content reasons.

A vision: full Ethernet/IP production studio

Suppose, on the other hand, live TV production moved toward a flexible and future-proof communication infrastructure based on Ethernet and the IP stack. The studio could move from a heterogeneous format-centric approach to a homogeneous network/software architecture where virtual routing would make every signal available anywhere in or out of the studio. Such a solution would unify the interconnects for multiple audio and video signals, control and program associated data, VoIP and general computer data.

This vision of a full Ethernet/IP studio requires a major rethinking of the current infrastructure and investigation of the technological consequences. It also places some new requirements on the physical network, some of which may seem like foreign concepts for a video network. The new network topology will require:

universal connectivity because signals carrying different types of data, each with its own QoS requirements, must coexist on the same wire;
a virtual switch matrix so routing can be managed with the aid of the network;
frame synchronization that imposes timing requirements not commonly placed on data networks; and
network-introduced latency and jitter that are compatible with the requirements of high-quality A/V signals.

An example of such an Ethernet/IP studio is shown in Figure 2. An Ethernet/IP production studio network will normally be confined to a limited location. Even where the studio spans several buildings or a stadium, distances are on the order of several hundred meters. Fortunately, this means the Ethernet/IP studio infrastructure can be treated as a single bridged local area network (LAN), with none of the complications imposed by operating over a WAN.

The studio system will impose severe real-time latency, jitter and management requirements that will exclude certain types of network equipment. In particular, routers and hubs are disallowed.

Routers have unacceptable timing characteristics, particularly with respect to latency and jitter, given the production constraints. The new system will also exclude solutions handled exclusively by routers using Multi-Protocol Label Switching (MPLS). Hubs and unmanaged bridges cannot be used because they do not allow bandwidth to be managed or latency and jitter to be controlled.

Current digital studios transmit streaming video at 270Mb/s for SD and from 1.5Gb/s up to 3Gb/s for HD. This requires an Ethernet/IP network supporting either GigE or 10GigE links. Today's 10/100Mb/s links could be used for the exchange of IT and control data, but they cannot be used for professional video.

A typical studio may use up to 30 video cameras simultaneously. The bandwidth this requires cannot easily be handled by a low-cost 1Gb/s network. However, a network built from the newest 10Gb/s or faster links (the IEEE 802.3 working group is developing both 40Gb/s and 100Gb/s updates) will be able to carry all the video. Furthermore, existing high-performance Ethernet bridges can aggregate several links to create virtual interconnects at multiples of the individual link rates.
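As a rough check on these figures, the following Python sketch totals the uncompressed bandwidth for a set of cameras and counts how many links of a given rate would be needed. The 1,485Mb/s value is the nominal HD-SDI rate behind the "1.5Gb/s" quoted above. The 75 percent usable share of each link and the function name are assumptions made for illustration only.

```python
import math

# Nominal serial rates in Mb/s: SD-SDI and 1.5Gb/s HD-SDI.
SD_MBPS = 270
HD_MBPS = 1485


def links_needed(streams, stream_mbps, link_mbps, usable_fraction=0.75):
    """Return how many links of link_mbps are needed for the streams.

    usable_fraction is an assumed ceiling on the share of each link
    given to video. Each stream must fit whole on a single link.
    """
    usable = link_mbps * usable_fraction
    per_link = int(usable // stream_mbps)  # whole streams per link
    if per_link == 0:
        raise ValueError("one stream does not fit on this link")
    return math.ceil(streams / per_link)


total_gbps = 30 * HD_MBPS / 1000
print(f"30 HD cameras: {total_gbps:.2f} Gb/s")
print("10Gb/s links needed:", links_needed(30, HD_MBPS, 10_000))
```

Under these assumptions, 30 HD cameras produce 44.55Gb/s and need six 10Gb/s links, while a single 1.5Gb/s stream does not fit on a 1Gb/s link at all.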

The technical challenges

Manipulating broadcast-quality signals over Ethernet/IP in real time poses many technological challenges. These include the need to:

define and characterize a suitable Ethernet/IP-based infrastructure for real-time video production;
define an efficient synchronization system to replace the conventional genlock signal (frequency and phase synchronization) and introduce a solution to maintain time relationships between the video, audio and data streams;
define a studio software management tool to ensure the necessary QoS (e.g. bandwidth, latency, jitter) over the Ethernet/IP network via traffic admission control and traffic prioritization;
define a security scheme suitable for the valuable and sensitive sources manipulated in the studio; and
define a studio configuration tool to support integration and administration of the facility's video services.

Examine some solutions

Now that we've defined some of the technical issues, let's look at several of the proposed solutions. An audio/video bridging (AVB) network is one that implements a specific subset of existing layer 2 standards (such as IEEE 802.3 Ethernet or IEEE 802.11 Wi-Fi) and IEEE 802.1 (bridging/management/security/interworking). In addition, the solution will need to incorporate other protocols being developed by the IEEE 802.1 AVB group.

It is important to remember that what IEEE 802 calls an Ethernet bridge is frequently called a switch. Unfortunately, most switches are little more than store-and-forward hubs without management capability. These have no place in a studio or anywhere else where QoS is important. A bridge, as the term is used here, is a specific device that has some basic intelligence, allowing it to be used in both plug-and-play and fully managed environments.

There are four primary differences between the proposed AVB architecture and existing 802 architectures. They include the areas of:

precise synchronization;
traffic shaping for media streams;
admission controls; and
identifying participating devices.

Some of these issues represent significant changes. Even so, they can all be implemented using relatively small extensions to the standard Ethernet Media Access Control (MAC) (defined in IEEE 802.3), Wi-Fi MAC (defined in IEEE 802.11) and bridges (defined in IEEE 802.1Q). There is also new work taking place in IEEE 1588 for network timing specifications. Using a minimal change philosophy will allow non-AVB and AVB devices to seamlessly communicate using standard 802 frames. Only AVB devices, however, will be able to send/receive/relay the new timing-based frames or services. (See Figure 3.)

A note on interactivity: The AVB system will support two classes of media streams based on the latency requirements of two primary classes of applications:

2ms for low-latency applications (live musical performances and gaming); and
tens of milliseconds for less critical applications where interactivity is limited to the kind of responsiveness needed by a typical consumer-electronic remote control.

Precise synchronization

AVB devices must periodically exchange timing information that allows both ends of the link to precisely synchronize their time-of-day clocks. This precise synchronization has two purposes:

to allow synchronization of multiple streams; and
to provide a common time base for sampling data streams at a source device and then presenting those streams at the destination device with the same relative timing.

The protocol used for maintaining timing synchronization is specified in IEEE 802.1AS and is based largely on IEEE 1588, which is currently used for industrial control, test and measurement applications.

Within this architecture, a single device within the AVB cloud, called the grand master clock, provides the master timing signal. All other devices synchronize their clocks with this master. Selection of the grand master is frequently arbitrary because all AVB source devices are master-capable. However, the selection can be overridden if the network is used in a professional environment that requires a house clock (audio) or genlock (video), or if the timing hierarchy needs to be specified for other reasons.

AVB devices begin by exchanging capability information during link establishment. If both devices are capable of network synchronization, they will start to exchange clock synchronization and configuration information. Ethernet bridges will act as a simple form of 1588 transparent clock. This means they provide timing correction information on synchronization packets they pass from the grand master clock to the other devices in the AVB cloud. In the AVB form of transparent clock, each port has a slave/master attribute where the slave port is the one pointed toward the grand master, and the master ports point away from the grand master. (See Figure 4.)
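The arithmetic behind this kind of synchronization can be sketched in a few lines of Python. The peer delay formula is the standard IEEE 1588 one. The function names and nanosecond timestamps are illustrative assumptions, and the sketch ignores frequency differences between clocks, which a real implementation must compensate for.

```python
def link_delay_ns(t1, t2, t3, t4):
    """One-way link delay from a peer delay exchange.

    t1: request sent (initiator clock)
    t2: request received (responder clock)
    t3: response sent (responder clock)
    t4: response received (initiator clock)
    """
    return ((t4 - t1) - (t3 - t2)) / 2


def bridge_correction_ns(correction_in, upstream_link_delay, residence_time):
    """Correction a bridge passes on with a sync message it forwards."""
    return correction_in + upstream_link_delay + residence_time


def clock_offset_ns(origin_time, correction, last_link_delay, local_rx_time):
    """How far the local clock is ahead of the grand master."""
    grand_master_now = origin_time + correction + last_link_delay
    return local_rx_time - grand_master_now


# Grand master -> 500ns link -> bridge (2000ns residence) -> 300ns link -> endpoint
correction = bridge_correction_ns(0, 500, 2000)
offset = clock_offset_ns(1_000_000, correction, 300, 1_002_950)
print(offset)
```

In this example the endpoint finds its clock 150ns ahead of the grand master and can adjust accordingly. The offset between the two clocks cancels out of the link delay calculation because each difference is taken on a single clock.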

There are some additional complications of IEEE 802.1AS that only apply for wireless or other shared-media links. These will not be discussed here because the packet loss statistics and additional latency imposed by these links make them inappropriate for audio or video studios.

Traffic shaping for A/V streams

A key part of the proposed AVB architecture is the requirement for minimal lost packets. For Ethernet, this implies full duplex connections. This has been an option for Ethernet connections since the mid-1990s and is already a requirement for gigabit links. This means that there are no frame collisions, and all multiport devices function as bridges.

The Ethernet bridge specification already defines how to forward frames directly to their destination without flooding the network and how multiple priorities can be handled. The AVB architecture uses these forwarding and priority mechanisms in a specific way to provide A/V services.

A/V frames

A/V frames are standard 802 frames with 802.1Q priority tagging and normal restrictions on format and length. The unique elements include the particular mapping of priority tags to a well-defined frame-forwarding behavior within bridges, and traffic shaping at endpoints and within bridges. In particular, AVB redefines the 802.1Q default assignments to more specific traffic classes. (See Table 1.)

Traffic shaping at the stream source

Endpoint devices must transmit frames for a particular stream evenly, based on the traffic class (AVB Class A or AVB Class B) and the specific QoS parameters that were used when the stream was approved by the network. (See the section on admission controls below.) The specific rules for traffic shaping are described in the IEEE P802.1Qav specifications.
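A minimal sketch of what "evenly" means at the source is shown below. It spaces a stream's frames uniformly across a class interval. The 125µs default interval and the function name are assumptions for illustration. The actual shaping rules are those of IEEE P802.1Qav, which this sketch does not reproduce.

```python
def paced_send_times(frame_count, frames_per_interval, interval_us=125.0):
    """Return evenly spaced send times in microseconds.

    frames_per_interval is the number of frames the stream reserved
    per class interval. interval_us is an assumed class interval.
    """
    if frames_per_interval <= 0:
        raise ValueError("frames_per_interval must be positive")
    gap_us = interval_us / frames_per_interval
    return [i * gap_us for i in range(frame_count)]


# A stream that reserved two frames per interval sends one every 62.5 us.
print(paced_send_times(4, 2))
```

The point of the design is that the source never bursts: a stream allowed two frames per interval sends them half an interval apart rather than back to back.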

Bridge operation

The basic mechanism used for single-link interactive media transport is carried over to bridges. Interactive media frames will be forwarded in the same manner as best-effort traffic is in existing 802.1Q bridges. In addition, they will be subjected to traffic-shaping rules based on traffic class and the allocation of bandwidth for the traffic class on a particular egress port. This will have the effect of smoothing the delivery of interactive media frames through an AVB network.

The traffic-shaping rules for bridges also require that frames be distributed evenly in time, but only on a class basis. This means that all the traffic being transmitted out of a particular port for either AVB Class A or Class B is distributed evenly in time and measured using the QoS parameters of that class.

This represents the accumulated value of all the reservations for a particular class, for the particular port, made by the admission control process described below. The effect is that delivery jitter accumulates slowly and less than linearly with hop count. It also reduces the size of the output queues needed at all network output ports, even if the network has many hops.
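The per-class behavior can be illustrated with a simplified credit-based shaper, the mechanism defined in IEEE 802.1Qav. This Python sketch assumes the class queue is always backlogged, all frames are the same size, and no other traffic interferes. The function names are illustrative and do not come from the standard.

```python
def class_idle_slope_bps(reservations_bps):
    """Accumulated bandwidth of all reservations for one class on one port."""
    return sum(reservations_bps)


def shaped_start_times(frame_bits, frames, port_bps, idle_slope_bps):
    """Start times in seconds of queued frames under a credit-based shaper.

    Credit falls while a frame is transmitted and recovers at the
    reserved rate. A frame may start only when credit is not negative.
    """
    if not 0 < idle_slope_bps <= port_bps:
        raise ValueError("reserved rate must be positive and fit the port")
    send_slope = idle_slope_bps - port_bps  # negative or zero
    t, credit, starts = 0.0, 0.0, []
    for _ in range(frames):
        if credit < 0:
            t += -credit / idle_slope_bps   # wait for credit to recover
            credit = 0.0
        starts.append(t)
        tx_time = frame_bits / port_bps
        t += tx_time
        credit += send_slope * tx_time
    return starts


# Two 50Mb/s reservations on a 1Gb/s port, 1000-byte frames.
slope = class_idle_slope_bps([50e6, 50e6])
print(shaped_start_times(8000, 3, 1e9, slope))
```

With these numbers the frames leave the port roughly every 80µs, which is the frame size divided by the reserved rate, even though each frame occupies the wire for only 8µs.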

Non-AVB traffic, because it is not subject to the admission control process, does not have reserved QoS and is subject to frame drops if resources are constrained.

Bridges are also required to do a basic filtering function at ports that are not connected to AVB devices. They must remap the priority field of any received tagged frames so that they do not collide with the priority values used for AVB Class A or AVB Class B.
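This filtering step might look like the following sketch. The priority values for the two AVB classes are taken from Table 1. The choice of best effort (0) as the remapping target and the function name are assumptions, since the text does not say which value a bridge should use.

```python
# Priority tags reserved for AVB streams, as listed in Table 1.
AVB_PRIORITIES = {5: "AVB Class A", 4: "AVB Class B"}


def regenerate_priority(received_priority, port_is_avb, remap_to=0):
    """Return the priority a bridge should use for a received tagged frame.

    Frames arriving on a non-AVB port must not keep a priority that
    collides with an AVB class. remap_to is an assumed target value.
    """
    if not 0 <= received_priority <= 7:
        raise ValueError("802.1Q priority must be in the range 0-7")
    if port_is_avb or received_priority not in AVB_PRIORITIES:
        return received_priority
    return remap_to


print(regenerate_priority(5, port_is_avb=False))  # remapped
print(regenerate_priority(5, port_is_avb=True))   # kept
```

Without this step, a legacy device that happened to tag its frames with priority 4 or 5 would compete with reserved streams without ever having passed admission control.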

Admission controls

Even though the preceding mechanism can reliably deliver data with a deterministic low latency and low jitter, it will only do so if the network resources are sufficient. This means that port bandwidth and bridge buffer space must be available along the entire path from the talker to listener. In this architecture, it is both the talker's and the listener's responsibility to guarantee the path is available and to reserve the resources. The process to do this is specified by the P802.1Qat Stream Reservation Protocol (which is further based on the P802.1ak Multiple Registration Protocol). The process registers a stream and reserves the resources required through the entire path taken by the stream.

Here is how it works: The listener sends a register frame to the network with the stream address (frequently a group address). The intermediate bridges create a forwarding database entry for the stream back toward the listener and transmit the registration to the other ports. If the bridge is already forwarding the stream, it can respond on its own, acting as a proxy for the talker.

When the registration request reaches the talker, it returns a reserve frame toward the listener, which includes resources available (or not), worst case delay, QoS requirements (e.g. traffic class, bandwidth in bytes/class interval and the maximum number of packets to be sent in a class interval), address information and stream identification information.

An intermediate bridge receiving a “reserve, resources available” frame attempts to allocate bandwidth on the output port back toward the listener. If the resource allocation succeeds, the bridge sends a “reserve, resources available” frame back toward the listener. (The bridge already has the correct entries in the forwarding database because it has already participated in the upwards registration request.) If the resource allocation fails, the bridge sends a “reserve, resources not available” frame.

An intermediate bridge receiving a “reserve, resources not available” frame will just pass the frame out to its other ports. When the listener receives a reserve control frame, it will know whether the resources are available and, if so, that they have been reserved and what the delay for the path is. It can then respond with a ready frame that is forwarded back toward the talker.

Intermediate bridges use the ready frame to lock down the resources needed by the stream and to make the appropriate entries in their forwarding database. They allow the stream to be sent on the port that received the ready frame.

Once the talker receives a ready frame, it can start transmitting the stream. Obviously, various time-outs and disconnects affect the process, but the basic ideas have already been worked out. Additional listeners also send registration requests to the talker, but this time an intermediate bridge can respond if it is already forwarding the stream. The talker can take down a stream by sending an unreserve message, and a listener can disconnect by sending a not ready message.

Other methods can be used to take down a connection and release the allocated resources. For example, the listener must periodically resend registrations and ready messages, and talkers must periodically resend reserve messages. That way, any receiving device (including intermediate bridges) could automatically release assigned resources and notify higher layers if the appropriate registrations and reservations were not received.
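The reserve and ready exchange can be modeled with a short Python sketch. It is a deliberately simplified picture, assuming a single path, bandwidth as the only resource, and no time-outs. The class, field and function names are hypothetical and are not taken from P802.1Qat.

```python
from dataclasses import dataclass


@dataclass
class EgressPort:
    """Output port of one bridge on the path, pointing toward the listener."""
    name: str
    avb_capacity_mbps: float   # assumed share of the link open to reservations
    reserved_mbps: float = 0.0
    worst_case_delay_us: float = 0.0


def reserve_stream(path, stream_mbps):
    """Walk a reserve frame from talker to listener, then a ready frame back.

    path lists the egress ports in talker-to-listener order.
    Returns (resources_available, accumulated_delay_us or None).
    """
    available = True
    delay_us = 0.0
    for port in path:                  # reserve frame, hop by hop
        if not available:
            continue                   # "not available" is just passed on
        if port.reserved_mbps + stream_mbps > port.avb_capacity_mbps:
            available = False
        else:
            delay_us += port.worst_case_delay_us
    if not available:
        return False, None
    for port in reversed(path):        # ready frame locks the resources
        port.reserved_mbps += stream_mbps
    return True, delay_us


path = [EgressPort("bridge-1", 750, 0, 100), EgressPort("bridge-2", 750, 700, 100)]
print(reserve_stream(path, 100))  # second bridge lacks bandwidth
print(reserve_stream(path, 50))   # fits on both hops
```

In the example, a 100Mb/s stream is refused because the second bridge has only 50Mb/s left, and nothing is reserved anywhere on the path. A 50Mb/s stream succeeds and the listener learns the accumulated worst-case delay.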

Identifying participating devices

Because the entire AVB QoS scheme depends on the participation of all devices between the talker and listener, any network element that does not participate (including so-called unmanaged bridges) must be identified and flagged so that network edge filtering and management schemes can be applied. The identification method used is a combination of 802.3 link capabilities, a small enhancement to the 802.1AB Link Layer Discovery Protocol and the link delay measurements performed by IEEE 802.1AS.

Therefore, an Ethernet link peer is considered an AVB device if:

the link is capable of full duplex 100Mb/s or greater;
IEEE 802.1AB Link Layer Discovery Protocol (LLDP) packets are received on the link with exactly one MAC source address;
those LLDP packets include the 802.1AS-capable and 802.1Qav-capable attributes; and
the round-trip delay is no more than a worst-case wire delay. This will be computed from the IEEE 1588 PDelay exchange and should be no more than approximately 2µs.
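The four conditions combine into a single test, sketched here in Python. The record type and its field names are hypothetical. The 2 microsecond threshold is the approximate round-trip limit given in the list above.

```python
from dataclasses import dataclass

MAX_ROUND_TRIP_US = 2.0  # approximate worst-case wire delay from the text


@dataclass
class LinkPeer:
    """What a port has learned about the device at the other end."""
    full_duplex: bool
    speed_mbps: int
    lldp_source_macs: frozenset   # MAC addresses seen in LLDP packets
    as_capable: bool              # 802.1AS-capable attribute present
    qav_capable: bool             # 802.1Qav-capable attribute present
    round_trip_delay_us: float    # from the peer delay exchange


def is_avb_peer(peer):
    """Apply the four conditions for treating a link peer as an AVB device."""
    return (
        peer.full_duplex
        and peer.speed_mbps >= 100
        and len(peer.lldp_source_macs) == 1
        and peer.as_capable
        and peer.qav_capable
        and peer.round_trip_delay_us <= MAX_ROUND_TRIP_US
    )


good = LinkPeer(True, 1000, frozenset({"00:11:22:33:44:55"}), True, True, 0.9)
print(is_avb_peer(good))
```

A peer seen through an unmanaged bridge would typically fail one of the last checks: LLDP packets may arrive from more than one source address, or the extra store-and-forward delay pushes the round trip past the limit.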

Higher layer interfaces

The mechanisms described represent a network layer 2 toolkit to support a streaming QoS. A useful system requires the addition of several higher layer services.

This would include stream transport protocols for the data. We'll assume that most of the traffic will be IP-based, in particular RTP traffic. Even so, there are other transport protocols in development, including IEEE P1722 Audio/Video Bridging Transport Protocol, which is a simple port of the IEC 61883 formats used by IEEE 1394 FireWire.

Our network would also require a mechanism for discovering the devices in the network and their streaming capabilities. The system also needs to provide mappings between the layer 2 mechanisms and existing higher layer QoS services. Examples include RSVP for standard RTP streams and the universal plug-and-play QoS mechanisms used by the Digital Living Network Alliance (DLNA), a consortium of CE, computing and mobile device vendors aiming to specify a wired and wireless interoperable network for such devices in the home.

What's next?

With this information as a foundation, we're ready to examine how Ethernet can be used to support the needs of an HD studio. In a June Broadcast Engineering magazine article, we will examine how the IEEE 802 working groups are creating specifications that vendors can use to build a standards-based, high QoS HD production network.

Michael Johas Teener is a plumbing architect for Broadcom Corporation, and Gaël Macé is a member of Thomson's Corporate Research.

Table 1. AVB redefines the 802.1Q default assignments to more specific traffic classes. AVB technologies identify only two classes of applications, mapped onto two distinct IEEE 802.1Q priorities, and leave the others unchanged. An empty cell in the AVB definition column means the IEEE 802.1Q traffic type definition applies unchanged.

Priority tag | Traffic type (from IEEE 802.1Q-2005) | AVB definition
1 | Background |
0 | Best effort |
2 | Excellent effort |
3 | Critical applications |
4 | Video < 100ms latency and jitter | AVB Class B: moderate latency stream (tens of milliseconds latency)
5 | Voice < 10ms latency and jitter | AVB Class A: low latency stream (< 2ms latency)
6 | Internetworking control |
7 | Networking control |