Using Ethernet in the HD studio

In last month's article, we examined some advantages of moving to a common network infrastructure based on new-generation Ethernet protocols. In addition, we discussed some of the underlying protocols being developed to support an IP-based production environment. Now it's time to describe in more detail the next-generation HD studio with genlock, real-time switching and studio network management.

Studio genlock over Ethernet

Genlock (generator locking) exactly synchronizes the timing of video equipment to a common reference signal. Each piece of equipment has to be in perfect sync so that live switching, editing and post production yield clean takes and transitions. Without genlock, switching between sources results in a momentary loss of image stability while the monitor or device tries to lock to the new signal. Without synchronization between all sources, the images may roll vertically or horizontally, or break up completely.

Genlock synchronizes four key video signal attributes: vertical, horizontal, frame and color synchronization. These are all normal parts of a standard composite video signal. When properly combined, the result is a correctly displayed image. A standardized signal, which includes all of the attributes but without any actual video image, is known as black burst. In the absence of black burst, many genlockable devices will instead accept a standard composite video signal for synchronization.

In order to genlock two video sources, at least one must have a genlock input. The other source can then serve as the master, from which the sync signal is derived. Alternatively, a variety of sync generators are available that either produce black burst from an incoming video signal or generate their own internal black burst reference for all connected genlockable cameras in the system.

The problem

The first production goal is to synchronize any camera signals so that video coming from different cameras can be cut and mixed without roll, jump or chroma shift.

The synchronization requirements are stringent, extending down to the pixel level. This demands timing precision in the range of several tens of nanoseconds. This precision was originally required by legacy analog equipment, which was quite sensitive to chroma shift caused by frequency errors and offered minimal buffering.
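
As a quick back-of-envelope check (a sketch in Python, assuming the common 74.25 MHz pixel clock of a 1080-line HD raster), one pixel period is roughly 13.5 ns, which is why pixel-level lock implies tens-of-nanoseconds precision:

    # Back-of-envelope check of the pixel-level timing requirement.
    # Assumes a 1080-line raster with a 74.25 MHz pixel clock (SMPTE 274M).
    pixel_clock_hz = 74.25e6
    pixel_period_ns = 1e9 / pixel_clock_hz
    print(f"One pixel period: {pixel_period_ns:.1f} ns")   # ~13.5 ns

    # A genlock error of even a few pixels is therefore a few tens of
    # nanoseconds -- the precision level the text refers to.
    max_error_pixels = 2
    print(f"{max_error_pixels}-pixel error: {max_error_pixels * pixel_period_ns:.1f} ns")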

With the migration to a digital world, past constraints are less severe. Digital color transmission no longer depends on a subcarrier frequency, and buffering is no longer a problem. Even so, timing constraints have not disappeared. For example, if two unsynchronized cameras shoot the same action, displaying their two pictures on the same screen may reveal a slight delay between the images. There may also be some stutter as frames are dropped or repeated.

The global latency introduced by the overall production chain (from the camera head to the output of master control) must be imperceptible to the human eye. Throughout the entire IP studio, there should be an appearance of exact synchronization among video equipment. In addition, camera operators must see the video on their monitors without delay as the scenes are shot. An example system block diagram demonstrating genlock is shown in Figure 1.

A system's global, end-to-end delay, from capture to the output of the video switcher, should be constrained to one or two frames. For synchronization reasons, this delay is typically an integer multiple of a frame. Because a video switcher and its video effects circuits usually consume these one or two frames, the latency inherent in the network and its core equipment must be minimal, preferably less than one frame.

Furthermore, as with any multimedia flow, jitter is a problem because it increases buffering. The input buffers of a current video switcher are typically sized to absorb no more than two or three video lines of jitter (a few tens of microseconds). Because packet buffering increases latency, the network itself must exhibit low jitter.
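
To make the budget concrete, here is a small illustrative calculation, assuming a nominal 29.97 frames/s system and the 64 µs line period used as a unit later in this article:

    # Rough latency/jitter budget for the figures quoted above.
    frame_period_ms = 1000 / (30 / 1.001)      # ~33.37 ms per frame
    line_period_us = 64                        # the n*64 us unit used later
    jitter_budget_us = 3 * line_period_us      # "two or three video lines"

    print(f"Frame period: {frame_period_ms:.2f} ms")          # 33.37 ms
    print(f"Switcher jitter budget: ~{jitter_budget_us} us")  # ~192 us
    # Network latency must therefore stay well under one frame (~33 ms),
    # and network-induced jitter well under the ~200 us the switcher's
    # input buffers can absorb.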

Finally, in a production environment, startup time needs to be short and predictable. When video equipment is connected to the infrastructure, especially for a live event, it needs to be able to start operating in less than one minute. The synchronization systems and related servo mechanisms must therefore have short convergence times.

A layered approach

The main difficulty in synchronizing equipment over an Ethernet/IP network is that packet transmission time over the network is not constant. Consequently, there is always a difference between the instant at which a packet is received and the instant at which it was intended to be received. This difference is the transmission jitter: while its average value may be zero, its value for any individual packet generally is not.
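
For illustration, the following sketch shows the running interarrival-jitter estimator commonly used by RTP receivers (in the style of RFC 3550); the absolute clock offset between sender and receiver cancels out in the transit-time deltas, leaving only the jitter:

    # Running interarrival-jitter estimate in the style of RFC 3550:
    # j += (|transit_delta| - j) / 16. Illustrative sketch; units are
    # arbitrary ticks of the media clock.
    def update_jitter(jitter, prev_transit, transit):
        d = abs(transit - prev_transit)      # change in one-way transit time
        return jitter + (d - jitter) / 16.0  # first-order low-pass filter

    jitter, prev = 0.0, None
    for arrival, timestamp in [(0, 0), (35, 30), (68, 60), (99, 90)]:
        transit = arrival - timestamp        # clock offset cancels in deltas
        if prev is not None:
            jitter = update_jitter(jitter, prev, transit)
        prev = transit
    print(f"estimated jitter: {jitter:.2f} ticks")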

A key component of IEEE 802.1AS is its accurate network timing service. This feature ensures the distribution of a clock with limited jitter, typically in the sub-microsecond range. Further filtering of the 802.1AS clock has been shown to meet the requirements of uncompressed HD video.

On top of this network layer, a video application could transmit the counter values at which synchronization signals (genlock) should occur. This layered synchronization is shown in Figure 2.
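
A minimal sketch of this layered idea follows, assuming every device shares the 802.1AS wall clock and derives the same schedule of frame-sync instants from it; the frame period shown is the nominal 29.97 Hz value, and the names are illustrative:

    # Sketch of layered genlock over 802.1AS: every device derives the
    # same schedule of frame-sync instants from the shared wall clock and
    # fires its local genlock pulse when its disciplined clock reaches
    # each instant -- no dedicated black-burst cable required.
    FRAME_PERIOD_NS = int(1e9 * 1.001 / 30)   # nominal 29.97 Hz frame, in ns

    def next_sync_events(now_ns, count):
        """Wall-clock times (ns) of the next `count` frame syncs."""
        first = now_ns - (now_ns % FRAME_PERIOD_NS) + FRAME_PERIOD_NS
        return [first + n * FRAME_PERIOD_NS for n in range(count)]

    # Two devices reading the same 802.1AS time compute identical schedules:
    print(next_sync_events(1_000_000_000, 3))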

Obtaining clean switches

Today, professional video production and broadcast environments are built on dedicated point-to-point infrastructures that allow both transport of multimedia sources and real-time operations on those streams.

Today's SD/HD video routers comply with SMPTE Recommended Practice RP 168. This practice defines a switching point at which the effects of any signal discontinuity within the chain are minimized, regardless of whether the interface is carrying an uncompressed television signal or a data signal. Like the other connected equipment, video routers are synchronized in both phase and frequency by the genlock described above. This is illustrated in Figure 3.
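
As a toy model of such a switching point, the sketch below defers a requested cut until a designated line in the vertical interval is reached; the actual switch line is defined per format by RP 168, and the value used here is only an assumption for illustration:

    # Toy model of switching at an RP 168-style switch point: a cut
    # requested mid-frame is deferred until the designated line in the
    # vertical interval. SWITCH_LINE is an assumed value; RP 168 defines
    # the actual line for each format.
    LINES_PER_FRAME = 1125   # total lines in a 1080-line HD raster
    SWITCH_LINE = 7          # illustrative only

    def lines_until_switch(current_line):
        """Line periods to wait before a cut may be executed."""
        return (SWITCH_LINE - current_line) % LINES_PER_FRAME

    print(lines_until_switch(540))   # a mid-frame request waits 592 lines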

In conventional packet-switched networks, such as non-AVB Ethernet/IP, the core network elements are not designed to support both real-time operations and stream handling. With the 802.1AS-based standard, however, accurate time synchronization is maintained between all components. This allows new devices and dedicated new protocols to implement the same services as those required in today's digital workflow.

A/V streaming

In addition to wall-clock (a universal time reference) genlock synchronization, production operations need end-to-end delivery services that allow receivers (monitor walls, video switchers, etc.) to reconstruct the senders' packet timing and to manipulate the streams according to their own time reference derived from the wall clock.

Lacking the timing information embedded in today's SDI signals, the multimedia streams must instead carry accurate timestamps, based on the time synchronization provided by 802.1AS services, to minimize buffering and facilitate interoperability between devices.
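
The sketch below illustrates one timestamp scheme in the spirit of IEEE 1722: the sender stamps each packet with a presentation time equal to its capture time plus a fixed transit allowance, and the receiver buffers the packet until its own 802.1AS clock reaches that time. The 2 ms allowance and the function names are assumptions for illustration:

    # Timestamp-based playout sketch. Both ends share the 802.1AS wall
    # clock, so no sender/receiver clock-drift estimation is needed.
    MAX_TRANSIT_NS = 2_000_000            # assumed 2 ms network allowance

    def stamp(capture_time_ns):
        """Sender side: presentation time carried in the packet."""
        return capture_time_ns + MAX_TRANSIT_NS

    def release_delay(presentation_ns, local_now_ns):
        """Receiver side: ns still to buffer this packet (0 if late)."""
        return max(0, presentation_ns - local_now_ns)

    pt = stamp(1_000_000_000)
    print(release_delay(pt, 1_000_500_000))   # arrived early: buffer 1.5 ms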

The RTP protocol family is one of the most common ways to stream real-time media over networks. This standard specifies the protocol, data encapsulations, connection management and presentation time procedures needed to ensure interoperability between devices that use standard networking services provided by all IEEE 802 networks.

Based on 802.1 AVB and the new IEEE 1722 AVB Transport Protocol (AVBTP), it is possible to enhance RTP and remove its deficiencies in some of the QoS features required by a production environment. Table 1 (found at the end of the article) illustrates the basic format of an RTP packet.
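
For reference, a minimal parser for the fixed 12-byte header shown in Table 1 might look like this (field widths per RFC 3550):

    import struct

    # Minimal parser for the fixed 12-byte RTP header in Table 1.
    def parse_rtp_header(data: bytes) -> dict:
        b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
        return {
            "version": b0 >> 6,          # V: 2 bits
            "padding": (b0 >> 5) & 1,    # P: 1 bit
            "extension": (b0 >> 4) & 1,  # X: 1 bit
            "csrc_count": b0 & 0x0F,     # CC: 4 bits
            "marker": b1 >> 7,           # M: 1 bit
            "payload_type": b1 & 0x7F,   # PT: 7 bits
            "sequence": seq,             # 16 bits
            "timestamp": ts,             # 32 bits
            "ssrc": ssrc,                # 32 bits
        }

    hdr = parse_rtp_header(bytes([0x80, 0x60, 0x00, 0x01]) + b"\x00" * 8)
    print(hdr["version"], hdr["payload_type"])   # 2 96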

The solution provided by these protocols will define the packet format and stream setup, control, and teardown protocols. These improvements will allow RTP solutions to be more effectively used in production environments.

Studio network management

Fundamental performance requirements have to be considered and met in order to achieve the needed QoS level in an A/V studio. These include the following (a worked numeric example follows the list):

  • the ability to strictly reserve bandwidth for flows up to 1.5Gb/s or even 3Gb/s;
  • a low latency of less than a few video lines (n*64 µs) over the entire network so as to provide equivalent real-time performance and consistency with the existing infrastructure;
  • low jitter for A/V streaming, meaning minimal latency from buffering; and
  • no packet loss of A/V data.
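
As a worked example behind the bandwidth figures above, the arithmetic below reproduces the familiar serial rates, assuming 10-bit 4:2:2 sampling on a 1080-line raster (2200 × 1125 total samples per frame at 30 frames/s; the /1.001 frame-rate variants scale these numbers slightly):

    # Why a single uncompressed HD flow needs ~1.5 Gb/s (or ~3 Gb/s).
    samples_per_line, lines, fps = 2200, 1125, 30
    bits_per_sample = 20                  # 10-bit luma + 10-bit chroma (4:2:2)
    hd_sdi_bps = samples_per_line * lines * fps * bits_per_sample
    print(f"HD-SDI: {hd_sdi_bps / 1e9:.3f} Gb/s")      # 1.485 Gb/s
    print(f"3G-SDI: {2 * hd_sdi_bps / 1e9:.2f} Gb/s")  # 2.97 Gb/s
    # Hence the requirement to reserve 1.5 Gb/s, or even 3 Gb/s, per flow.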

In order to provide a high QoS for these demanding applications, the network must satisfy two necessary conditions. First, sufficient bandwidth must be guaranteed for an application under various circumstances, including congestion and failures. This condition also helps avoid A/V data packet loss. Second, as an application's traffic traverses the network, it must receive the appropriate class-based treatment, including scheduling and packet discarding.

Flow management

Whatever the network, real-time services with high bit rates, such as video, must be guaranteed, while traditional non-real-time data services should be provided transparently alongside them. In the past, the first step in designing such a network was to specify bandwidth according to the global needs of the network. Unfortunately, this approach results in over-provisioning the network, creating additional costs that strip Ethernet/IP studio solutions of part of their cost advantage.

Moreover, bandwidth is not the only QoS concern for the IP studio. Latency and jitter limits are also important. These QoS variables can be neither calculated nor guaranteed from network topology alone. A mechanism is needed to limit and control the network resources used by the different flows over a constrained network infrastructure.

Flow management permits or forbids individual hosts to emit flows and streams with particular characteristics. This decision has to be made according to the state of the network and previous reservations. Because QoS needs change constantly, flow management has to be dynamic and, thus, automated. Whatever the implementation of the flow management system, requests, orders and information will then be exchanged over the network. In an AVB network, this function is handled by the 802.1Qat Stream Reservation Protocol.
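
What such a reservation decision amounts to can be sketched as per-link admission control. This is a toy model of the behavior, not the 802.1Qat wire protocol, and the 75 percent reservable share is an assumed configuration:

    # Simplified view of an SRP-style admission decision: a stream is
    # admitted only if every link on its path has room left within the
    # class' reservable share of link capacity.
    CLASS_A_SHARE = 0.75          # assumed reservable fraction per link

    def admit(path_links, stream_bps, reserved):
        """path_links: dict link -> capacity (b/s);
        reserved: dict link -> b/s already committed on that link."""
        if any(reserved.get(link, 0) + stream_bps > CLASS_A_SHARE * cap
               for link, cap in path_links.items()):
            return False          # the talker would receive a failed status
        for link in path_links:
            reserved[link] = reserved.get(link, 0) + stream_bps
        return True

    links = {"bridge1->bridge2": 10e9, "bridge2->mixer": 10e9}
    reserved = {}
    print(admit(links, 1.485e9, reserved))   # True: reservation succeeds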

When the flow management system decides to allow a reserved stream, it must ensure that this stream is flawlessly transmitted to the addressed host. For an AVB network, this function is handled by the 802.1Qav Forwarding and Queuing Enhancements for Time-Sensitive Streams protocol. This protocol implements and guarantees the QoS specifications set previously by the Stream Reservation Protocol.
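
This forwarding guarantee is commonly realized with a credit-based shaper, which the following toy model illustrates: a queued frame may only be sent when its credit is non-negative, credit drains while transmitting and refills at the reserved rate (idleSlope), so back-to-back bursts are spread out to the reserved rate:

    # Toy model of a credit-based shaper in the spirit of 802.1Qav.
    def shape(frames_bits, link_bps, idle_slope_bps):
        t, credit, send_times = 0.0, 0.0, []
        for bits in frames_bits:
            if credit < 0:                       # wait for credit to recover
                t += -credit / idle_slope_bps
                credit = 0.0
            send_times.append(t)
            tx = bits / link_bps                 # transmission time
            credit += idle_slope_bps * tx - bits # net drain while sending
            t += tx
        return send_times

    # Three 12,000-bit frames on a 1 Gb/s link with 100 Mb/s reserved are
    # spaced 120 us apart (the reserved rate), bounding burst jitter:
    print(shape([12000] * 3, 1e9, 100e6))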

An Ethernet/IP studio carries several types of signals, so system parameters must be monitored and guaranteed for high-QoS services:

  • Multimedia streams have severe real-time constraints. The stream must never be degraded and must have the lowest latency and jitter possible. For an AVB network, this is called Class A service.
  • Management flows include all exchanges required to direct the IP studio. This includes intercom and tally. These controls must be exchanged quickly to ensure proper reaction times for studio crews. This communication requires guaranteed network latency and availability. Even so, this data still must not interfere with any multimedia streams. For an AVB network, this is called Class B service.
  • All remaining information consists of traditional IT data, i.e., that without a real-time constraint.

Figure 4 illustrates the types and relative proportion of the different kinds of flow used in an IP studio. Figure 5 shows the interaction and management of these flows as they pass through the network.

As Figure 5 suggests, the total amount of flow must not exceed the network's available bandwidth. This means that management is required so that bandwidth and other QoS requirements are respected over the entire path followed by each stream, and unmanaged flows (traditional IP data) may be dropped in the presence of network congestion. The AVB protocols described above will manage all this.

However, there are still more global constraints. Any one link in the network has limits, as do the bridges. The AVB protocols only provide a reservation service and a guarantee that any successful reservation will be respected. Therefore, a more global flow management system must also implement a global policy.

The flow management policy ensures that network flows are conveyed within the limits of the network capacity or resources, and that the priorities of studio management are respected. This means that if a particular network resource is unavailable, the flow management policy is used to decide which (if any) flows or streams need to be shut down so a more important (according to policy) stream can get through. A simple example of this is the E911 service available in many communities. Local emergency services must be able to override any existing service at critical times.
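
A sketch of such a policy decision might look like the following, where lower-priority reservations are torn down, lowest first, until enough bandwidth is freed for the incoming stream; the priority scale and numbers are purely illustrative:

    # Illustrative global flow-management policy: preempt lower-priority
    # reservations to make room for a more important stream.
    def make_room(active, needed_bps, new_priority):
        """active: list of (priority, bps) reservations.
        Returns the reservations to tear down, or None if impossible."""
        victims, freed = [], 0.0
        for prio, bps in sorted(active):         # lowest priority first
            if freed >= needed_bps:
                break
            if prio < new_priority:
                victims.append((prio, bps))
                freed += bps
        return victims if freed >= needed_bps else None

    active = [(1, 1.0e9), (2, 1.485e9), (9, 1.485e9)]
    print(make_room(active, 2.0e9, 8))   # tears down the prio-1 and prio-2 flows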

The future

As the video production studio has evolved from analog to digital and now to HD, the number of formats and types of processing required continues to increase. In the past, this has meant a continuing increase in the number and cost of interconnects and switching equipment, and a growing complexity as all the formats needed to be converted and merged into a common format.

Initially, the flexibility and low cost of IT-based networking technology was attractive. Unfortunately, its real-time performance, link capacity and QoS management were inadequate for most video tasks.

This is now changing with the introduction of high-speed Ethernet links in excess of 10Gb/s. Combined with QoS and timing guarantees for AVB networks, engineers now have sufficient tools to operate high-quality video studios in an IT environment.

Gaël Macé is a corporate researcher for Thomson, and Michael Johas Teener is a plumbing architect for Broadcom.

Table 1. RTP packet format (fixed header; each row is one 32-bit word)
  V (2 bits) | P (1) | X (1) | CC (4) | M (1) | PT (7) | Sequence number (16)
  Timestamp (32)
  Synchronization source (SSRC) identifier (32)
  Contributing source (CSRC_1) identifier (32)
  …
  Contributing source (CSRC_n) identifier (32)
  Payload