Real-time media networks

Transcoding compressed audio and video for multichannel delivery, sometimes referred to as format conversion, can be divided into two broad classes: real-time, on-the-fly conversion (such as a simulcast to the Web) and delayed redistribution, where content is converted, stored and then distributed. Before addressing real-time transcoding for multichannel distribution, it is necessary to understand networks and the factors that influence their performance.

Based on the premise that a transcoder/format converter processes data delivered over an IT network on a video frame basis, the performance of the media network can be viewed with respect to events that occur in the duration of a video frame. An analysis of the numerical performance characteristics illustrates what factors should be considered when designing networks to support parallel production.

If the media distribution network can’t deliver content at the required data rate, then the transcoded content will not be artifact free. As media is concentrated in a centralized storage architecture, multiple transcoding and playout devices will concurrently access content over a network that is also simulcasting content over various channels. Network routing devices will be pressed to meet the demands of video systems, and packet jitter and latency may exceed the duration of a video frame. A look at system performance metrics can aid in system design.

The meaning of frame accuracy

Frame accuracy is a term that gets used a lot when talking about video over IT networks and format conversion. With many broadcast technical processes occurring on a frame basis, this is one of the fundamental issues of importance in the marriage of broadcast engineering and IT.

Frame rate events occur in milliseconds. The durations are:

60Hz = 0.01666… seconds or 16.7ms
30Hz = 0.03333… seconds or 33.3ms
24Hz = 0.04166… seconds or 41.7ms
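
The durations above are simply the reciprocal of the frame rate. A minimal sketch of that arithmetic follows; the function name is chosen here for illustration and does not come from any broadcast toolkit.

```python
def frame_duration_ms(frame_rate_hz: float) -> float:
    """Return the duration of one video frame in milliseconds."""
    return 1000.0 / frame_rate_hz


# Print the three frame rates discussed in the text.
for rate_hz in (60, 30, 24):
    print(f"{rate_hz}Hz = {frame_duration_ms(rate_hz):.1f}ms")
```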

Consider how many bits occupy a 60Hz, 30Hz or 24Hz duration of an uncompressed SDI signal. As Table 1 shows, SD at 270Mb/s pumps a 4.5Mb (562KB) frame over an IT network every 1/60 of a second. For HD at 1.485Gb/s, 24.75Mb (3.1MB) must be transferred every 1/60 of a second.

| Data rate | 60fps | 30fps | 24fps |
|---|---|---|---|
| 270Mb/s | 4.5Mb (562.5KB) | 9.0Mb (1.125MB) | 11.25Mb (1.406MB) |
| 1.485Gb/s | 24.75Mb (3.094MB) | 49.5Mb (6.188MB) | 61.88Mb (7.734MB) |

Table 1. Uncompressed video data rate vs. frame rate
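
The entries in Table 1 are the signal data rate divided by the frame rate, converted to bytes by dividing by eight. The sketch below reproduces that calculation; the function names are illustrative, and decimal units (1MB = 1,000,000B) are assumed because that is what the table's figures imply.

```python
def bits_per_frame(data_rate_bps: float, frame_rate_hz: float) -> float:
    """Bits that must be delivered during one frame period."""
    return data_rate_bps / frame_rate_hz


def bytes_per_frame(data_rate_bps: float, frame_rate_hz: float) -> float:
    """Same quantity expressed in bytes (8 bits per byte)."""
    return bits_per_frame(data_rate_bps, frame_rate_hz) / 8


# SD-SDI and HD-SDI rates from the text, in bits per second.
for label, rate_bps in (("270Mb/s", 270_000_000), ("1.485Gb/s", 1_485_000_000)):
    for fps in (60, 30, 24):
        megabits = bits_per_frame(rate_bps, fps) / 1e6
        megabytes = bytes_per_frame(rate_bps, fps) / 1e6
        print(f"{label} at {fps}fps: {megabits:.2f}Mb ({megabytes:.3f}MB)")
```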

For uncompressed SD and HD, these are the constant data rates that must be fed to a device for real-time processes, such as playout or on-the-fly format conversion (transcoding or transrating). Now, consider how much data 100Mb/s, 1Gb/s and 4Gb/s links can transfer in one video frame, as listed in Table 2. The 4.5Mb per frame requirement of 270Mb/s SDI can be met by 1Gb/s networks, while 4Gb/s is required to carry HD. This may be obvious, but real-world systems don’t always meet ideal performance specs.

| Link rate | 60fps | 30fps | 24fps |
|---|---|---|---|
| 100Mb/s | 1.667Mb (208.3KB) | 3.333Mb (416.7KB) | 4.167Mb (520.8KB) |
| 1Gb/s | 16.67Mb (2.083MB) | 33.33Mb (4.167MB) | 41.67Mb (5.208MB) |
| 4Gb/s | 66.67Mb (8.333MB) | 133.33Mb (16.67MB) | 166.67Mb (20.83MB) |

Table 2. Network and Fibre Channel data rates vs. frame rate
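
Comparing Table 1 with Table 2 amounts to checking whether the signal rate fits within the link rate. The following sketch makes that comparison explicit under ideal conditions. The `max_utilization` parameter is a hypothetical knob added here to model headroom; the article does not specify a value, so it defaults to 1.0 (the full nominal link rate).

```python
def utilization(signal_bps: int, link_bps: int) -> float:
    """Fraction of the nominal link rate consumed by one signal."""
    return signal_bps / link_bps


def fits(signal_bps: int, link_bps: int, max_utilization: float = 1.0) -> bool:
    """True if the signal fits within the allowed share of the link.

    max_utilization is an assumed headroom factor, not a figure from the text.
    """
    return utilization(signal_bps, link_bps) <= max_utilization


SIGNALS = {"SD-SDI 270Mb/s": 270_000_000, "HD-SDI 1.485Gb/s": 1_485_000_000}
LINKS = {"100Mb/s": 100_000_000, "1Gb/s": 1_000_000_000, "4Gb/s": 4_000_000_000}

for signal_name, signal_bps in SIGNALS.items():
    for link_name, link_bps in LINKS.items():
        verdict = "fits" if fits(signal_bps, link_bps) else "does not fit"
        print(f"{signal_name} on {link_name}: {verdict} "
              f"({utilization(signal_bps, link_bps):.0%} of link)")
```

Lowering `max_utilization` below 1.0 is one way to express the point that real-world systems rarely reach their ideal specifications.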

Next, consider Ethernet packets. Two packet sizes are common: the standard 1500B packet and the 9000B jumbo packet. Table 3 shows how many packets of each size a 100Mb/s and a 1Gb/s network can transfer per second.

| Packet size | 100Mb/s | 1Gb/s |
|---|---|---|
| 1500B/packet | 8333 packets/s | 83333 packets/s |
| 9000B/packet | 1388 packets/s | 13888 packets/s |

Table 3. Ethernet and jumbo Ethernet frames per second on 100Mb/s and 1Gb/s networks
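
The packet rates in Table 3 follow from dividing the link rate by the packet size in bits. The sketch below assumes, as the table does, that the whole packet counts as payload and ignores headers, preamble and inter-packet gaps; results are truncated to whole packets.

```python
def packets_per_second(link_bps: int, packet_bytes: int) -> int:
    """Whole packets of the given size that fit on the link each second.

    Ignores header overhead and inter-packet gaps (an idealization).
    """
    return link_bps // (packet_bytes * 8)


for packet_bytes in (1500, 9000):
    for link_name, link_bps in (("100Mb/s", 100_000_000), ("1Gb/s", 1_000_000_000)):
        count = packets_per_second(link_bps, packet_bytes)
        print(f"{packet_bytes}B on {link_name}: {count} packets/s")
```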

A 1500B Ethernet packet is 12,000b and occupies 120 microseconds on a 100Mb/s network. A 100Mb/s HD video file loads the network with a data burst of 1,666,667b (208,333B) every 16.7ms. That burst fills 138 Ethernet packets in one 60Hz video frame. A GigE network, by comparison, can carry 1388 such packets in the same frame period. Table 4 lists the number of packets a 1Gb/s network carries per frame at each frame rate for both packet sizes.

Each of these packets has an overhead (header and footer) associated with it, which reduces bandwidth. Jumbo frames of 9000B are used to cut down on these non-payload bytes. A 9000B jumbo frame is 72000b. On a GigE network, a packet occupies 72 microseconds; 13888 jumbo frames are transmitted in one second, or 231 in a video frame.

| Packet size | 60fps | 30fps | 24fps |
|---|---|---|---|
| 1500B/packet | 1388 packets | 2777 packets | 3472 packets |
| 9000B/packet | 231 packets | 462 packets | 578 packets |

Table 4. Ethernet and jumbo Ethernet packets on a 1Gb/s network vs. frame rate
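
Two quantities drive the figures above: the time one packet occupies the wire and the number of packets that fit in one frame period. A minimal sketch of both follows, using the same idealization as the tables (no header overhead, counts truncated to whole packets); the function names are illustrative.

```python
def packet_time_us(link_bps: int, packet_bytes: int) -> float:
    """Time in microseconds that one packet occupies the link."""
    return packet_bytes * 8 * 1_000_000 / link_bps


def link_packets_per_frame(link_bps: int, packet_bytes: int, fps: int) -> int:
    """Whole packets the link can carry during one video frame period."""
    return link_bps // (packet_bytes * 8 * fps)


print(packet_time_us(100_000_000, 1500))    # 120.0 (standard packet, 100Mb/s)
print(packet_time_us(1_000_000_000, 9000))  # 72.0 (jumbo packet, GigE)

for packet_bytes in (1500, 9000):
    counts = [link_packets_per_frame(1_000_000_000, packet_bytes, fps)
              for fps in (60, 30, 24)]
    print(f"{packet_bytes}B on 1Gb/s at 60/30/24fps: {counts}")
```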

Network issues

In theory, a GigE network can carry 100Mb/s compressed video, but in real-world installations, three major factors can degrade network performance to unacceptable data rates: packet jitter, latency and loss.

Packet jitter is the variation in the time between packets arriving at a device. This can be caused by network congestion, timing drift or route changes.
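
One simple way to put a number on this definition is to look at the gaps between successive packet arrivals and take the spread between the largest and smallest gap. The sketch below does that. It is only one of several possible jitter measures (RTP receivers, for example, typically report a smoothed estimate instead), and the arrival timestamps are made-up illustration values, not measurements.

```python
def interarrival_gaps(arrivals_us: list[int]) -> list[int]:
    """Time between each pair of consecutive packet arrivals."""
    return [later - earlier for earlier, later in zip(arrivals_us, arrivals_us[1:])]


def peak_to_peak_jitter_us(arrivals_us: list[int]) -> int:
    """Spread between the longest and shortest inter-arrival gap."""
    gaps = interarrival_gaps(arrivals_us)
    return max(gaps) - min(gaps)


# Hypothetical arrival times in microseconds. Packets were sent every 120us,
# but the fourth one arrived 30us late.
arrivals = [0, 120, 240, 390, 480, 600]
print(interarrival_gaps(arrivals))       # [120, 120, 150, 90, 120]
print(peak_to_peak_jitter_us(arrivals))  # 60
```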

Latency is the amount of time it takes a packet of data to move across a network connection. For TCP (Transmission Control Protocol) there is latent time when the computer that sent the packet waits for confirmation that the packet has been received. For UDP (User Datagram Protocol), no handshake is required and latency is reduced.

Best-effort UDP is frequently used for real-time media transfers. When a routing network becomes saturated with more traffic than a device can handle in real time, packets are dropped. This has dire consequences on media quality.

Even when all packets are received, however, latency and jitter can cause problems for real-time media transfers. Jitter and latency can vary for each packet. Referring to the numbers in Table 4, hundreds, or even thousands, of data packets will arrive asynchronously at transcoding devices.

If latency ranges from 100ms to seconds and packet jitter exceeds 40ms, audio and video transfers will take longer than the duration of one video frame. Table 5 shows how many packets for a given video data rate will be in transit on a network for a given frame rate. Packets for many video programs travel on the network simultaneously, which multiplies congestion at any given instant. As a result, jitter and latency times increase.

| Video data rate | 60fps | 30fps | 24fps |
|---|---|---|---|
| 40Mb/s | 55/9 | 111/18 | 138/ |
| 100Mb/s | 138/23 | 277/46 | 347/57 |
| 270Mb/s | 375/62 | 750/125 | 937/156 |

Table 5. Video data rate and Ethernet/jumbo Ethernet packets vs. frame rate
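
To relate latency to these per-frame packet counts, it can help to express a latency figure in frame periods and to estimate how many packets of one stream are on the network during that time. The sketch below assumes a constant-rate stream and a fixed latency, which is an idealization since real latency varies from packet to packet. The 100ms figure is the lower bound quoted in the text; the function names are illustrative.

```python
def frames_of_delay(latency_ms: int, fps: int) -> float:
    """Number of frame periods that elapse during the given latency."""
    return latency_ms * fps / 1000


def packets_in_flight(video_bps: int, latency_ms: int, packet_bytes: int) -> int:
    """Whole packets of one constant-rate stream sent during the latency window."""
    return video_bps * latency_ms // (1000 * packet_bytes * 8)


# 100ms of latency at 60fps spans six frame periods.
print(frames_of_delay(100, 60))                      # 6.0

# A 100Mb/s stream with 100ms of latency, standard and jumbo packets.
print(packets_in_flight(100_000_000, 100, 1500))     # 833
print(packets_in_flight(100_000_000, 100, 9000))     # 138
```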

The indeterminate nature of an IT network is fundamentally at odds with its use in media applications. It could be argued that Ethernet and IP are the worst possible network technologies for real-time media systems.

A modern broadcast technologist must think in terms of both the broadcast engineering and IT domains when designing a networked infrastructure that must support real-time transcoding. The impact of lost data is a function of video data rate, compression ratio, network bandwidth and packet size, taken together with packet jitter, latency and loss.

Tomorrow’s networks

Connection-oriented routing devices and tagging for Quality of Service (QoS) are traffic engineering techniques intended to meet the particular needs of transferring video over IT networks. ATM, with its deterministic TDM structure, in many ways is ideally suited for media network applications. RTP (Real-time Transport Protocol) and RTSP (Real Time Streaming Protocol) also attain performance that can support adequate real-time data delivery.

These protocols, however, were never intended to carry digital video, particularly HD resolutions. Now, there is a need to develop new real-time network protocols specifically engineered to support HD video.

Bursty network performance has an impact on real-time format conversion equipment. The next Transition to Digital will delve into how this problem is solved and address transcoding and format conversion for multichannel distribution.