Standards and Protocols in Streaming Media

Over the years, we've observed the delivery model for moving images evolve from the over-the-air broadcast to cable/direct-to-home transmission and more recently to the Internet. One of the latest permutations is streaming media. Originally defined as just "multimedia" (video, audio and data), streaming media technologies continue to be cast into new applications, extending services heretofore reserved for CRT raster-based images to the domains of the PC, home entertainment and mobile cellular technologies.

The server, protocol and network architectures for streaming content to multiple devices trace their evolution to videoconferencing. Whether for personal desktop use or widespread delivery, the serving device formulates a system for conveying messages and information to a wide range of end devices.


The standards for videoconferencing, such as H.320 and H.323, shaped this growth and made possible extended applications for the delivery of streaming media. From an evolutionary perspective, videoconferencing technologies have come from dedicated environments governed in part by protocols created some 15 years ago. The ITU (International Telecommunication Union) developed the foundations for these protocols, which have since migrated to the PC domain and are now coming back to the home entertainment environment. Note that the Telecommunication Standardization Sector of the ITU is referred to as ITU-T; this is the group that addresses such agendas as voice-over-IP and fax-over-IP (the latter specified in its Recommendation T.38).

H.320, adopted by the ITU in 1990, is the umbrella protocol that handles real-time voice and data communications and conferencing over switched or dedicated ISDN links. Additional protocols then describe such items as call setup between terminals and the handling of data connections. H.320, the once traditional videoconferencing protocol, has been all but replaced by H.323--partly because H.320 terminals are expensive and are generally implemented in dedicated videoconference rooms, which are rapidly being replaced by desktop technologies.

H.320 was never desktop-centric; it required expensive ISDN telecommunications lines and dedicated hardware on each end. As bandwidth costs continue to drop and availability soars, the Internet has motivated the creation of videoconferencing technologies that produce better results at less cost than traditional H.320 terminal-based systems.

The replacement technology, H.323, with Version 1 adopted in 1996 and superseded by Version 2 in 1998, was developed from the outset for delivery over IP and extends the H.32x family of standards for the encoding and decoding of audio and video streams. Together with the data-sharing protocol T.120, H.323 defines a new set of Internet-based communications.

Today, with Version 5 (adopted in 2003), H.323 videoconferencing has all but replaced H.320 video switching. For example, Microsoft's NetMeeting supported its audio and videoconferencing features based on H.323.

The protocols generally utilized in many of these multimedia delivery mechanisms are two of the better-known transport protocols, defined not by the ITU-T but by the IETF: UDP and TCP.

UDP (user datagram protocol), which runs on top of IP networks, provides unreliable, unordered data delivery between devices. UDP lacks retransmission and data-rate management services; its minimal overhead, however, yields a generally continuous, uninterrupted flow, ideal for media streaming over public and private networks, where a late packet is often worse than a lost one.

Another common application for UDP is the transport of time-sensitive information, such as VoIP, and it is likewise fast enough for real-time video and audio delivery. Specified in RFC 768, UDP is generally considered a Layer Four (Transport) protocol in the OSI stack; Layer Four is responsible for maintaining the end-to-end integrity and control of the network data session.
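The datagram model is easy to see with Python's socket API. The following is a minimal sketch over the loopback interface; the payload and addresses are illustrative only:

```python
import socket

# One UDP datagram between two sockets on the loopback interface.
# UDP offers no delivery or ordering guarantee; on loopback this
# single datagram will normally arrive intact.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))        # let the OS pick a free port
receiver.settimeout(5)
port = receiver.getsockname()[1]

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"media payload", ("127.0.0.1", port))   # fire and forget

data, addr = receiver.recvfrom(2048)   # one datagram, no stream framing
print(data)

sender.close()
receiver.close()
```

Nothing here retransmits a lost datagram; a streaming client simply plays whatever arrives next.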

The protocol that provides reliable delivery of data between devices connected to an IP network is TCP (transmission control protocol). TCP is a virtual circuit protocol that includes an end-to-end error-detection and retransmission mechanism, ensuring the integrity of a stream of datagrams across a network. Generally considered a Layer Four protocol in the OSI stack, TCP is specified in RFC 793, which has been updated by RFC 3168.
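TCP's ordered byte-stream behavior can be sketched in a few lines over the loopback interface; the port assignment and payload are illustrative:

```python
import socket

# A minimal TCP virtual circuit on loopback. A real media server would
# listen on a well-known port; here the OS assigns one.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
addr = server.getsockname()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(addr)            # three-way handshake builds the circuit
conn, _ = server.accept()
conn.settimeout(5)

client.sendall(b"part1-part2")  # bytes arrive in order, error-checked
client.close()                  # FIN lets the receive loop terminate

chunks = []
while True:
    chunk = conn.recv(1024)     # a stream has no message boundaries
    if not chunk:
        break
    chunks.append(chunk)
received = b"".join(chunks)
print(received)

conn.close()
server.close()
```

The receive loop is the point: TCP hands the application a continuous, reliable stream, at the cost of retransmission delays that a real-time player cannot always absorb.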

IP (internetwork protocol, or Internet Protocol) remains the most widely used packet-switched communication protocol for the transport of information between computer systems, especially via the Internet. IP defines the format of the data packets and carries each datagram--a packet of data--from node to node on a "best effort" basis, meaning it provides no assurance of delivery; a header checksum protects the addressing information, but not the payload. Note, again, that TCP detects errors or lost packets and provides a means of recovery. Referred to as a Layer Three protocol, IP is specified in RFC 791, which is updated by RFC 1349.
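The header protection just mentioned is the Internet checksum: a ones'-complement sum over the 16-bit words of the 20-byte IPv4 header. A sketch, applied to a commonly published example header:

```python
def internet_checksum(header: bytes) -> int:
    """Ones'-complement sum of 16-bit words (RFC 791 / RFC 1071),
    as used for the IPv4 header checksum."""
    if len(header) % 2:
        header += b"\x00"
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold carry back in
    return ~total & 0xFFFF

# A 20-byte example header with its checksum field zeroed out:
header = bytes.fromhex("4500007300004000" "40110000" "c0a80001" "c0a800c7")
checksum = internet_checksum(header)
print(hex(checksum))   # the value the sender writes into the header
```

Recomputing the sum over a header that already contains its checksum yields zero, which is how each router validates the header before forwarding.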


For streaming media, two delivery choices exist--unicast and multicast. (See Fig. 1.) In unicast, the server serves a separate stream to each and every client requesting access. Conversely, in a multicast connection, a single served stream is sent to all clients.

Multicast produces the most efficient use of bandwidth when enabled for several clients, making it well-suited to services such as video-over-IP. Multicasting uses the same bandwidth for dozens of clients as it does for one; however, it may require that all routers in the path have software, firmware or, in some cases, hardware updates. Furthermore, for multicast delivery, the entire path must be multicast-enabled, frustrating many private network providers.
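The group membership that every router in the path must honor is visible at the socket level. A sketch of the join and the single send, with a hypothetical group address and port:

```python
import socket
import struct

GROUP = "239.1.1.1"   # hypothetical administratively scoped group address
PORT = 5004           # hypothetical port

# Receiver side: bind, then ask the network to join the multicast group.
# Every multicast-enabled router between sender and receiver tracks
# this membership.
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
recv_sock.bind(("", PORT))
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
recv_sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

# Sender side: a single sendto() serves every joined client at once.
send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
send_sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                     socket.inet_aton("127.0.0.1"))  # keep the demo on loopback
send_sock.sendto(b"one stream, many clients", (GROUP, PORT))

send_sock.close()
recv_sock.close()
```

The sender's side looks identical to unicast apart from the group address; it is the receivers' join requests, propagated router by router, that make the path "multicast-enabled."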

Fig. 1
MMS (Microsoft Media Server) protocol is Microsoft's proprietary network streaming protocol and is used extensively by Microsoft's media player software. MMS can be used on top of either the TCP or UDP transport protocol over any network medium; its primary use is the streaming of live or prerecorded audio and video to computers without requiring that a file be downloaded before playing.

RTSP (real-time streaming protocol) is a control or communication protocol used between client and server; RTP (real-time transport protocol) is the data protocol used by the server to send data to the client. Rather than first downloading a file to the client, RTP plays it in real time, which differentiates it from HTTP (hypertext transfer protocol) and FTP (file transfer protocol). Often the real-time protocols are indicated as one, shown as RTSP/RTP. Some services, such as RealNetworks' RealSystem Server, will instead use RDT, a proprietary data channel, for the delivery of content to RealOne players.
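RTSP itself is a text protocol with HTTP-like framing. A sketch of building one control request follows; the URL is hypothetical, and a real client would send this over TCP, typically to port 554:

```python
def rtsp_request(method, url, cseq, headers=None):
    """Build one RTSP/1.0 request with CRLF line endings (RFC 2326 framing)."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

# DESCRIBE asks the server for a session description (SDP); the media
# itself then flows separately over RTP.
req = rtsp_request("DESCRIBE", "rtsp://example.com/stream", 2,
                   {"Accept": "application/sdp"})
print(req.decode("ascii"))
```

The separation is the design point: RTSP carries commands such as DESCRIBE, SETUP, PLAY and TEARDOWN, while RTP carries the timestamped media packets on a different channel.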

Streaming media servers often use specialized protocols to stream their content over the public Internet. HTTP/TCP protocols are used by Web servers to get streams through firewalls, which are often set up to block UDP traffic.


Codecs define the format of audio and video information. Two codecs, G.711 for audio and H.261 for video, are required by H.323. Through H.323, products can also negotiate functionality for nonstandard audio and video codecs.

As determined by the ITU, H.323 terminals must be able to send and receive the A-law and μ-law coding algorithms (G.711). Additional audio and video codecs provide a variety of standard bit-rate, delay and quality options suitable for a range of network selections.
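The μ-law half of G.711 is a logarithmic companding of 16-bit linear samples into 8 bits. The following is a sketch using the commonly published segment/mantissa formulation, with the usual reference-implementation constants:

```python
BIAS = 0x84     # standard mu-law bias (132)
CLIP = 32635    # clip level for 16-bit input

def mulaw_encode(sample):
    """Compress one 16-bit linear PCM sample to an 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0
    sample = min(abs(sample), CLIP) + BIAS
    exponent, mask = 7, 0x4000           # find the highest set segment bit
    while exponent > 0 and not sample & mask:
        exponent -= 1
        mask >>= 1
    mantissa = (sample >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF   # stored inverted

def mulaw_decode(byte):
    """Expand an 8-bit mu-law byte back to a (quantized) linear sample."""
    byte = ~byte & 0xFF
    exponent = (byte >> 4) & 0x07
    sample = ((((byte & 0x0F) << 3) + BIAS) << exponent) - BIAS
    return -sample if byte & 0x80 else sample

print(mulaw_encode(0), mulaw_decode(mulaw_encode(1000)))
```

The round trip is lossy by design: small samples keep fine resolution while large ones are coarsely quantized, which is what lets G.711 carry telephone-quality audio in 8 bits per sample (64 kbps at an 8 kHz sampling rate). A-law, the companion algorithm, differs only in its companding curve.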

G.711 and H.261, together with the two default codecs preferred for NetMeeting connections (G.723 and H.263), span the bit rates used for audio and video transmission over the Internet. G.711, which transmits audio at 48, 56 and 64 kbps, is the high bit-rate codec of the group, appropriate for audio over higher-speed connections.

To send and receive voice communications over the network at low bit rates, G.723 allows audio to be transmitted at 5.3 and 6.3 kbps. For VHS-quality video imaging in the 64 kbps range, H.261 is considered appropriate.

H.263 specifies the format and algorithm used to send and receive video images over the network; it supports the common intermediate format (CIF), quarter-CIF (QCIF) and sub-quarter-CIF (SQCIF) picture sizes. H.263 is excellent for Internet transmission over low bit-rate connections.
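Some quick arithmetic on those picture formats shows why such codecs are needed at all. The frame rate and 4:2:0 sampling (12 bits per pixel) below are assumptions for illustration:

```python
# Raw, uncompressed bandwidth of the H.263 picture formats,
# assuming 4:2:0 sampling (12 bits/pixel) and 30 frames/s.
FORMATS = {"CIF": (352, 288), "QCIF": (176, 144), "SQCIF": (128, 96)}
FPS, BITS_PER_PIXEL = 30, 12

raw_mbps = {}
for name, (w, h) in FORMATS.items():
    raw_mbps[name] = w * h * BITS_PER_PIXEL * FPS / 1e6
    print(f"{name}: {w}x{h} -> {raw_mbps[name]:.1f} Mbit/s uncompressed")
```

Even the smallest format is several megabits per second uncompressed, so reaching the tens of kilobits per second quoted above requires compression ratios in the hundreds.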

Finally, at the higher end of the spectrum, Advanced Video Coding (AVC) has become one of the most closely watched developments in the media, broadcast and telecommunications industries. Also known by its ITU nomenclature, H.264, it is well positioned to become the industry-recognized standard for delivery of media to a range of audiences.

Karl Paulsen

Karl Paulsen is the CTO for Diversified, the global leader in media-related technologies, innovations and systems integration. Karl provides subject matter expertise and innovative visionary futures related to advanced networking and IP technologies, workflow design and assessment, media asset management, and storage technologies. Karl is a SMPTE Life Fellow, a SBE Life Member and Certified Professional Broadcast Engineer, and the author of hundreds of articles focused on industry advances in cloud, storage, workflow, and media technologies. For over 25 years he has continually featured topics in TV Tech magazine, penning the magazine's Storage and Media Technologies and Cloudspotter's Journal columns.