Encoding in a multiservice world

Encoding. What a headache for the entire industry. Not long ago, it used to be so simple to compress television. There was one format (either NTSC or PAL), one codec (MPEG-2) and one way to carry it (ASI). In the past few years, two things have been radically changing: the kind of video people watch and how they watch it.

Have you ever watched a Blu-ray movie on a 50in full-HD screen? It looks amazing. I can't get enough of it. As a matter of fact, I don't even watch regular HD broadcasts anymore, much less SD (except for maybe a good World Cup rugby game).

What if I missed the game and I need to go out of town? Should I purchase it on VOD when I return or download it to my PSP or iPod for the flight? No more battery on the iPod? No big deal. I can watch it online on my laptop computer from the comfort of a hotel room.

Consumers can now obtain all of this, albeit with some effort. Interestingly, I can't think of a single operator today who offers all these services as a package.

There are business reasons why they don't, of course, but the obstacles are mostly technical. Today, a compression system that sends full-resolution HD in H.264 High Profile over satellite has very little in common with a QVGA system for DVB-H or a CIF system for streaming to PCs. The first uses MPEG-4 AVC (H.264) High Profile at Level 4 (HP@L4) in an MPEG-2 transport stream. The second uses H.264 Baseline Profile at Level 2 (BP@L2) encapsulated in RTP/IP with time slicing, and the third uses WM9 or VC-1 in plain TCP/IP packets.

This multiplicity of video formats, codecs and transmission protocols is likely to get worse as the number of receiving device types grows.

Service provider conundrum

As a result of the proliferation of viewing platforms, operators must reinvent themselves in order to effectively compete and reach new subscribers.

For example, a satellite operator's HD headend, VBR by nature, is not designed today to produce the CBR or capped-VBR outputs that IPTV requires. Yet in order to grow its footprint and subscriber base, a satellite operator has no choice but to diversify and address wireline applications.

More legacy satellite operators are therefore transforming into multi-service aggregators such as SES AMERICOM. It's all about leveraging the immense coverage of satellite to offer prepackaged IPTV services to Tier 2 or 3 telephone companies that cannot afford to build dedicated headends.

The less popular viewing formats or distribution networks are those that take the most investment for the least return, but at the same time allow providers to reach new subscribers. Recent studies show that 50 percent of the channels are being watched only 5 percent of the time. This so-called long tail content is therefore too expensive to broadcast to every user and must be distributed via either SDV or unicast streaming from large VOD servers. (See Figure 1.)

As IP networks become faster and content storage space cheaper, a newer threat — even to large cable multiple systems operators (MSOs) — is the over-the-top model. What's to prevent content owners such as ABC, the BBC or BSkyB from streaming content directly to PCs? A new breed of IP set-top boxes allows users to view streaming video from a broadband connection on a TV set instead of a PC. The deeper you go into that long tail, the more unique and user-centric the content becomes. Only Internet portals such as Google's YouTube can accommodate this content.

Not all models are applicable to every class of operator. (See Figure 2.) But whichever the target markets, service providers have one common objective: to streamline their systems and reduce capital and operating expenditures while addressing these new types of services.

The move to an all-IP world

In part due to these new long tail and on-demand models, the number and formats of video sources an average operator has to receive and ingest is exploding. Until recently, in a typical headend, most satellite or over-the-air sources feeding the banks of IRDs have been in ASI format, carrying plain MPEG-2 SD or HD. Now, because content lives in many different ecosystems, a provider may also have to receive H.264 SD or HD feeds with multiple audio formats over fiber or a TCP/IP backbone. This imposes two additional constraints on the headend architecture. First, receivers and decoders must be codec agnostic. Second, they must have native IP interfaces.

At the output of the compression system, the same problem occurs. Because a given operator now must be able to distribute its lineup to different types of networks with different constraints in remote geographical locations, one must define a new demarcation point at the IP layer.

Legacy architectures based on ASI simply do not scale well enough to accommodate these changes. Many of the same headend pieces, such as receivers and decoders, are moving to IP input/output interfaces, but this also increases complexity and capital expenditures. As a result, many of these technologies are being absorbed into the encoders or stream processors to form an all-IP headend. (See Figure 3.)

How encoding technology is evolving

There are traditional ways to roll out a new type of service. You can either build a new, dedicated headend and feed it with preformatted content, or you can reformat and transcode the original content in real time before distribution. The former usually requires some kind of offline content repurposing system and a compression system tailored to that specific application. This is, for instance, what a typical wireless operator would do to offer low-resolution video to cell phones on its 3G network. This approach takes considerable investments, both to negotiate contracts for content as well as to build and operate such a specific compression system.

But let's focus on the latter for a moment. First, IP WANs are imperfect and asynchronous by nature, so each receiving device must be able to reconstruct IP packets lost in the network. This can be achieved by applying FEC to the IP packets before sending them, which allows any receiving device on the other end of a WAN to reconstruct up to 5 percent of lost or corrupted packets. This technology is becoming more widely available in IP receivers, decoders, encoders, multiplexers and scramblers.
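The principle behind this kind of packet FEC can be sketched in a few lines. The toy code below uses simple XOR parity over a column of equal-length packets, in the spirit of the row/column parity schemes used for transport streams over IP (such as Pro-MPEG COP3 / SMPTE 2022-1); the function names are illustrative, not a real API, and real schemes add RTP framing and two-dimensional parity.

```python
# Minimal sketch of XOR-parity packet FEC, assuming equal-length packets.
# One parity packet per column lets the receiver rebuild one lost packet.

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(packets):
    """Build one parity packet over a column of equal-length packets."""
    parity = packets[0]
    for p in packets[1:]:
        parity = xor_bytes(parity, p)
    return parity

def recover(received, parity):
    """Reconstruct a single lost packet (marked None) from the parity."""
    lost = [i for i, p in enumerate(received) if p is None]
    assert len(lost) <= 1, "XOR parity recovers at most one loss per column"
    if lost:
        acc = parity
        for p in received:
            if p is not None:
                acc = xor_bytes(acc, p)
        received[lost[0]] = acc
    return received
```

The trade-off is overhead versus resilience: one parity packet per N media packets adds roughly 1/N bandwidth but only protects against one loss in that group, which is why real deployments tune the matrix size to the expected loss rate.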

The next challenge is to transcode the incoming video into the desired codec and format. When talking about transcoding, one often refers to bit-stream manipulation techniques used in low-cost and dense transcoders. These techniques primarily consist of parsing and decomposing the original stream into its constituent elements — picture, slice, macroblocks, modes, motion vectors and quantized DCT coefficients. The structure is preserved, and the transcoding is achieved primarily by manipulating the quantized DCT coefficients, also known as requantizing. Requantization works well in MPEG-2 when the bit rate of the transcoded stream is a little lower than the bit rate of the original stream, typically 20 percent or less.
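The core requantization operation can be illustrated with a toy example: the quantized levels of a block are mapped from the original quantizer step to a coarser one, shrinking the coefficients (and hence the bit rate) without re-encoding. This is a deliberate simplification — real MPEG-2 quantization also involves weighting matrices, separate DC handling and rate control — so treat the numbers as illustrative only.

```python
# Toy requantization: remap quantized DCT levels from step q_in to a
# coarser step q_out, the core operation of a bit-stream transcoder.
# Simplified: ignores quantizer matrices, DC precision and rate control.

def requantize(levels, q_in, q_out):
    """Map quantized levels from step q_in to coarser step q_out."""
    out = []
    for level in levels:
        coeff = level * q_in                   # approximate reconstruction
        out.append(int(round(coeff / q_out)))  # coarser re-quantization
    return out

block = [12, -7, 3, 1, 0, -1, 0, 0]            # one block's quantized levels
coarser = requantize(block, q_in=8, q_out=10)  # smaller levels, lower rate
```

Note how small levels collapse toward zero as the step grows: that is where the bit savings come from, and also why pushing the rate down much more than about 20 percent this way visibly degrades the picture.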

However, when going from MPEG-2 to H.264 or vice versa, requantization does not work because H.264 does not use the same motion compensation and transform as MPEG-2. As a result, many of the coding decisions do not carry over, including intra/inter prediction, new GOP structures and field/frame decisions.

In those cases, the only viable solution is to fully decode the incoming video back to baseband (HD-SDI or SDI) and re-encode it using the full breadth of available tools for the desired codec. This can be done today with external IRDs, which significantly increases capital expenditure and potential points of failure in the system.

This is why decoding technology is now making its way into the encoder itself. A real-time broadcast encoder should be able to take any format and codec over IP, and output any format or codec over IP. (See Figure 4.)

The other advantage of decoding back to baseband inside the encoder is that the baseband video can be scaled and preprocessed before re-encoding. This makes the transcode process even more efficient because unwanted noise is removed and only the final format/resolution desired is actually passed to the recoder part. Noise reduction and up/downconversion technologies will therefore become critical components of future IP encoders.

Finally, and perhaps most importantly, because of the sheer number of content formats, the industry is now looking to streamline the production and primary distribution processes. In an ideal world, one would want to create and store only one source or asset for each piece of content. Because content is precious, this asset should be of the highest possible resolution and quality, in 1080p for example. Instead of editing and formatting that asset multiple times for distribution to different types of networks and receiving devices via different compression systems, what if you could push that 1080p content all the way to the distribution point and let the headend do the formatting for you? Wouldn't it be great if your compression system could turn it around in any format, be it 1080i or PAL to legacy set-top boxes, or QVGA to DVB-H mobile phones?

Simulcasting is one way to do it, but again it does not scale economically for large systems. A promising technology undergoing standardization is scalable video coding (SVC). The idea is that a video signal can be coded as a base layer for a given format and incremental enhancement layers for additional formats. For example, a base layer can be in 720p, which can be decoded by HD set-top boxes already in the field, and an enhancement layer can be 1080p for future or newer generation set-top boxes.

The main advantage over simulcast is bandwidth savings, because the enhancement layer contains only the information difference between 720p and 1080p. Another possible use case is a QVGA or CIF base layer for mobile and a PAL enhancement layer for SDTV.
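The layering idea behind SVC can be shown with a one-dimensional toy: the base layer is a downsampled signal, and the enhancement layer carries only the residual needed to restore the full-rate original. Real SVC (the scalability extension of H.264) codes these layers with full prediction and entropy-coding tools; this sketch, with hypothetical function names, just shows why the enhancement layer is cheap relative to a second simulcast stream.

```python
# Toy SVC-style layering: base layer = downsampled signal,
# enhancement layer = residual against an upsampled prediction.
# A decoder with only the base layer gets a usable low-rate signal;
# adding the enhancement layer restores the full-resolution original.

def downsample(samples):
    return samples[::2]                      # base layer (half rate)

def upsample(base, n):
    """Nearest-neighbour prediction of the full-rate signal."""
    return [base[min(i // 2, len(base) - 1)] for i in range(n)]

def split_layers(samples):
    base = downsample(samples)
    predicted = upsample(base, len(samples))
    enhancement = [s - p for s, p in zip(samples, predicted)]
    return base, enhancement

def reconstruct(base, enhancement):
    predicted = upsample(base, len(enhancement))
    return [p + e for p, e in zip(predicted, enhancement)]

signal = [10, 11, 12, 13, 14, 15, 16, 17]
base, enh = split_layers(signal)             # enh holds only small residuals
```

Because the residuals in the enhancement layer are small (here mostly 0s and 1s), they compress far better than a second full copy of the signal — which is exactly the bandwidth argument against simulcast.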

Repurposing content for the Internet

Real-time broadcast encoders can address a large swath of the IPTV and mobile markets. A significant and growing portion of the video market, however, requires a different solution: VOD delivery to multiple platforms. In this area, the source is no longer a real-time stream of video arriving over ASI or IP. Instead, it is a vast collection of file-based video that must be transcoded to an increasing number of possible destination formats and then streamed directly to the user. The file-based data can come from traditional content producers as well as the general public.

The result is a bewildering array of input formats, from high-quality archival footage to the grainiest camera-phone video. All of these inputs must then be conformed to an equally wide range of potential output formats, from HD displays to mobile phones, spanning several orders of magnitude in data rate.

This type of repurposing demands a flexibility that is best addressed by software-based transcoding. A software transcoder can manage the wide range of inputs and outputs, and can be easily updated to respond to a change in market requirements. For example, over the past year, there have been significant changes in the format, quality and data rate of video being delivered to mobile devices. Each of these changes would have been difficult to navigate with a fixed-format hardware product.
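One reason software transcoding adapts so easily is that the output side reduces to a table of device profiles that can be edited without touching hardware. The sketch below shows that shape: every profile value and name here is hypothetical, invented for illustration, not drawn from any real product.

```python
# Hypothetical profile table for a software transcoder: each target
# device class selects a codec, resolution and bit-rate cap,
# independent of the source format. All values are illustrative.

PROFILES = {
    "hd_stb": {"codec": "h264_hp", "resolution": (1920, 1080), "kbps": 8000},
    "pc":     {"codec": "vc1",     "resolution": (640, 480),   "kbps": 1200},
    "mobile": {"codec": "h264_bp", "resolution": (320, 240),   "kbps": 300},
}

def plan_jobs(asset, targets):
    """One ingested asset fans out into one transcode job per target."""
    return [{"source": asset, "target": t, **PROFILES[t]} for t in targets]

jobs = plan_jobs("match_1080p.mxf", ["hd_stb", "mobile"])
```

When market requirements shift — say mobile devices start accepting higher resolutions — only the table changes, which is precisely the agility a fixed-format hardware product lacks.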

There are now opportunities to tie the strength and performance of hardware encoding systems with the flexibility and convenience of software transcoders in order to deliver solutions that can address the growing needs of today's content providers.

Conclusion

In the future, will video from the same full-HD source be intelligently scaled and efficiently distributed to different devices over different networks? I believe so. It will take a revolution as big as the advent of MPEG-2 compression 15 years ago. In the encoding world, advanced transcoding and SVC are steps in the right direction. But it will require much more than compression technology leaps to make it all happen. It will take new levels of QoS to make the network format aware and asset management systems tightly linked to edge devices with far greater intelligence. The opportunities lie ahead; it's up to us to shape the future of our industry.

Arnaud Perrier is senior manager for encoder product marketing at Harmonic.