Correcting IP Network Errors With FEC


Fig. 1: Red characters represent corrupted packets.

Error detection and correction technologies are crucial. They are present on every CD, DVD and hard drive sold today. They are used for satellite links, WiFi links, and many of the networks routinely used for video transmission. Adding some extra data to a signal can make it possible to detect errors, to correct most bit errors and to replace missing packets.

One example of error checking in data transmission is the parity bit used on many serial data interfaces. This works by designating one bit in each eight-bit byte as the parity bit, with the remaining seven bits used to carry data, such as an ASCII character. In the case of even parity (commonly used in serial data interfaces) the eighth bit is set to one if there is an odd number of ones in the other seven bits of data, and to zero if there is an even number, so that every valid byte contains an even number of ones. With this added information, the receiver can determine if there has been a single bit error in any received byte.

Unfortunately, a single parity bit can’t detect a byte with two errors (or any other even number of bit errors), and can’t be used to correct errors.
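The even-parity rule and its blind spot can be sketched in a few lines of Python. The function names here are illustrative only and do not come from any serial-port library; the parity bit is assumed to sit in the top bit of the byte.

```python
def add_even_parity(data7: int) -> int:
    """Return an 8-bit value: seven data bits plus an even-parity bit in bit 7."""
    if not 0 <= data7 < 128:
        raise ValueError("data must fit in seven bits")
    parity = bin(data7).count("1") % 2  # 1 if the count of ones is odd
    return (parity << 7) | data7


def parity_ok(byte8: int) -> bool:
    """True if the received byte contains an even number of ones."""
    return bin(byte8).count("1") % 2 == 0


sent = add_even_parity(ord("C"))   # 'C' is 1000011, which has three ones
one_error = sent ^ 0b00000100      # one bit flipped in transit
two_errors = sent ^ 0b00000110     # two bits flipped in transit

print(parity_ok(sent))        # True
print(parity_ok(one_error))   # False: the single error is detected
print(parity_ok(two_errors))  # True: the double error slips through
```

The last line shows why parity alone is not enough: two flipped bits leave the count of ones even, so the byte looks valid.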

More sophisticated error detection schemes address multiple bit errors. One example is a checksum, which performs binary addition of a series of bytes and sends the resulting sum along with the data. At the receiver, the same addition is done on all the message bytes, and if the sum calculated at the receiver does not match the sum sent by the transmitter, then at least one error has occurred in that group.

This is the method used in every IPv4 packet to detect errors that may have corrupted the packet header data.
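As a rough illustration, the IPv4 header checksum is a 16-bit one's-complement sum of the header words, in the style described in RFC 1071. The sketch below assumes a 20-byte header with no options; the sample header bytes are an arbitrary example, not taken from this article.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold any carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Sample 20-byte IPv4 header with the checksum field (bytes 10-11) set to zero.
header = bytes.fromhex("4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7")
checksum = internet_checksum(header)
print(hex(checksum))  # 0xb861

# The receiver sums the header including the checksum; zero means no error found.
filled = header[:10] + checksum.to_bytes(2, "big") + header[12:]
print(internet_checksum(filled))  # 0

corrupted = filled[:8] + b"\x41" + filled[9:]  # one bit flipped in byte 8
print(internet_checksum(corrupted) == 0)  # False
```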

More sophisticated techniques, such as the cyclic redundancy check (CRC), avoid some of the problems of a simple checksum, which include the inability to detect an all-zero byte that was mistakenly added or removed. An example of a CRC code in a video application is the error detection and handling information that can be added to standard SDI video signals.
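The zero-byte weakness is easy to demonstrate. The sketch below compares a simple 8-bit additive checksum (an illustrative helper, not a standard) with the CRC-32 from Python's `zlib` module. CRC-32 is used only because it is readily available; it is not the CRC used on SDI signals.

```python
import zlib


def sum_checksum(data: bytes) -> int:
    """Simple 8-bit additive checksum (illustrative only)."""
    return sum(data) & 0xFF


original = b"abc"
padded = b"abc\x00"  # an all-zero byte mistakenly added in transit

print(sum_checksum(original) == sum_checksum(padded))  # True: the error is missed
print(zlib.crc32(original) == zlib.crc32(padded))      # False: the error is caught
```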

ERROR CORRECTION

Forward error correction (FEC) covers a range of technologies that implement a single, powerful idea: adding extra information to a data stream to allow transmission errors to be detected and corrected. This extra data is specifically designed to allow the receiver to determine if errors have occurred and to correct them. At the output of the receiver, the extra data is removed, so that the stream returns to its original form.

For true error detection and correction, more sophisticated techniques involving fairly intense mathematics are required. One of the most popular techniques is called Reed-Solomon, after the two gentlemen who invented it in 1960. This technique works on groups of data bytes, and can be tuned for a variety of network error types and intensities.

A familiar example of Reed-Solomon coding is the FEC often used with MPEG-2 transport streams. Standard transport stream packets are 188 bytes long, and can be optionally augmented with 16 or 20 bytes of Reed-Solomon code. When 16 bytes are used (as per the DVB standard), up to eight byte errors in the packet can be corrected; when 20 bytes are used (as per the ATSC standard), up to 10 byte errors can be corrected.
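The figures above follow from a general property of Reed-Solomon codes: a block carrying 2t parity bytes can correct up to t byte errors at unknown positions. A small sketch of that arithmetic, with a hypothetical helper name and the overhead expressed relative to the 188-byte packet:

```python
def rs_summary(data_bytes: int, parity_bytes: int) -> dict:
    """Correction capacity and overhead of a Reed-Solomon block.

    Assumes the standard bound: 2t parity bytes correct up to t byte errors.
    """
    return {
        "correctable_bytes": parity_bytes // 2,
        "overhead_percent": round(100 * parity_bytes / data_bytes, 1),
    }


print(rs_summary(188, 16))  # {'correctable_bytes': 8, 'overhead_percent': 8.5}
print(rs_summary(188, 20))  # {'correctable_bytes': 10, 'overhead_percent': 10.6}
```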

Reed-Solomon codes are very effective as long as burst errors are relatively short (no more than eight or 10 bytes in the case of MPEG). If, however, longer burst errors must be handled, such as those lasting for hundreds of bytes, row/column FEC is typically used. Row/column FEC is frequently used on networks where entire IP packets (often containing seven MPEG transport stream packets) are liable to be lost.

Row/column FEC works by arranging groups of packets into rows and columns, and then adding an FEC packet to each row and each column. For example, a 10-row, five-column FEC scheme would add 10 row FEC packets and five column FEC packets to every 50 data packets. With this extra data, it becomes possible to correct burst errors of up to five packets in length and even replace missing packets. However, there are limits to what can be corrected.

Fig. 1 shows two examples of different error patterns that can be corrected. Packet P9 has been corrupted by a short bit error. Since none of the other packets in row two has been affected, the error can be corrected with the row FEC. Packets P37, P38 and P39 have also been corrupted. These errors can’t be corrected with their row FEC because there are too many in the same row, so instead, the correction must be based on the FEC data for columns two, three and four.

Columns two and three can easily be corrected since there is only a single packet error in each column. For column four, packet P9 must be corrected first using the row two FEC, and then packet P39 can be corrected using the FEC for column four.
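The recovery sequence just described can be sketched in Python. This is a minimal illustration, not a standards-compliant implementation. It assumes each FEC packet is the byte-wise XOR of the packets in its row or column (a common choice that the text does not spell out), that packets are numbered P1 to P50 row by row, and that a corrupted packet is discarded and treated as missing. All function names are hypothetical.

```python
from functools import reduce

ROWS, COLS = 10, 5


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def xor_all(packets) -> bytes:
    return reduce(xor_bytes, packets)


def make_fec(block):
    """Return (row_fec, col_fec) for ROWS*COLS equal-length packets."""
    row_fec = [xor_all(block[r * COLS:(r + 1) * COLS]) for r in range(ROWS)]
    col_fec = [xor_all(block[c::COLS]) for c in range(COLS)]
    return row_fec, col_fec


def recover(received, row_fec, col_fec):
    """Fill None entries in place; return True if every packet was rebuilt."""
    groups = [(range(r * COLS, (r + 1) * COLS), row_fec[r]) for r in range(ROWS)]
    groups += [(range(c, ROWS * COLS, COLS), col_fec[c]) for c in range(COLS)]
    progress = True
    while progress and None in received:
        progress = False
        for indexes, fec in groups:
            missing = [i for i in indexes if received[i] is None]
            if len(missing) == 1:  # exactly one gap: XOR of the rest rebuilds it
                present = [received[i] for i in indexes if received[i] is not None]
                received[missing[0]] = xor_all(present + [fec])
                progress = True
    return None not in received


block = [bytes([n]) * 4 for n in range(1, 51)]  # stand-ins for P1..P50
row_fec, col_fec = make_fec(block)

received = list(block)
for n in (9, 37, 38, 39):  # the packets lost in the Fig. 1 example
    received[n - 1] = None

print(recover(received, row_fec, col_fec))  # True
print(received == block)                    # True
```

In this run P9 is rebuilt from its row, P37 and P38 from their columns, and P39 from column four once P9 is back. A pattern in which every affected row and column has at least two losses, such as P1, P2, P6 and P7, leaves the loop with nothing to start from, and `recover` returns `False`.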

Of course, the added row/column FEC data is fairly costly in terms of bandwidth, adding 15 extra packets for every 50 transmitted, for 30 percent overhead. End-to-end delay of the system is also increased, because the receiver needs to buffer the block of 50 incoming packets in order to check the FECs and correct any errors in the data block.
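Both costs can be estimated with simple arithmetic. In the sketch below the helper names are hypothetical, each IP packet is assumed to carry seven 188-byte transport stream packets, and the 5 Mbps stream rate is an assumed figure chosen only for illustration.

```python
def fec_overhead(rows: int, cols: int) -> float:
    """Fraction of extra packets: one FEC packet per row and per column."""
    return (rows + cols) / (rows * cols)


def block_buffer_delay_ms(rows: int, cols: int, payload_bytes: int,
                          stream_bps: float) -> float:
    """Time needed to receive one full block of data packets at a constant rate."""
    block_bits = rows * cols * payload_bytes * 8
    return 1000 * block_bits / stream_bps


print(fec_overhead(10, 5))  # 0.3, i.e. 30 percent
# Roughly 105 ms of added buffering at the assumed 5 Mbps:
print(block_buffer_delay_ms(10, 5, 7 * 188, 5_000_000))
```

Smaller blocks reduce the delay, but they raise the overhead and shorten the longest burst that can be repaired.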

Like any technology, FEC has advantages and drawbacks. There is a tradeoff between the error rate and the amount of extra bandwidth needed for correction, as well as the consequences of uncorrected data errors.

One surprising place where FEC is sometimes not used is on DSL lines used to deliver IPTV services. In this case, the extra bandwidth needed to deliver FEC data for every stream is considered too wasteful. Instead, some operators have systems that automatically recover from packet errors by sending a signal from the receiver back to the transmitter requesting that the missing packets be retransmitted.
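The receiver side of such a retransmission scheme can be sketched as gap detection. The sketch assumes every packet carries a sequence number, as RTP packets do, and ignores wraparound; the function name is hypothetical and does not describe any particular operator's system.

```python
def missing_sequence_numbers(received, first: int, last: int) -> list:
    """Sequence numbers in [first, last] that never arrived."""
    seen = set(received)
    return [n for n in range(first, last + 1) if n not in seen]


arrived = [100, 101, 103, 104, 107]
to_request = missing_sequence_numbers(arrived, 100, 107)
print(to_request)  # [102, 105, 106]
```

Only the packets in `to_request` cost extra bandwidth, which is the appeal of this approach when losses are rare.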