Anticipating Errors In IP Networks

Errors happen. That's a fact for any kind of network, whether optical, electrical or wireless. To deal with the errors, it's important to understand what causes them, and how IP networks react to them. This column is the first in a two-part series. Here, we'll focus on the errors and their causes. In the next installment, we'll focus on how systems can be designed to work around network errors.

There are two main categories of errors that occur on IP networks. The first concerns packets that go missing or are damaged in transit. The second concerns packets that arrive with different spacing in time than they had when they were sent. All types of data transmissions are affected by the first category; the second category only affects real-time signals like video and audio.

When packets containing video or audio data don't arrive at their destination, problems occur, due to the real-time nature of the signals. In many networks, there isn't time to send a request back to the source for a retransmission of the missing data.

Many video networks are specially designed to minimize errors, but they can never be completely eliminated. Even short-haul digital fiber-optic links have an intrinsic error rate, although the rates can be extremely low (less than one bit error per year in some cases).

Bit errors can be random, or they can occur in a repeatable fashion, just as TV signals can be affected by random interference or a specific source, such as a leaky CATV system.

Note that wireless packet networks (such as WiFi) have a much higher bit error rate (on average) than wired or optical connections, due primarily to interference.

In a packet network, bit errors can cause packets to be deleted entirely, or simply change the contents of a packet. Every IPv4 packet header contains a checksum along with critical data such as the source and destination IP addresses. (IPv6 headers omit this checksum and rely on checks at other layers.) The checksum allows each device that receives the packet to check the header data for integrity. If a checksum error is detected, the packet is discarded immediately. This is done to avoid having packets with bad addresses or other corrupted header data circulate any farther through the network.
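The header check can be sketched in a few lines of Python. This is a minimal illustration, assuming the 16-bit one's-complement checksum described in RFC 1071 as used for IPv4 headers; the sample header bytes and the function names are made up for the example and are not taken from any particular router.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input to a whole 16-bit word
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold any carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def header_is_valid(header: bytes) -> bool:
    # With the stored checksum included, an undamaged header checks to zero.
    return internet_checksum(header) == 0

# Hypothetical 20-byte IPv4 header with the checksum field (bytes 10-11) zeroed.
blank = bytes.fromhex("45000073000040004011" "0000" "c0a80001c0a800c7")
checksum = internet_checksum(blank)
header = blank[:10] + checksum.to_bytes(2, "big") + blank[12:]

# Flip one bit in the destination address to imitate a bit error in transit.
damaged = header[:19] + bytes([header[19] ^ 0x01])

print(hex(checksum), header_is_valid(header), header_is_valid(damaged))
```

A device that sees the damaged header would fail the check and drop the packet rather than forward it toward a possibly wrong address.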

If a bit error happens in a part of the packet other than the header, the packet may or may not be destroyed in the network. One popular protocol that rides on top of IP, UDP (User Datagram Protocol), provides an optional checksum for an entire data packet; this feature is often not used in video and audio applications because these streams often include information that allows simple bit errors to be corrected.

Several different kinds of network interruptions can also cause packet losses. Hard network failures can cause complete signal failures. In protected networks (common in most long-haul systems and telco networks) a hard failure can cause an outage of up to 50 ms, which is enough time to eliminate a whole frame of video data or cause a noticeable click in an audio signal.

These errors will show up as a large number of consecutive lost packets. IP networks have additional potential faults, including packet losses that occur when routing errors are discovered and changes need to be propagated through the network, which can take a significant amount of time.
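To get a feel for how many consecutive packets a short outage can cost, here is a rough calculation. The stream parameters are assumptions chosen only for illustration: a 5 Mbps video stream, 1,316 bytes of payload per packet (seven 188-byte transport stream packets) and 30 frames per second.

```python
def packets_lost(outage_ms: int, bitrate_bps: int, payload_bytes: int) -> int:
    """Number of packets that would have been sent during the outage."""
    bits_lost = outage_ms * bitrate_bps // 1000
    bits_per_packet = payload_bytes * 8
    return -(-bits_lost // bits_per_packet)  # ceiling division

def frames_spanned(outage_ms: int, frames_per_second: int) -> float:
    """How many frame periods the outage covers."""
    return outage_ms * frames_per_second / 1000

# Assumed example values, not measurements from a real network.
lost = packets_lost(outage_ms=50, bitrate_bps=5_000_000, payload_bytes=1316)
frames = frames_spanned(outage_ms=50, frames_per_second=30)
print(lost, "packets lost, spanning", frames, "frame periods")
```

Under these assumptions a 50 ms outage removes about two dozen packets in a row and covers more than one full frame period.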

Overcrowded networks can also cause packet loss. When an IP router has more packets going to one destination than the outbound link can handle, the router places the waiting packets in a temporary buffer (a “queue” or “FIFO”) that holds them until the link is ready for new ones. If the buffer starts to fill up, the router may discard some of the packets to keep it from overflowing catastrophically. On private networks, this process can be controlled to prevent discarding packets with high priorities. However, on the public Internet, packets are not given priorities, so any type of packet can be discarded.
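The buffering behavior can be imitated with a toy model. The sketch below assumes the simplest discard policy, in which a packet that arrives at a full queue is dropped; real routers may use more elaborate schemes. The class name, queue size and arrival rates are all invented for the example.

```python
from collections import deque

class TailDropQueue:
    """FIFO buffer that discards new arrivals once it is full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.packets = deque()
        self.dropped = []

    def enqueue(self, packet) -> bool:
        if len(self.packets) >= self.capacity:
            self.dropped.append(packet)  # no room: the packet is lost
            return False
        self.packets.append(packet)
        return True

    def dequeue(self):
        return self.packets.popleft() if self.packets else None

queue = TailDropQueue(capacity=4)
sent = []
packet_id = 0
for tick in range(5):
    for _ in range(3):  # three packets arrive per tick
        queue.enqueue(packet_id)
        packet_id += 1
    for _ in range(2):  # the outbound link can send only two per tick
        packet = queue.dequeue()
        if packet is not None:
            sent.append(packet)

print(len(sent), "sent,", len(queue.dropped), "dropped,",
      len(queue.packets), "still queued")
```

Because arrivals outpace the link, the queue fills within a few ticks and then loses one packet per tick, no matter what that packet carries.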

PACKET TIMING ERRORS

Timing errors in a video signal are damaging in three different ways. First, data that arrives very late can miss the point in time when it was needed to form an image in a video sequence or sound in an audio program, thereby severely corrupting the content.

Second, data that consistently arrives too early or too late can cause the buffers in the receiving equipment to overflow or to run dry, both of which can also severely corrupt audio or video sequences. Third, packets that arrive at time intervals different from the time intervals used at the source can create video and audio timing problems at the destination. These latter packet timing errors are known as jitter.
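One common way to put a number on jitter is the running estimate defined for RTP in RFC 3550, which compares the spacing of packets on arrival with their spacing at the source. The sketch below assumes both sets of timestamps are available in the same units (milliseconds here); the function name and the sample times are illustrative only.

```python
def interarrival_jitter(send_times, arrival_times):
    """Smoothed jitter estimate in the style of RFC 3550."""
    jitter = 0.0
    for i in range(1, len(send_times)):
        sent_gap = send_times[i] - send_times[i - 1]
        arrival_gap = arrival_times[i] - arrival_times[i - 1]
        difference = arrival_gap - sent_gap
        # Move the estimate 1/16 of the way toward the latest deviation.
        jitter += (abs(difference) - jitter) / 16.0
    return jitter

send = [0, 10, 20, 30]       # packets leave every 10 ms
steady = [5, 15, 25, 35]     # constant 5 ms delay: spacing is preserved
uneven = [5, 18, 24, 37]     # delay varies from packet to packet

print(interarrival_jitter(send, steady))
print(interarrival_jitter(send, uneven))
```

A constant delay, however large, yields zero jitter; only variation in the delay changes the spacing between packets.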

To understand why jitter is a problem, it helps to know a little bit about clocks and MPEG. Inside each MPEG transport stream is a series of data values called the program clock reference, or PCR. These values count the ticks of a 27 MHz clock at the encoder, and are used by the decoder to accurately re-create the video clock in the output. When the packets containing these values are affected by network jitter, then the video clock is no longer accurate and this can result in frozen or dropped video frames.
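As a rough illustration of the PCR arithmetic, the sketch below assumes the usual MPEG-2 transport stream layout for the six-byte PCR field: a 33-bit base counting at 90 kHz, six reserved bits, and a 9-bit extension that counts from 0 to 299, so that base times 300 plus extension gives ticks of the 27 MHz clock. The helper names and sample values are hypothetical.

```python
PCR_CLOCK_HZ = 27_000_000

def encode_pcr(base: int, extension: int) -> bytes:
    """Pack a PCR into six bytes (reserved bits set here to ones)."""
    value = (base << 15) | (0x3F << 9) | extension
    return value.to_bytes(6, "big")

def parse_pcr(field: bytes) -> int:
    """Return the PCR as a count of 27 MHz ticks."""
    value = int.from_bytes(field[:6], "big")
    base = value >> 15          # top 33 bits, 90 kHz units
    extension = value & 0x1FF   # low 9 bits, 27 MHz units
    return base * 300 + extension

def pcr_arrival_error_us(pcr_a, pcr_b, arrival_a_us, arrival_b_us):
    """Arrival spacing minus encoder spacing, in microseconds."""
    encoder_gap_us = (pcr_b - pcr_a) / 27  # 27 ticks per microsecond
    return (arrival_b_us - arrival_a_us) - encoder_gap_us

first = parse_pcr(encode_pcr(90_000, 0))    # one second on the encoder clock
second = parse_pcr(encode_pcr(93_600, 0))   # 40 ms later at the encoder
print(first / PCR_CLOCK_HZ, "seconds")
print(pcr_arrival_error_us(first, second, 0, 43_000), "microseconds late")
```

In this made-up case the second PCR was stamped 40 ms after the first but arrived 43 ms after it, so a decoder relying on arrival times alone would see the encoder clock as running slow.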

Packet timing errors can be created when packets in a router are forced to wait for other packets to be sent out of the router. This is a very common occurrence; most routers are set up to send out any available packet whenever any output link becomes available. This is good because it allows the link to be fully utilized, but it is bad because any packets that have newly arrived in the router are delayed slightly while waiting for the outbound link to free up so they can be sent out. These short delays can add up in a network that consists of several hops, and are a key contributor to packet jitter.
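The size of these delays can be estimated from the time a link needs to send one packet. The figures below are assumptions for illustration: 1,500-byte packets, 100 Mbps links, five hops, and at most one full-size packet ahead in line at each hop.

```python
def serialization_delay_us(packet_bytes: int, link_bps: int) -> float:
    """Time to clock one packet onto the link, in microseconds."""
    return packet_bytes * 8 * 1_000_000 / link_bps

def worst_case_wait_us(hops: int, packet_bytes: int, link_bps: int) -> float:
    """Extra delay if one full packet is ahead in line at every hop."""
    return hops * serialization_delay_us(packet_bytes, link_bps)

per_hop = serialization_delay_us(1500, 100_000_000)
total = worst_case_wait_us(5, 1500, 100_000_000)
print(per_hop, "microseconds per hop,", total, "microseconds over five hops")
```

A packet that finds every link idle sees none of this delay while the next one may see all of it, and that difference is what shows up as jitter at the receiver.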

Another source of timing errors occurs when packets from the same stream take different routes through a network. This is fairly rare in most properly engineered networks, but when it does happen, the destination device needs to sort out the correct order of the packets before creating a video or audio output.
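Putting packets back in order can be sketched as follows, assuming each packet carries a sequence number that increases by one (as RTP packets do, for example). The function name is hypothetical, and the sketch ignores lost packets and sequence number wraparound, both of which a real receiver would have to handle.

```python
import heapq

def reorder(packets, first_seq=0):
    """Yield payloads in sequence order from (seq, payload) arrivals."""
    waiting = []          # min-heap of packets that arrived early
    expected = first_seq
    for seq, payload in packets:
        heapq.heappush(waiting, (seq, payload))
        # Release every packet that is now next in line.
        while waiting and waiting[0][0] == expected:
            yield heapq.heappop(waiting)[1]
            expected += 1

arrivals = [(0, "a"), (2, "c"), (1, "b"), (3, "d")]
print(list(reorder(arrivals)))
```

Packet 2 is held until packet 1 shows up, which is why reordering adds delay at the receiver as well as buffer memory.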

Now that we've discussed the two basic types of packet errors that can occur on IP networks, we need to discuss how we can design a system to work in spite of them. We'll dive into that discussion in the next edition of Video Networking.