SLAs, QoS and guaranteed WANs

Service level agreements are an important part of creating a reliable wide area network.

Did you know that IP networks are used every day by Fox to get critical sporting events backhauled from the venue to a facility in Los Angeles? Did you know that other broadcasters, Hollywood studios and media companies do the same thing? You might think that IP transport is unreliable and not suitable for professional video. This article will discuss how to create an IP network with great reliability.

SLAs, SLSs, TCAs and TCSs

To get the reliable performance described above, start with a service level agreement (SLA) with your video service provider. An SLA serves as the baseline and sets expectations for both the service provider and the user. We talk about SLAs and quality of service (QoS) together as if both were technical frameworks, but this is not strictly true. The SLA is the part of the contract between the two parties that describes the characteristics of the transport, particularly its reliability, that the user can expect the service provider to deliver. SLAs can be simple or complex, but they usually cover two broad areas: business and technical.

Business aspects are covered in both the SLA and the service level specification (SLS). Three of them are suitability for use, price and technical support. Of course, when you enter into an agreement for transport from a service provider, the assumption is that the service being offered is suitable. But misunderstandings can occur if the two parties have not agreed on what constitutes an acceptable service. There is not much to say about price, except that pricing, penalties for lost service and the definition of a loss of service should all be covered in the SLA and SLS; otherwise, disputes can follow. For example, does a single one-second loss of signal in a month count as an outage? What if you get 10 or 20 one-second losses in a month? Are they equivalent to a single 10- or 20-second outage? You get the idea.

Finally, technical support should be specified in the SLA. When will the provider answer the phone? How will it escalate a problem? Will it do the same thing on the weekend or in the middle of the night?
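To make the outage-counting questions above concrete, the following minimal Python sketch computes availability for one month under two counting rules. The 30-day month, the `min_outage_s` threshold and the 10-second value are illustrative assumptions, not terms from any real contract.

```python
# Minimal sketch: one month, two outage patterns, two counting rules.
# ASSUMPTIONS: a 30-day month; outages recorded as durations in seconds;
# min_outage_s is a hypothetical contract term ("only losses of at least
# N seconds count as an outage"), not an industry standard.

MONTH_S = 30 * 24 * 3600   # 2,592,000 seconds

def counted_outage_seconds(outages_s, min_outage_s=0.0):
    """Total downtime, counting only outages at least min_outage_s long."""
    return sum(d for d in outages_s if d >= min_outage_s)

def availability_pct(outages_s, min_outage_s=0.0, period_s=MONTH_S):
    """Percentage of the period during which the service counts as available."""
    down = counted_outage_seconds(outages_s, min_outage_s)
    return 100.0 * (period_s - down) / period_s

twenty_short_hits = [1.0] * 20   # twenty one-second losses of signal
one_long_hit = [20.0]            # a single 20-second outage

# Counting every hit: both months show 20 s of downtime.
print(availability_pct(twenty_short_hits) == availability_pct(one_long_hit))  # True
# Counting only outages of 10 s or more: the short hits disappear entirely.
print(availability_pct(twenty_short_hits, min_outage_s=10.0))  # 100.0
print(availability_pct(one_long_hit, min_outage_s=10.0) < 100.0)  # True
```

Twenty one-second hits and one 20-second outage look identical when every hit counts, but a threshold rule erases the short hits completely. That is exactly the kind of detail worth settling before you sign.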

Figure 1. The ITU-T Y.qosar framework

The SLA and SLS should also include technical aspects. Common elements include performance metrics, availability metrics, mean time to restore, and operations and service windows. Performance metrics to consider include IP packet loss rate, IP packet error rate, jitter and wander, and out-of-order packets.

Availability metrics need particular care. Of course, you want to track the availability of the network, especially if you are entitled to a rebate for times when the network is unavailable. But errors on the network do not automatically mean that the network is unavailable; the terms of the SLA determine that. It is possible to measure every bit as it transits the network and note every bit that arrives in error. The result is usually reported as a bit error ratio (BER). Figures such as 10⁻³ or 10⁻⁵ are not uncommon; a BER of 10⁻³ means one errored bit in every 1,000 bits. Where an acceptable error level ends and an outage begins is strictly a matter for negotiation between you and the service provider. Expect to encounter terms such as BER and errored seconds in these discussions.
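The BER arithmetic and an errored-second count can be sketched in a few lines of Python. The link rate, the error log and the BER limit below are invented for illustration, and an errored second is simplified here to any second containing at least one bit error.

```python
# Minimal sketch: BER and errored seconds from a log of per-second bit errors.
# ASSUMPTIONS: the link rate and error log are invented for illustration;
# an errored second is simplified to "any second with >= 1 bit error", and
# ber_limit is a hypothetical, negotiated SLA threshold.

def bit_error_ratio(errored_bits, total_bits):
    """BER = errored bits / bits transmitted."""
    return errored_bits / total_bits

def errored_seconds(errors_per_second):
    """Count one-second intervals containing at least one bit error."""
    return sum(1 for e in errors_per_second if e > 0)

def seconds_over_limit(errors_per_second, bits_per_second, ber_limit):
    """Count seconds whose own BER exceeds the negotiated limit."""
    return sum(1 for e in errors_per_second if e / bits_per_second > ber_limit)

rate = 10_000_000                          # 10 Mb/s link (hypothetical)
log = [0, 0, 3, 0, 5_000, 0, 1, 0, 0, 0]   # bit errors in each second

print(bit_error_ratio(1, 1_000))                      # 0.001: one error per 1,000 bits
print(bit_error_ratio(sum(log), rate * len(log)))     # BER over the 10 s window
print(errored_seconds(log))                           # 3
print(seconds_over_limit(log, rate, ber_limit=1e-5))  # 1
```

The same log yields three errored seconds but only one second above the hypothetical BER limit, so the choice of metric in the SLA directly changes what counts against the provider.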

Also, should you count every errored bit during transmission, or measure outages in terms of errored video at the decoder output? What if the system employs forward error correction (FEC) and can correct transmission errors up to a certain level? There is a lot to discuss with your provider, and it is much better to cover these things at the beginning than in the middle of a problem.
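To illustrate why the measurement point matters, this minimal Python sketch compares packets lost on the network with packets still missing after FEC. It assumes a simplified, hypothetical block code that can rebuild up to a fixed number of lost packets per block; real video FEC schemes have their own recovery patterns and limits.

```python
# Minimal sketch: network-level loss vs loss still visible after FEC.
# ASSUMPTION: a simplified, hypothetical block FEC that can rebuild up to
# `recoverable` lost packets in each block of `block_size` packets; if a
# block loses more than that, none of its missing packets are recovered.
from collections import Counter

def residual_loss(lost_seqs, block_size, recoverable):
    """Return (packets lost on the network, packets still missing after FEC)."""
    lost_per_block = Counter(seq // block_size for seq in lost_seqs)
    unrecovered = sum(n for n in lost_per_block.values() if n > recoverable)
    return len(lost_seqs), unrecovered

# Five packets lost on the wire; blocks of 10, up to 2 repairs per block.
print(residual_loss([3, 17, 18, 19, 40], block_size=10, recoverable=2))  # (5, 3)
```

Counted at the network, five packets were lost; counted after FEC, only the burst of three in one block remains. Which count defines an outage is for the SLA to say.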

In addition to SLAs and SLSs, you will also encounter traffic conditioning agreements (TCAs) and traffic conditioning specifications (TCSs). These documents describe how traffic on the network will be treated. More on this in a minute, but consider this case: you specify that video is always the most important traffic on the network, but somewhere down the line, a bank has an agreement under which its transactions take priority. When the network gets busy, the provider is going to drop packets, and if your TCAs and TCSs are not consistent all the way from source to destination, your video packets may be discarded on purpose because the bank's traffic takes precedence.
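One way to reason about the bank scenario is to model each hop's traffic conditioning as a priority ranking and check the whole path. In this Python sketch the hop names, classes and rankings are hypothetical; it simply reports every hop where some other class outranks video.

```python
# Minimal sketch: check that video holds the top priority at every hop.
# ASSUMPTION: each hop's traffic conditioning is modelled as a hypothetical
# ranking of traffic classes (1 = highest priority).

def hops_where_video_is_outranked(path):
    """Return the names of hops where some other class ranks above video."""
    problems = []
    for hop, ranking in path:
        video_rank = ranking["video"]
        if any(rank < video_rank for cls, rank in ranking.items() if cls != "video"):
            problems.append(hop)
    return problems

path = [
    ("provider-A-edge", {"video": 1, "bank": 2, "best-effort": 3}),
    ("provider-B-core", {"bank": 1, "video": 2, "best-effort": 3}),
    ("provider-A-egress", {"video": 1, "bank": 2, "best-effort": 3}),
]
print(hops_where_video_is_outranked(path))  # ['provider-B-core']
```

A single inconsistent hop is enough: under congestion, that is where video is discarded first.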

QoS frameworks

To deliver the quality of service you establish through SLAs and TCAs, providers use QoS frameworks. Figure 1 shows a QoS framework established by the ITU, as described in the document ITU-T Y.qosar. Cisco has its own QoS framework, and there are others as well.

As you can see, the ITU framework has several planes. The management plane allows the provider to manage the performance of the network according to the SLA. The control plane deals with control of the network, and the data plane deals with the traffic transiting the network. A QoS framework will generally address all of these areas.

I have limited space, so I am going to pick a few areas to discuss. You can find more detailed information online.

In the management plane, this article has already covered how the SLA is established. It has also touched on service restoration. Policy and metering deal with setting policies for traffic on the network, especially when things get busy. One policy could be that data traffic is discarded before video traffic. To enforce such a policy, the provider must meter the traffic on the network to know whether the policy is being followed.
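Metering is often done with a token bucket. This Python sketch is a simplified two-color meter, not the exact algorithm any provider or router uses (standardized markers such as the single rate three color marker in RFC 2697 are richer); the rate and burst values are arbitrary.

```python
# Minimal sketch of a single-rate token-bucket meter.
# ASSUMPTION: a simplified two-colour meter (conforming / exceeding);
# real routers typically use richer schemes such as RFC 2697 srTCM.

class TokenBucketMeter:
    def __init__(self, rate_bps, burst_bytes):
        self.rate_Bps = rate_bps / 8.0     # token refill rate, bytes per second
        self.burst = burst_bytes           # bucket depth, bytes
        self.tokens = float(burst_bytes)   # bucket starts full
        self.last = 0.0                    # time of last update, seconds

    def meter(self, now, packet_bytes):
        """Return True if the packet conforms to the contracted rate."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate_Bps)
        self.last = now
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return True
        return False   # exceeds profile: policy decides whether to drop or re-mark

meter = TokenBucketMeter(rate_bps=8_000, burst_bytes=1_500)   # 1,000 bytes/s
print([meter.meter(t, 1_000) for t in (0.0, 0.0, 0.5, 1.0)])  # [True, False, True, False]
```

Packets marked as exceeding are not necessarily dropped; whether they are discarded or re-marked to a lower class is a policy decision of the kind described above.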

On the control plane, admission control is an important concept. We are not talking about loosely controlled networks here: the provider must control what sort of traffic is allowed onto the network, and that is done through admission control. To control delay variation from one packet to the next, the provider can use QoS routing, in which packets always follow the same path through the network; this also helps prevent out-of-order packets. Finally, we want guaranteed throughput, and one way to achieve it is to reserve network bandwidth.
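Admission control and bandwidth reservation can be sketched together: a new flow is admitted only if every link on its fixed path still has enough unreserved capacity. The link names, capacities and flow rates below are hypothetical.

```python
# Minimal sketch: admission control with bandwidth reservation.
# ASSUMPTIONS: hypothetical link names and capacities (Mb/s); each flow is
# QoS-routed over a fixed path, and a flow is admitted only if every link on
# that path can still carry it.

class AdmissionController:
    def __init__(self, link_capacity_mbps):
        self.capacity = dict(link_capacity_mbps)
        self.reserved = {link: 0.0 for link in self.capacity}

    def request(self, path, rate_mbps):
        """Admit and reserve bandwidth, or refuse without reserving anything."""
        if all(self.reserved[link] + rate_mbps <= self.capacity[link] for link in path):
            for link in path:
                self.reserved[link] += rate_mbps
            return True
        return False

    def release(self, path, rate_mbps):
        """Return a finished flow's bandwidth to the pool."""
        for link in path:
            self.reserved[link] -= rate_mbps

ac = AdmissionController({"venue-hub": 1000, "hub-LA": 400})
path = ["venue-hub", "hub-LA"]
print(ac.request(path, 270))   # True: fits on both links
print(ac.request(path, 270))   # False: hub-LA would need 540 of 400 Mb/s
print(ac.request(path, 100))   # True: hub-LA now carries 370 of 400 Mb/s
```

Refusing the second flow outright, rather than letting it onto the network and dropping packets later, is what makes the throughput of already-admitted flows predictable.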

At the data plane, you are dealing with the packets themselves. To meter traffic at the management level, you need to know what sort of traffic is on the network. One way to achieve this is traffic classification, which puts each packet into a particular group, followed by packet marking, which labels each packet to describe what is in its payload. Queuing and scheduling control the flow of traffic so that buffers in routing devices do not overflow, and buffer management inside the routers ensures that high-priority traffic is dropped only in the worst congestion conditions.
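The data-plane steps can be sketched at a single router output port: classify each packet, mark it with a DSCP value, and queue it so that lower-priority packets are discarded first when the buffer fills. The port-to-class table and class names are hypothetical; the DSCP numbers are standard code points, but which one your provider assigns to video is a matter for your TCA.

```python
# Minimal sketch: classify, mark and queue packets at one router output port.
# ASSUMPTIONS: the port-to-class table and class names are hypothetical.
# The DSCP values are standard code points (AF41 = 34, DF = 0, CS1 = 8), but
# which one carries your video is set by your TCA, not by this code.
from collections import deque

CLASS_BY_PORT = {5004: "video", 443: "data", 873: "bulk"}   # hypothetical
DSCP = {"video": 34, "data": 0, "bulk": 8}
PRIORITY = ["video", "data", "bulk"]                         # highest first

def classify(dst_port):
    """Traffic classification: put each packet into a group."""
    return CLASS_BY_PORT.get(dst_port, "data")

def tos_byte(traffic_class):
    """Packet marking: the DSCP occupies the top six bits of the IPv4 TOS byte."""
    return DSCP[traffic_class] << 2

class PriorityPort:
    def __init__(self, buffer_packets):
        self.limit = buffer_packets
        self.queues = {c: deque() for c in PRIORITY}
        self.dropped = {c: 0 for c in PRIORITY}

    def enqueue(self, traffic_class, pkt):
        """Buffer management: when full, push out lower-priority packets first."""
        if sum(len(q) for q in self.queues.values()) >= self.limit:
            victim = next((c for c in reversed(PRIORITY) if self.queues[c]), None)
            if victim is not None and PRIORITY.index(victim) > PRIORITY.index(traffic_class):
                self.queues[victim].pop()          # discard newest lower-priority packet
                self.dropped[victim] += 1
            else:
                self.dropped[traffic_class] += 1   # nothing lower to discard: drop arrival
                return False
        self.queues[traffic_class].append(pkt)
        return True

    def dequeue(self):
        """Scheduling: strict priority, always serve the highest non-empty class."""
        for c in PRIORITY:
            if self.queues[c]:
                return c, self.queues[c].popleft()
        return None

port = PriorityPort(buffer_packets=3)
for dst, pkt in [(873, "b1"), (443, "d1"), (873, "b2"), (5004, "v1"), (5004, "v2")]:
    port.enqueue(classify(dst), pkt)
print(port.dropped)     # {'video': 0, 'data': 0, 'bulk': 2}
print(port.dequeue())   # ('video', 'v1')
```

Strict priority is only one scheduling choice; many routers use weighted schemes so that low-priority traffic is not starved indefinitely.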

Do you need QoS?

As you can see, QoS with all of its components is a rather complicated topic. The good news is that these frameworks allow providers to deliver guaranteed performance across their networks. But because of this complexity, ask carefully whether you need this sort of solution before deploying it.

Brad Gilmer is president of Gilmer & Associates, executive director of the Advanced Media Workflow Association and executive director of the Video Services Forum.

Send questions and comments to: brad.gilmer@penton.com