Network monitoring

When I started working at a CBS facility many years ago, part of my job for four hours each day was network monitoring. It was not very exciting, but it came with the territory. These days, network monitoring has an entirely different meaning, and doing it well can keep your facility on the air.

Why monitor?

Let us start the discussion on network monitoring by asking two questions. First, why do you want to monitor the network? Second, what sort of traffic flows over the network? Once these questions are answered, you can start to look for tools that will help do the job. Your reasons for wanting to monitor the network may include:

  • Keeping an eye on available network capacity;
  • Being aware of any unusual changes in network usage patterns;
  • Looking for network errors and maintenance issues; and
  • Providing a log for troubleshooting forensics.

A typical broadcast facility has several networks. Most of them carry regular traffic such as office applications, e-mail and Web browsing, and even the master control area of most TV facilities has networks that provide this functionality. Broadcasters also make special use of their networks by moving large files, in some cases continuously, and they may use networks to backhaul feeds from a remote location to the studio. These networks may impose monitoring requirements above and beyond those of office networks.

As mentioned above, one reason to monitor is to keep track of available network capacity. Clearly, having capacity available is important. But why is excess capacity important, and how much of it is needed? The answer comes from understanding how networks function. Ethernet networks rely on a couple of fundamental assumptions. First, these networks are designed to move office traffic: the packets are small and, compared with video, the overall file sizes are small. Second, the network assumes that not everyone wants to talk at once. If these two assumptions hold, then it is likely that there will be spare capacity when someone needs it. And if, by chance, the network is busy when you try to use it (on a shared Ethernet segment, two devices trying to talk at the same time is known as a collision), the rule is simple: wait a little while and try again. The network is likely to be free later.
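To make "wait a little while and try again" concrete, here is a minimal Python sketch of the truncated binary exponential backoff used by classic shared (half-duplex) Ethernet. The constants are the textbook 10 Mb/s IEEE 802.3 values and the function names are hypothetical; full-duplex switched links do not use this mechanism at all.

```python
import random

# Classic 10 Mb/s IEEE 802.3 values, assumed here for illustration.
SLOT_TIME_US = 51.2   # one slot = 512 bit times = 51.2 microseconds at 10 Mb/s
BACKOFF_LIMIT = 10    # the random range stops doubling after 10 collisions
ATTEMPT_LIMIT = 16    # on the 16th collision the frame is discarded


def max_backoff_us(collisions: int) -> float:
    """Longest possible wait after a frame's nth collision."""
    k = min(collisions, BACKOFF_LIMIT)
    return (2 ** k - 1) * SLOT_TIME_US


def backoff_delay_us(collisions: int, rng: random.Random) -> float:
    """Pick a random wait (microseconds) after a frame's nth collision."""
    if collisions < 1:
        raise ValueError("backoff only applies after at least one collision")
    if collisions >= ATTEMPT_LIMIT:
        raise RuntimeError("excessive collisions: frame discarded")
    k = min(collisions, BACKOFF_LIMIT)
    slots = rng.randint(0, 2 ** k - 1)  # randint includes both endpoints
    return slots * SLOT_TIME_US


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    for n in range(1, 6):
        print(f"collision {n}: wait {backoff_delay_us(n, rng):.1f} us "
              f"(max {max_backoff_us(n):.1f} us)")
```

The doubling range spreads retries out, but every retry can still collide again when many stations are busy, so a heavily loaded segment spends more and more of its time waiting.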

In my experience, all of this works well until network usage climbs somewhere above 70 percent. Then things go bad in a hurry. Once this line is crossed, collisions become much more frequent. Devices wait a little while, try again, collide, wait again, try, collide, and on and on. As the network slows down, some applications and users start trying to talk more often, assuming that because their first attempt stalled, the best thing to do is to try again. This only adds to the congestion. Finally, remember that because video files are large, they take a long time to transmit. Video transfers raise the usage level on a network not just for an instant but, in some cases, for tens of minutes at a time. This sustained high-usage pattern makes collisions more likely simply because of the nature of the data we move.
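One way to watch for that 70 percent line is to poll an interface's byte counter (for example, the SNMP ifInOctets value most routers and switches expose) and convert the difference into percent utilization. The sketch below is a hedged illustration: the function names are hypothetical, the 70 percent figure is the rule of thumb above, and the "three polls in a row" test for sustained load is an assumption you would tune.

```python
from typing import Sequence

COUNTER_MODULUS_32 = 2 ** 32  # 32-bit interface octet counters wrap at this value
BUSY_PERCENT = 70.0           # rule-of-thumb danger line from the text


def utilization_percent(prev_octets: int, curr_octets: int,
                        interval_s: float, link_bps: float,
                        modulus: int = COUNTER_MODULUS_32) -> float:
    """Link utilization between two readings of a byte (octet) counter."""
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    # The modulo handles a single counter wrap between polls.
    delta_octets = (curr_octets - prev_octets) % modulus
    return 100.0 * delta_octets * 8 / (interval_s * link_bps)


def sustained_overload(samples: Sequence[float],
                       threshold: float = BUSY_PERCENT,
                       min_consecutive: int = 3) -> bool:
    """True if utilization stayed at or above threshold for min_consecutive polls in a row."""
    run = 0
    for value in samples:
        run = run + 1 if value >= threshold else 0
        if run >= min_consecutive:
            return True
    return False
```

Checking for a sustained run rather than a single high reading matches the video case, where a long file transfer keeps the link busy for minutes. Note that a 32-bit counter on a 1 Gb/s link can wrap in roughly 34 seconds, so either poll faster than that or use the 64-bit counters (ifHCInOctets and ifHCOutOctets) where the device supports them.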

The second reason to monitor the network is to be aware of any unusual changes in network usage patterns. This is intuitive, but let's explore some reasons network usage may change. Perhaps you have installed new equipment, or the production department has added an evening shift. In cases like these, the change in network usage is easily explained.

Other times, network usage may change drastically because of a hardware problem — perhaps a network device has failed in such a way that it spits out garbage on the network as fast as it can. Or maybe there's been a security breach, and someone is downloading large video files from your system. A large decrease in traffic is worth investigating as well. Has the WAN connection to a remote office failed? Has an automatic file conversion process stopped running?
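"Unusual" only means something relative to what is normal for that time of day. A minimal sketch, assuming you already collect traffic readings in Mb/s per hour of day, compares each new reading with the median of recent readings for the same hour and flags both increases and decreases. The names, the 50 percent tolerance and the 14-reading history are hypothetical choices, not settings from any particular tool.

```python
from statistics import median
from typing import Dict, List


def record(history: Dict[int, List[float]], hour: int, mbps: float,
           keep: int = 14) -> None:
    """Store a reading for this hour of day, keeping only the most recent ones."""
    samples = history.setdefault(hour, [])
    samples.append(mbps)
    del samples[:-keep]  # drop the oldest readings beyond `keep`


def classify_change(history: Dict[int, List[float]], hour: int,
                    current_mbps: float, tolerance: float = 0.5,
                    min_samples: int = 5) -> str:
    """Compare current traffic with the typical level for this hour of day."""
    past = history.get(hour, [])
    if len(past) < min_samples:
        return "unknown"  # not enough history to say what is normal yet
    baseline = median(past)
    if current_mbps > baseline * (1 + tolerance):
        return "increase"  # e.g. a device spewing garbage, or an unexpected download
    if current_mbps < baseline * (1 - tolerance):
        return "decrease"  # e.g. a failed WAN link or a stalled file-conversion job
    return "normal"
```

Because the baseline is a rolling median, an explained change such as a new evening shift stops alerting once it has become the typical level for those hours, while a sudden spike or drop still stands out.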

Third, monitor the network for errors and to detect maintenance issues. Some errors, such as a large number of CRC errors, should not occur on a properly functioning network. Also, if network utilization is well below 70 percent but there are a large number of collisions, you should be concerned. Errors like these may be caused by something as simple as a bad patch cable or as serious as a failing network interface in a router. Either way, hard network errors such as these should be identified and corrected early; if you find and fix the cause, you may avoid a catastrophic failure of the network.
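The two warning signs above translate naturally into simple rules over per-interface counters collected during one polling interval. This is a hedged sketch: the InterfaceSample structure and the thresholds (any CRC errors, collisions on more than 5 percent of frames) are assumptions for illustration, and only the 70 percent figure comes from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InterfaceSample:
    """Counter changes for one interface over one polling interval (hypothetical structure)."""
    name: str
    frames: int
    crc_errors: int
    collisions: int
    utilization_percent: float


def check_errors(s: InterfaceSample, crc_limit: int = 0,
                 collision_ratio_limit: float = 0.05,
                 busy_percent: float = 70.0) -> List[str]:
    """Return human-readable warnings for one interface sample."""
    problems = []
    # A healthy link should show essentially no CRC errors.
    if s.crc_errors > crc_limit:
        problems.append(f"{s.name}: {s.crc_errors} CRC errors "
                        "(check cables, patch panels and interfaces)")
    # Many collisions on a lightly loaded link point to a fault, not to congestion.
    if s.frames > 0 and s.utilization_percent < busy_percent:
        ratio = s.collisions / s.frames
        if ratio > collision_ratio_limit:
            problems.append(f"{s.name}: collisions on {ratio:.1%} of frames at only "
                            f"{s.utilization_percent:.0f}% utilization "
                            "(possible duplex mismatch or failing hardware)")
    return problems
```

The collision rule deliberately stays quiet above the busy threshold, where frequent collisions are expected congestion rather than a maintenance issue.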

Fourth, monitor the network so you have a log for troubleshooting forensics. Forensics is the scientific analysis of physical evidence. If you have been monitoring the network and experience a failure of the on-air automation system, you may be able to trace the problem to a network failure. If you do not have network monitoring logs, then it might not be possible to isolate the network as the cause of the failure.
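A forensic log does not need to be elaborate: timestamped readings appended to a file are enough to line up network behavior against the time of an on-air failure. The sketch below assumes a CSV file with hypothetical column names, and it only works for forensics if the clocks on the monitoring machine and the automation system agree (for example, both synchronized with NTP).

```python
import csv
import os
from datetime import datetime, timedelta
from typing import Dict, List, Optional

FIELDS = ["timestamp", "interface", "utilization_percent", "crc_errors", "collisions"]


def append_sample(path: str, interface: str, utilization_percent: float,
                  crc_errors: int, collisions: int,
                  when: Optional[datetime] = None) -> None:
    """Append one timestamped reading, writing a header if the file is new."""
    if when is None:
        when = datetime.now()
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(FIELDS)
        writer.writerow([when.isoformat(timespec="seconds"), interface,
                         f"{utilization_percent:.1f}", crc_errors, collisions])


def samples_near(path: str, incident: datetime,
                 window: timedelta = timedelta(minutes=10)) -> List[Dict[str, str]]:
    """Return logged rows within `window` of the incident time."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    return [r for r in rows
            if abs(datetime.fromisoformat(r["timestamp"]) - incident) <= window]
```

After an automation failure, calling samples_near with the failure time shows whether the network was saturated or throwing errors at that moment, which is exactly the evidence you cannot reconstruct after the fact.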

Monitoring tools

Many useful monitoring tools are free. While there is a lot of value in commercially available tools, I personally have never run into a problem I could not solve using free tools (with a few notable exceptions, which I will talk about below).

Think about what you already have available. Routers, Macs, PCs and servers all come with built-in monitoring and logging capabilities. Cisco has several applications that let you remotely monitor traffic through its routers in a graphical display, and other router vendors provide similar functionality. Do not forget Windows Task Manager, which has a built-in graphical network monitor. Finally, you can download free applications such as Wireshark (www.wireshark.org), which captures packets and provides detailed analysis of the traffic on your network.
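Captures saved from Wireshark can also be summarized with a short script. The sketch below reads the classic libpcap ("pcap") file format and totals the bytes seen in each second; it assumes the capture was saved as pcap rather than Wireshark's default pcapng format, which this code does not parse. The function name is hypothetical.

```python
import struct
import sys
from collections import Counter

PCAP_MAGICS = (0xA1B2C3D4, 0xA1B23C4D)  # microsecond and nanosecond timestamp variants


def bytes_per_second(data: bytes) -> Counter:
    """Total on-the-wire bytes per capture second in a classic pcap file."""
    if len(data) < 24:
        raise ValueError("too short for a pcap global header")
    if struct.unpack("<I", data[:4])[0] in PCAP_MAGICS:
        endian = "<"
    elif struct.unpack(">I", data[:4])[0] in PCAP_MAGICS:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled here)")
    totals: Counter = Counter()
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        # Record header: seconds, sub-second part, captured length, original length.
        ts_sec, _ts_frac, incl_len, orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        if offset + 16 + incl_len > len(data):
            break  # truncated final record
        totals[ts_sec] += orig_len  # orig_len counts bytes even if the snapshot was cut short
        offset += 16 + incl_len
    return totals


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for second, nbytes in sorted(bytes_per_second(f.read()).items()):
            print(second, f"{nbytes * 8 / 1e6:.2f} Mb/s")
```

Using the original packet length rather than the captured length keeps the totals accurate even when Wireshark was set to capture only the first part of each packet.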

There are cases where I would recommend commercial tools for network monitoring. The first is security, where you want to detect break-in attempts. The second is MPEG Transport Stream monitoring. Several commercially available tools will help you determine where things are broken when using MPEG-over-IP for transport. They provide great functionality, and I have not seen a free tool that comes close to what they can do.
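To give a sense of what those tools check, here is a deliberately small Python sketch of two of the most basic Transport Stream tests: the 0x47 sync byte at the start of every 188-byte packet, and the 4-bit continuity counter that each PID should advance with every packet carrying payload. It is an illustration under simplifying assumptions, not a substitute for a commercial analyzer: it ignores the discontinuity indicator, accepts any number of repeated counters as duplicates, and assumes the TS packets have already been extracted from the IP packets.

```python
from typing import Dict, List, Optional, Tuple

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
NULL_PID = 0x1FFF  # stuffing packets; their counters are not checked


def continuity_errors(stream: bytes) -> List[Tuple[int, Optional[int], str]]:
    """Return (packet_index, pid, description) for each problem found."""
    errors: List[Tuple[int, Optional[int], str]] = []
    last_cc: Dict[int, int] = {}
    for index in range(len(stream) // TS_PACKET_SIZE):  # a trailing partial packet is ignored
        pkt = stream[index * TS_PACKET_SIZE:(index + 1) * TS_PACKET_SIZE]
        if pkt[0] != SYNC_BYTE:
            errors.append((index, None, "sync byte error"))
            continue
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        adaptation_field_control = (pkt[3] >> 4) & 0x03
        cc = pkt[3] & 0x0F
        if pid == NULL_PID:
            continue
        has_payload = adaptation_field_control in (1, 3)
        prev = last_cc.get(pid)
        if prev is not None:
            # With payload the counter should advance by one (mod 16); a repeated
            # value is accepted as a duplicate. Without payload it must not change.
            expected = (prev + 1) % 16 if has_payload else prev
            ok = cc in (expected, prev)
            if not ok:
                errors.append((index, pid,
                               f"continuity error: expected {expected}, got {cc}"))
        last_cc[pid] = cc
    return errors
```

A jump in the counter usually means packets were lost somewhere on the IP path, which is the kind of clue that points you toward the part of the network to investigate.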

Brad Gilmer is executive director of the Advanced Media Workflow Association and president of Gilmer & Associates.

Send questions and comments to: brad.gilmer@penton.com