Testing, Testing…Is This Thing On?
July 25, 2007
“Check, check, can you hear me?” If you’ve ever been to a meeting where a PA system was being used, chances are it started with those words. Once everyone assures the person speaking that the technology is doing its job, the meeting begins and nobody gives it a second thought. That is, unless something goes wrong later.
That approach to testing a system is good enough when the system is relatively simple and the consequences of failure are minor. Unfortunately, the approach we often take in testing more complex systems isn’t much better.
Traditionally, testing was an important part of installing broadcast systems. Analog video and audio systems required signal levels to be precisely set in order to ensure proper operation and preserve quality. As a result, engineers learned to test and measure signals along the signal path to make sure everything ran smoothly before, during and after system installations.
IT-based systems have made us lazy with regard to testing. As with many digital systems, a fair amount of configuration during the installation phase is often required. However, if a device gets connected and appears to be working, it generally isn’t given a second thought. As with the microphone example, we tend to only get concerned if something goes wrong later.
For example, while troubleshooting a particularly stubborn problem recently, our technical team thought to start checking the SIDs of some Windows-based broadcast systems. First, a little background: a computer SID is a security identifier, basically a number that is assigned to a particular workstation or server when the operating system is installed. The SID is intended to be unique to each physical server or workstation.
OK, back to the troubleshooting. In the process of checking SIDs on broadcast systems, we discovered several systems had been installed with duplicate SIDs. After taking a closer look, we realized that systems from vendor X shared an identical SID and systems from vendor Y shared another common SID. Yikes!
We were able to determine that the vendors in question used a technique called “imaging” to create a master copy of the software needed for their systems. These images were then used to clone systems for installations in the field. This is a pretty smart approach because it minimizes the likelihood of introducing changes to a configuration that’s known to work. However, imaging also duplicates some supposedly unique tidbits, like SIDs, so the imaging tools generally provide a function to regenerate the SID when the imaging is performed. For whatever reason, that process didn’t work with these systems.
Ultimately, we were able to identify and correct the duplicated SIDs using some readily available tools (search the Internet for “getsid” and “newsid” if you are interested). And, while this turned out not to be the cause of the problem being researched, we did determine that it was creating some confusion with other key systems on our network domain. Just as important, this issue diverted us from the real cause of the problem and wasted valuable time. Had we tested the SIDs on these systems when they were installed, we would have saved ourselves a lot of time. But when those systems were installed, they worked and nobody gave them a second thought. We test SIDs now.
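For readers who want to automate that check, here is a minimal sketch in Python. It assumes the SIDs have already been collected, with whatever tool you prefer, into a simple host-to-SID mapping. The function name, host names and SID values are all made up for illustration and do not come from any particular tool.

```python
from collections import defaultdict


def find_duplicate_sids(sid_by_host):
    """Group hosts by SID and return only the SIDs shared by more than one host."""
    hosts_by_sid = defaultdict(list)
    for host, sid in sid_by_host.items():
        # Normalize so stray whitespace or letter case in an export doesn't hide a match.
        hosts_by_sid[sid.strip().upper()].append(host)
    return {
        sid: sorted(hosts)
        for sid, hosts in hosts_by_sid.items()
        if len(hosts) > 1
    }


# Hypothetical inventory: every host name and SID below is invented for this example.
inventory = {
    "playout-01": "S-1-5-21-1111111111-2222222222-3333333333",
    "playout-02": "S-1-5-21-1111111111-2222222222-3333333333",
    "ingest-01": "S-1-5-21-4444444444-5555555555-6666666666",
    "ingest-02": "S-1-5-21-4444444444-5555555555-6666666666",
    "archive-01": "S-1-5-21-7777777777-8888888888-9999999999",
}

for sid, hosts in find_duplicate_sids(inventory).items():
    print(f"{sid} is shared by: {', '.join(hosts)}")
```

In this made-up inventory, the two playout systems share one SID and the two ingest systems share another, much like our vendor X and vendor Y situation, while the archive server passes.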
There are other examples. We’ve found issues where the speed or duplex of a network connection had been incorrectly set for years, creating bottlenecks that were just lived with. Similarly, we’ve found issues where computers with multiple network cards had been installed with the incorrect binding order, causing seemingly random problems.
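A speed or duplex check can be as simple as comparing what a system should be set to against what it actually reports. The Python sketch below assumes you have recorded both sets of values yourself. The LinkSettings type, the audit_links function and the interface names are hypothetical, and the code only compares values rather than querying any network card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkSettings:
    speed_mbps: int
    duplex: str  # "full" or "half"


def audit_links(expected, observed):
    """Return human-readable findings where observed settings differ from expected."""
    findings = []
    for name, want in expected.items():
        got = observed.get(name)
        if got is None:
            findings.append(f"{name}: no observed settings recorded")
            continue
        if got.speed_mbps != want.speed_mbps:
            findings.append(
                f"{name}: speed {got.speed_mbps} Mb/s, expected {want.speed_mbps} Mb/s"
            )
        if got.duplex != want.duplex:
            findings.append(f"{name}: duplex {got.duplex}, expected {want.duplex}")
    return findings


# Hypothetical build sheet versus values noted during installation.
expected = {
    "playout-01/eth0": LinkSettings(1000, "full"),
    "ingest-01/eth0": LinkSettings(1000, "full"),
}
observed = {
    "playout-01/eth0": LinkSettings(1000, "full"),
    "ingest-01/eth0": LinkSettings(100, "half"),
}

for finding in audit_links(expected, observed):
    print(finding)
```

The same expected-versus-observed comparison could probably be extended to binding order by recording the intended list of network cards for each system and comparing it with the order actually found.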
Some of these issues may only create nuisances in the form of unexplained events showing up in system error logs. Others may bring critical systems to their knees with no warning. That’s the wonderful thing about digital systems: they often work great, right up until the point they fail completely.
Broadcast engineers figured out years ago that this issue could occur with digital television. If an engineer just plugged things in, they often were rewarded with pristine video that no analog system could be tweaked to reproduce. But under the hood, the system could be on the edge of complete failure. Their response was to simply continue the same basic testing principles that worked in the analog world. Namely, check everything.
The tried and true approach of testing systems during installation should be a part of any IT checklist. Granted, there are a lot of things that could be checked and there probably won’t be time to check them all. However, one can create a checklist that covers some of the more common issues and develop build procedures for systems that use tested images as a starting point (just check those SIDs). Whatever you do, don’t wait until systems fail before you start testing them. You’ll be glad you tested early. Count on IT.