Predicting the end

Once again we've reached an end: the end of the year and the end of an era. Recently, I have been contemplating the end.

The Internet

The Internet is a circular data-linking environment with no beginning and no end. But what if it did end? Despite its massively distributed architecture, a few choice catastrophes could end the Internet as we know it, even if only for a short time.

Its architecture was designed to prevent a single failure from disrupting the whole. During the past 20 years, the Internet has proven to be a robust and organic system with great survivability.

Survivability and uptime are two things we strive for. Systems are only as reliable as their weakest link. Have you ever done a reliability inventory? While such an inventory is not critical for a massively redundant system like the Internet, it is important for smaller broadcast systems.
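
To make the idea concrete, here is a minimal Python sketch of such an inventory. The component names and availability figures are invented for illustration, and failures are assumed to be independent; in a series chain, the availabilities simply multiply.

    # Hypothetical reliability inventory for a small broadcast chain.
    # Component names and availability figures are illustrative only.
    components = {
        "playout server": 0.9999,
        "video router": 0.99999,
        "master control switcher": 0.9999,
        "studio-to-transmitter link": 0.999,  # the weakest link
    }

    chain_availability = 1.0
    for availability in components.values():
        chain_availability *= availability  # series chain: availabilities multiply

    print(f"End-to-end availability: {chain_availability:.5f}")
    # The chain can never beat its weakest component (0.999 here).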

Predicting a system's end

In the last few months, I have seen a main broadcast component crash and — in the eyes of engineers — come to an end. The good news is that with the end of this troublesome component came a shiny new cluster of devices that promise to live a long life and adapt to meet future needs.

The end of life can happen in many ways. In a technological sense, an item reaches its end when it becomes obsolete or is no longer supported. And when it breaks down and can't be repaired, it is certainly at its end.

So why am I dwelling on this topic? Engineers must continually make predictions about many technical matters, and one of those predictions is the end. Around the end of the year, it's budget time, which means engineers must put on their magic wizard hats and break out their crystal balls to predict what equipment will survive the next year and what will need to be replaced.

When evaluating new systems, I spend a lot of time looking at mean time between failures (MTBF) numbers. In the early days of video servers, engineers constantly had to weigh the reliability of the devices, specifically the hard drives, against how much redundancy (i.e., cost) needed to be added to the systems to achieve acceptable levels of reliability.
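
The arithmetic behind that trade-off is simple. This short sketch compares a single drive with a mirrored pair, assuming independent failures; the failure probability and drive cost are placeholders, not figures from any real product.

    # Redundancy-versus-cost sketch for video server storage.
    # Assumes independent drive failures; probabilities and prices are made up.
    p_drive_fails = 0.03      # chance a single drive fails within a year (assumed)
    cost_per_drive = 400.0    # unit cost in dollars (assumed)

    p_single_down = p_drive_fails            # one drive: any failure is an outage
    p_mirrored_down = p_drive_fails ** 2     # mirrored pair: both must fail

    print(f"Single drive : {p_single_down:.4%} yearly failure risk, ${cost_per_drive:,.0f}")
    print(f"Mirrored pair: {p_mirrored_down:.4%} yearly failure risk, ${2 * cost_per_drive:,.0f}")

Doubling the hardware cost cuts the failure risk by far more than half, which is exactly the calculation those early video server designs forced on engineers.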

You may ask, “Isn't that what I pay all those maintenance contracts for?” While those contracts may give management some peace of mind, there is a huge difference between being able to repair a system and having it not fail in the first place.

In an all-tapeless facility, broadcasters expect at least 99.999 percent uptime from all major systems. If a system generates revenue, then even the 0.001 percent of time lost could be disastrous. However, with redundancy comes complexity, and with complexity comes more chance for failure. The conundrum is to find a system that is 100 percent reliable, simple to use and easy to maintain.
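
For a sense of scale, here is the arithmetic behind those nines; 99.999 percent uptime works out to roughly five minutes of downtime a year.

    # What the nines allow in practice, in minutes of downtime per year.
    minutes_per_year = 365 * 24 * 60

    for label, availability in [("99.9%", 0.999), ("99.99%", 0.9999), ("99.999%", 0.99999)]:
        downtime = (1 - availability) * minutes_per_year
        print(f"{label:>8} uptime allows {downtime:7.1f} minutes of downtime per year")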

Interestingly, as systems become more complex, the overall MTBF can depend on the seemingly least important and least technologically advanced component or subsystem. Today, the latest storage technologies have MTBF ratings of 1 million hours (about 114 years), which is great, but a capacitor in the power supply may have an MTBF of two years.
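
A quick back-of-the-envelope check shows how the weaker part governs the whole. It assumes independent parts with constant failure rates, so that the rates simply add in series; the figures are the ones from the example above.

    # The least advanced part sets the pace: failure rates add in series.
    hours_per_year = 24 * 365

    mtbf_storage = 1_000_000                # hours, about 114 years
    mtbf_capacitor = 2 * hours_per_year     # the two-year part, about 17,500 hours

    system_mtbf = 1 / (1 / mtbf_storage + 1 / mtbf_capacitor)
    print(f"Storage alone          : {mtbf_storage / hours_per_year:6.1f} years")
    print(f"Storage plus capacitor : {system_mtbf / hours_per_year:6.1f} years")

The million-hour storage array, taken as a system with that capacitor, fails about every two years.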

If systems are designed properly, redundancy is built into the subsystems with low MTBF so that a single failure doesn't bring the whole system down. A problem arises if that thinking is not carried through to devices with high MTBF.
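
The payoff of putting redundancy on the weak subsystem can be sketched with a common approximation for a repairable one-of-two pair: the pair's MTBF is roughly the single unit's MTBF squared divided by twice the repair time, valid when repair is far faster than failure. The 24-hour repair time below is an assumption chosen purely for illustration.

    # Effect of a redundant power supply on the weak subsystem.
    # Approximation for a repairable one-of-two pair:
    # MTBF_pair ~= MTBF**2 / (2 * MTTR), valid when repair is far faster than failure.
    mtbf_supply = 17_500.0    # hours, roughly the two-year part above
    mttr = 24.0               # hours to swap a failed supply (assumed)

    mtbf_pair = mtbf_supply ** 2 / (2 * mttr)
    print(f"Single supply : {mtbf_supply:>13,.0f} hours between failures")
    print(f"Redundant pair: {mtbf_pair:>13,.0f} hours between outages "
          f"(about {mtbf_pair / 8760:,.0f} years)")

Under those assumptions, the once-weakest subsystem now outlasts the million-hour storage array, which is why redundancy dollars belong on the low-MTBF parts first.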

Remember, MTBF is just a statistic. Any part can fail at any time for any reason. Thus the conundrum that hits at budget time: low complexity, low cost or high reliability. Balancing these leaves little room for unexpected equipment breakdowns. So engineers must hope a system's end does not come before its replacement has been budgeted for.

Steve Blumenfeld is chief technology officer for Current TV.