Video storage technology

John Watkinson spoke at a SMPTE conference in New York a number of years ago. The conference featured many papers on the topic “Computers and Television: Convergence or Collision.” Watkinson's intellectual and thoughtful style and understated British manner were perfect for the subject. He described hierarchical storage and asked, “If you get your video back without fail when you hit the play button, do you really care how it was stored?” Mark Schubin once referred to this same talk, saying, “In Watkinson's view, all recording should be done in non-specific ‘bit buckets’, with a computer figuring out what got recorded where and when.” Does location matter? One might be tempted to answer yes or no depending on whether one's “bent” is technology or operations.

To the operations person, the most important thing is that the record/play process is 100 percent reliable, 24/7. Anything less is simply inexcusable and will generate a request for a backup recording, or perhaps two. Other attributes may be important as well, such as stunt modes, play while record, proxy generation and compatibility with other parts of the production process.

To the technologist, the question is more complicated, as it always is. First, is the quality appropriate to the intended use? Second, are the hardware and software reliable and understandable, so they can be fixed if necessary? Third, are transcoding steps necessary to use the content? Fourth, does the storage system interface with the other hardware and software systems in place? And there are perhaps many more equally important questions.

But today the most critical questions are rather simple. Is the content preserved and replayed appropriately? And are both the metadata describing the structure of the essence (the content) and the descriptive metadata related to that content preserved and kept usable in the future? I believe Watkinson was absolutely correct in asking whether one even cares where it is recorded. In a networked environment, the location of the content is a rather abstract concept anyway. At the deepest level, of course we care about the process of storage and retrieval, but at the operational layer, it is much more important to have access to the existing metadata, and equally important to be able to add or modify descriptive metadata at will.

In fundamental ways, this is a new model of which some in the industry are not fully aware. Today, delivering content most often means delivering files. It is possible to put a file on physical media and transport it to a new system, locally or remotely, but more often the content (essence and metadata) is passed over a network to the next process or storage location. Either way, the copy is a second instance of the content, a clone indistinguishable from the original, which of course necessitates that a management layer know where the instances live. Media asset management and archive management software can be the repository of the tags attached to each instance, allowing intelligent content control. One needs to know where content is, what it is in full detail and, by implication, how to get it back once it is sent to a new storage location.
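As a purely illustrative sketch (the class and field names here are hypothetical, not drawn from any real MAM product), the bookkeeping such a management layer performs amounts to registering every clone under the same identifier and answering “where does this content live?” as a lookup:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ContentInstance:
    """One clone of a piece of content: same essence, different location."""
    umid: str        # SMPTE UMID or other registration tag shared by all clones
    location: str    # e.g. "lto-vault", "nearline-disk", "playout-cache"

@dataclass
class AssetRegistry:
    """Minimal stand-in for the tracking a MAM/archive manager performs."""
    instances: dict[str, list[ContentInstance]] = field(default_factory=dict)

    def register(self, instance: ContentInstance) -> None:
        # Every copy, local or remote, is recorded under the same identifier.
        self.instances.setdefault(instance.umid, []).append(instance)

    def locate(self, umid: str) -> list[str]:
        # "Where is this content?" becomes a lookup, not a physical question.
        return [inst.location for inst in self.instances.get(umid, [])]

# Two indistinguishable clones of the same program.
registry = AssetRegistry()
registry.register(ContentInstance("UMID-0001", "lto-vault"))
registry.register(ContentInstance("UMID-0001", "playout-cache"))
print(registry.locate("UMID-0001"))   # ['lto-vault', 'playout-cache']
```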

That new storage location might be an off-site deep archive of data tapes (LTO or another format), or it might simply be a spinning disk in the next room. (See Figure 1.) Watkinson would say it doesn't matter, provided that when a user asks to play the content, it plays seamlessly at the time requested. If the advance warning of playback requirements comes only minutes before air, a tape stored in a mine is of little value. But Watkinson was alluding to a scenario in which the first 10 minutes of the content are cached in an immediately available location, and a request is issued to a connected archive to retrieve the rest while the cache is playing to air. As long as the playout does not run out of content before the archive copy is restored to an appropriate playout location, it really is immaterial where the bits of essence were located before they were sent to air.
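To make the timing concrete, here is a hedged back-of-envelope check (the function, numbers and rates are illustrative only, not taken from any real playout or archive system): starting from the cache is safe as long as the restore of the remainder finishes before the cached head runs out.

```python
def can_start_from_cache(cached_seconds: float,
                         remaining_gb: float,
                         restore_rate_mb_s: float,
                         safety_margin_s: float = 60.0) -> bool:
    """True if the archive restore should finish before the cached head plays out."""
    restore_seconds = (remaining_gb * 1024.0) / restore_rate_mb_s
    return restore_seconds + safety_margin_s <= cached_seconds

# Example: 10 minutes (600 s) cached, 40 GB still in the archive,
# roughly 120 MB/s sustained restore: about 341 s of restore time,
# comfortably inside the cached material, so playout can begin.
print(can_start_from_cache(cached_seconds=600.0,
                           remaining_gb=40.0,
                           restore_rate_mb_s=120.0))   # True
```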

I prefer to think of this as virtual storage rather than hierarchical: virtual because, from the output spigot, it looks like a single playout pool, and the physical nature of the processes is abstracted and kept out of view of the playout application. The recording process can thus be arbitrarily complex and opaque, and yet the users will not know or care.
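In programming terms, that abstraction is simply an interface the playout application calls without knowing what stands behind it. A minimal sketch, with hypothetical names and deliberately empty implementations:

```python
from typing import Protocol

class PlayoutPool(Protocol):
    """What the playout application sees: one pool, addressed by identifier."""
    def retrieve(self, umid: str) -> bytes: ...

class NearlineDisk:
    def retrieve(self, umid: str) -> bytes:
        # Read directly from the spinning disk in the next room.
        ...

class DeepArchive:
    def retrieve(self, umid: str) -> bytes:
        # Recall from data tape, however long that takes behind the scenes.
        ...

def play_to_air(pool: PlayoutPool, umid: str) -> None:
    # The caller neither knows nor cares which implementation it was handed.
    essence = pool.retrieve(umid)   # ...decode and play the essence here
```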

Increasingly, the metadata has taken on particular importance. The structural metadata describes the essence's physical makeup (compression standard, sampling grid, frame rate, number of audio channels and how they are coded, SMPTE UMID, etc.). Structural metadata is often taken for granted, but without it the content frequently cannot be decoded without considerable sleuthing to reconstruct what is missing. Most often carried in the header of the recorded file, it is quite limited in size.

Descriptive metadata can be arbitrarily expansive and often contains things such as title, episode, ISBN/AdID/ISCI/ISO registration tags, digital rights management information, and even extensive material such as scripts and full-text translations of the content. In many cases, the intent is to be able to modify descriptive metadata, or to add or delete individual descriptors. This rich source of information about the file can be contained in the file itself, or carried in a “sidecar” file, using the registration tag (or SMPTE UMID) as the reference tying the essence and metadata together.
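A minimal sketch of the distinction, with hypothetical field names (a real wrapper format such as MXF defines its own structures): the structural record is small and fixed, while the descriptive record is open-ended, freely editable and may travel in a sidecar keyed by the same UMID.

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class StructuralMetadata:
    """The small, essential facts without which the essence cannot be decoded."""
    umid: str
    codec: str               # compression standard
    width: int               # sampling grid
    height: int
    frame_rate: float
    audio_channels: int
    audio_coding: str

@dataclass
class DescriptiveMetadata:
    """Open-ended, editable description of the content."""
    umid: str                # the same UMID ties the sidecar back to the essence
    title: str = ""
    episode: str = ""
    registration_tags: dict[str, str] = field(default_factory=dict)
    rights: str = ""
    notes: list[str] = field(default_factory=list)   # scripts, translations, etc.

def write_sidecar(desc: DescriptiveMetadata, path: str) -> None:
    # Descriptive metadata can be rewritten at will without touching the essence.
    with open(path, "w") as f:
        json.dump(asdict(desc), f, indent=2)
```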

Though I have ignored removable media used in ENG and production venues, all of the above applies to any essence recorded as a file, which is to say the vast majority of the output of all modern recording technologies. The broadcast industry has become dependent on the “essence/metadata” paradigm precisely because it allows abstracting the recording technology from the use of the content, enabling features we would have loved to have 30 years ago. I remember searching on a shelf for a missing tape while editing “The Johnny Cash Show” 40 years ago — because the metadata on a 3in by 5in card was illegible. That makes me appreciate the stunning improvements in recording technology.

John Luff is a television technology consultant.

Send questions and comments to: john.luff@penton.com