Managing Legacy Data

Unstructured data is a challenge organizations have been forced to address, whether it consists of documents (PDF, Word or Excel files), photographic images, or video and audio files.

Beginning with physical media, such as videotape, this data continually accumulates at an alarming rate. And there never seems to be sufficient time or money to deal with the problem at any level.

Legacy data for news, entertainment and production, unless managed by some form of asset management system such as DAM, MAM or PAM (see Fig. 1), is almost always uncategorized, unstructured and seldom accessed.


Early on, a station "film librarian," a position that has all but vanished since the introduction of videotape and now files, took care of the media assets. As 16mm film migrated to two-inch tape, the librarian's job became more complicated. As tape formats moved to cassettes, asset management in any form became secondary. Tapes went to drawers, shelves, desks, vans, and occasionally to the tape room.

Fig. 1: A flowchart for managing legacy data
Today, with solid-state acquisition media, that dilemma has changed dimensions again.

When file-based data has no organization or cataloging, only a file name, it is classified as "unstructured." Estimates for the amount of unstructured data, in terms of storage, range from 35 percent to as much as 80 percent of all the data accounted for in the enterprise.

Historically, the majority of this data is never used. No person or system knows where to find it. It remains uncataloged, uncategorized and essentially worthless.

This discussion has continued for more than a decade. It is fair to say that "[t]he management of unstructured data is recognized as one of the major unsolved problems in the information technology (IT) industry…" (DM Review, February 2003). That perception remains unchanged, but the problem is now being addressed in earnest.

In the video media world, content comes from every direction. If this accumulation continues unchecked, legacy data will become unmanageable for most organizations in surprisingly short order.

Legacy data compounds continually. Most organizations face similar problems: when one data repository fills up, more storage is added. When the facility runs out of space, power or cooling to keep its data on spinning disk, it moves to robotic tape libraries. When the library's slots are filled, the tapes move to shelves. When the shelves are full, the tapes are pushed to Iron Mountain and held in perpetuity. This is not management; it is simply delaying the inevitable.

Dealing with legacy data begins before the data becomes "legacy." As content is created, ingested or acquired, it must be categorized. One method is through a media asset management system (MAM or AMS).
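As a rough illustration of what categorizing at ingest can mean in practice, the Python sketch below builds a minimal catalog record for a file at the moment it arrives, so the file is never known only by its name. It is a sketch under stated assumptions, not how any particular MAM works: the record fields, the `catalog_at_ingest` and `search` helpers, and the category and keyword values are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class AssetRecord:
    """Hypothetical minimal catalog entry; the field names are illustrative."""
    path: str
    size_bytes: int
    sha256: str
    ingested_at: str
    category: str
    keywords: list = field(default_factory=list)


def catalog_at_ingest(path, category, keywords=()):
    """Build a catalog record for one file at the moment it is ingested."""
    p = Path(path)
    digest = hashlib.sha256()
    # Hash in 1 MiB chunks so large media files are not read into memory at once.
    with p.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return AssetRecord(
        path=str(p),
        size_bytes=p.stat().st_size,
        sha256=digest.hexdigest(),
        ingested_at=datetime.now(timezone.utc).isoformat(),
        category=category,
        # Normalize keywords so later searches are case-insensitive.
        keywords=sorted({k.lower() for k in keywords}),
    )


def search(records, keyword):
    """Return the records whose keywords include the given term."""
    term = keyword.lower()
    return [r for r in records if term in r.keywords]
```

Even a record this small, captured when the content arrives, gives a later search something to match against besides the file name; a real system would carry far richer metadata.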

The digital imaging and document management segments of the health care and insurance industries have established, well-entrenched solutions already in place. These tools help keep everything from current to legacy content (email, documents, media files, scanned images) under control.

For television production, broadcast, news and information, solutions are available, but historically they have tended to be single-vendor products. This too is changing, with workflow managers, third-party plug-ins and the like.

Handling media assets through metadata depends upon workflow. File-based image capture on some vendors' equipment, such as camcorders, has done a relatively decent job of capturing fundamental sets of metadata and making them available during the production process.


Nonetheless, the applications for camera-originated metadata become constrained as the media is moved from the source to the editing platform, video server, storage platform or transfer agent.

Software tool sets for transposing metadata from the source to the edit application can be cumbersome and inconsistent across platforms. Interchange outside the native edit platform can be nearly impossible without third-party applications. Users, seeing no immediate business value, simply truncate the data before giving it a chance to be used at all.

Herein lies a problem with asset management systems. At an enterprise level, there may be dozens of ancillary uses for the media assets throughout the organization. However, these assets are only afforded value if they can be identified, cataloged, searched and retrieved when needed.

Once the enterprise acquires an AMS, it expects the users to simply adapt their workflow to the AMS. Always up against deadlines, the last thing users want is extra steps that force them to take longer to get their work out. Instead, shortcuts are taken and the value of the AMS is globally diminished.

Smart systems employing media-aware components help reduce the manual intervention that many users associate with working with AMSs.

Logging tools, facial and character recognition, speech-to-text, and other applications are helping to deal with cataloging both current and legacy media sets.

We have only scratched the surface of how to deal with legacy data. For most, it may not be practical to catalog years of unstructured data, so the point users should take away is that legacy data management can be improved only if you start now.

Develop a workflow with a set of processes that alleviates the issues of legacy data going forward. The longer you wait, the more burdensome the process becomes. In other words, the problem doesn't go away; it just becomes different.

Karl Paulsen, CPBE, is a SMPTE Fellow, technologist and consultant to the digital media and entertainment industry with Diversified Systems. Contact him at

Karl Paulsen

Karl Paulsen is the CTO for Diversified, the global leader in media-related technologies, innovations and systems integration. Karl provides subject matter expertise and innovative visionary futures related to advanced networking and IP-technologies, workflow design and assessment, media asset management, and storage technologies. Karl is a SMPTE Life Fellow, an SBE Life Member & Certified Professional Broadcast Engineer, and the author of hundreds of articles focused on industry advances in cloud, storage, workflow, and media technologies. For over 25 years he has continually featured topics in TV Tech magazine, penning the magazine’s Storage and Media Technologies and its Cloudspotter’s Journal columns.