Archiving: Maintaining Relevance for the Future

What the Library of Congress can teach us about preserving digital media

JOHNSTON, IOWA—Some 40,000 years ago, primitive peoples painted pictograms of meaningful events on the walls of caves, most of which have been found in Europe and Asia. Here is a question for those of us who document the meaningful events of our time: 40,000 years hence, when we are viewed as the primitive people of the past, what will the scholars and historians of that day be able to deduce from our stored 0s and 1s?

At the recent SMPTE conference in Hollywood, my colleagues Merrill Weiss, Brian Campanotti and James Snyder presented a session entitled “Standards for Archives and Production Workflows.” The common thread of the three presentations was the recently published SMPTE AXF standard and the ongoing work to expand and refine it. While Merrill’s and Brian’s presentations were interesting, informative and engaging, I want to focus this article on the presentation given by James Snyder of the Library of Congress, “Media Archiving, Standards and the Library of Congress.” James’ presentation brought home the challenge of archiving in the digital age.


Iowa Public Television archives its material within a SpectraLogic T-950 robotic LTO-based tape library.

Now where to begin? I won’t go into all the specific details that James shared as an introduction to the history and function of the Library of Congress, other than to say that it is the world’s largest archive and research institution, with over 155 million items and growing. Its audio-visual collection is staggering, since it is the copyright depository for the motion picture, broadcasting and recorded sound industries. More than 1.4 million moving image recordings and more than 3.5 million audio recordings, in almost every known format, are maintained in its archives. It also holds somewhere around 255 million feet of film, paper records and an ever-growing collection of content that started its life as a digital element. And more material is constantly coming in.

Even if the mission were just to digitize everything, it would be a Herculean task, but the collection also has to be usable. So in addition to digitizing this enormous collection using well-documented standards for video and audio, the LoC has to maintain the metadata so that all these items can be used in the present and actively preserved into the future.

The Library of Congress’ sustainability plan for these items is a minimum retention period of 150 years. While this is certainly not the 40,000 years that the early pictograms have survived, there is a huge difference: the pictograms don’t need to be decoded to be read. As the old adage goes, “a picture is worth a thousand words,” and this is especially true when there are no native speakers of the original language. The human brain looks at a pictogram or a sequence of pictograms and can deduce, with varying levels of correctness, what the message is. But in our digital age, the pictures and the sounds become sequences of 0s and 1s that must be properly decoded before they can even be viewed. That presents quite a challenge.

In high school, some of my friends and I programmed a two-dimensional space war game in which we portrayed the universe as a huge x/y matrix, with the solar systems as smaller sub-matrices within it. The program was written in Basic A, and I still have it stored on the original paper tape, which is now a little more than 40 years old. Before you roll your eyes, I have found a USB 3.0 paper tape reader, so if I really wanted to pull out the old code, it is quite doable.

My point is that the storage medium is one of the critical elements in archiving, but it is not permanent. For the archive to remain useful, I have to migrate it as the stability and usability of the storage medium change; otherwise its usefulness and value are diminished. Any digital archive therefore needs a migration plan that allows for both incremental and fundamental changes in the physical storage medium.
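A single migration step can be pictured as a copy followed by a check that the bits arrived intact. The sketch below is only an illustration under stated assumptions: ordinary files stand in for the old and new media, the function names `sha256_of` and `migrate_file` are hypothetical, and SHA-256 is my choice of checksum rather than anything a particular archive system prescribes.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_file(source: Path, destination: Path) -> str:
    """Copy source to destination and verify the copy bit for bit.

    Hypothetical helper: the two paths stand in for the old and new
    storage media. Returns the verified digest, or raises OSError if
    the copy does not match the original.
    """
    expected = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    actual = sha256_of(destination)
    if actual != expected:
        raise OSError(f"checksum mismatch migrating {source} to {destination}")
    return actual
```

Storing the returned digest alongside the item would let a later migration confirm that nothing changed in the meantime.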

Here at IPTV our archival material is stored within our SpectraLogic T-950 robotic tape library, which is an LTO-based medium. When we first ordered the library as part of a larger project, the unit was sized to eventually accommodate all of our existing historical content based on LTO-1 tapes, with the potential for adding an expansion frame.


However, due to delays in the original project, LTO-2 was the current version by the time our system was delivered, so the capacity of the library doubled with the change of medium. Because the original installation was based on the LTO roadmap, and on the practice that each new generation is fully compatible with the previous generation and can read tapes from two generations back, we instituted a plan to replace the LTO tape drives and tapes every second generation. We are currently on LTO-6 and planning for LTO-8. This plan allows us to migrate our data in an automated fashion and to increase the capacity of the library as the capacity of the medium increases.
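The reasoning behind the every-second-generation plan can be sketched in a few lines. This is a minimal illustration that assumes the compatibility rule exactly as described above (a drive reads its own generation and up to two generations back); the function names are hypothetical, and the actual compatibility of any given drive should be checked against the vendor's published matrix.

```python
def can_read(drive_gen: int, tape_gen: int, read_back: int = 2) -> bool:
    """True if a drive of drive_gen can read a tape of tape_gen.

    Assumes the rule as stated in the text: a drive reads its own
    generation and up to read_back generations before it.
    """
    return drive_gen - read_back <= tape_gen <= drive_gen


def upgrade_plan(current_gen: int, last_gen: int, step: int = 2) -> list[int]:
    """Generations at which drives and tapes are replaced.

    Stepping by two keeps every existing tape readable by the new
    drives, so the data can be copied forward inside the library.
    """
    return list(range(current_gen + step, last_gen + 1, step))


if __name__ == "__main__":
    print(upgrade_plan(6, 10))   # upgrade points after LTO-6
    print(can_read(8, 6))        # new drives can still read the old tapes
    print(can_read(8, 5))        # skipping three generations would not work
```

Under that assumed rule, waiting any longer than two generations would leave tapes in the library that the new drives could not read, which is why the step size matters.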

Current LTO-6 capacity is 6.25 TB per tape (compressed), and generation 8 is estimated at somewhere in the 32 TB range. The LTO roadmap currently extends to generation 10, and it is unclear whether there will be generations beyond that. History would seem to imply that at some point the technology curve will flatten out for LTO and the cost-benefit ratio will make further generations harder to justify economically. However, my guess is that before that happens, a new technology will bring a sea change. If you have an interest in long-term archival storage, I would recommend researching DNA digital data storage, which promises exceptionally high capacity and longevity along the lines of 60 million years or more. It is as future-proof as I can conceive, since DNA is the data storage mechanism of living biology.
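To put rough numbers on that growth, here is a small worked example using the per-tape figures quoted above. The 100-tape library size is purely hypothetical (it is not IPTV's actual slot count), and the function names are mine.

```python
import math

LTO6_TB = 6.25   # per-tape capacity quoted above
LTO8_TB = 32.0   # estimated per-tape capacity quoted above


def library_capacity_tb(tape_count: int, tb_per_tape: float) -> float:
    """Total capacity of a library holding tape_count tapes."""
    return tape_count * tb_per_tape


def tapes_after_migration(old_count: int, old_tb: float, new_tb: float) -> int:
    """Tapes of the new generation needed to hold old_count full old tapes."""
    return math.ceil(old_count * old_tb / new_tb)


if __name__ == "__main__":
    tapes = 100  # hypothetical library size, for illustration only
    print(library_capacity_tb(tapes, LTO6_TB))              # capacity today
    print(library_capacity_tb(tapes, LTO8_TB))              # same slots, newer tapes
    print(tapes_after_migration(tapes, LTO6_TB, LTO8_TB))   # tapes needed after the move
```

With these figures, the same number of slots holds a little over five times as much data, and the existing content occupies roughly a fifth of the tapes it did before.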

Notice that I said the medium is “one of the critical elements.” It is not the only one, and ultimately it may be the easiest to deal with, as data migration is merely the transfer of bits and bytes from one medium (e.g., paper tape) to another (e.g., LTO-10). But what do those bits and bytes mean, and how are they to be decoded and reassembled into meaningful, worthwhile and understandable information? I’ll address that in my next column.

Bill Hayes is the director of engineering for Iowa Public Television. He can be reached via TV Technology.