Transforming Media Asset Management with Artificial Intelligence

In today’s fast-paced media environments, more new content is being created than production teams can possibly manage without specialized tools. At the same time, the clock is ticking on digitizing historical content that exists in legacy analog formats such as tape, before the original media degrades. It’s critical that all of these assets be logged and tagged so that they can be found easily, but teams have no time to do this essential work.

In addition, the current generation of media asset management tools has evolved in an environment where it has been starved of metadata. As a result, content teams’ options are limited to pulling technical metadata from media files or streams, extracting meaning from file and folder names, or manual logging.


Artificial intelligence is beginning to change how media organizations meet these challenges. An emerging breed of AI platforms for media analysis, when paired with leading-edge media asset management (MAM) tools, offers great potential for transforming media workflows and making it easier than ever for operations to access, manage, and archive tremendous volumes of content. Through tools such as speech-to-text and automatic language translation, AI engines bring new power to the MAM task of logging and tagging content, including the ability to tag assets automatically based on attributes such as people, places, things, and even sentiment.

But hold on: a few caveats

It sounds almost too good to be true: suddenly you can unlock the potential of all of your content and make it immediately searchable, reusable, and monetizable. At last, you can get some traction on those digitization projects and get a better handle on all of the content in your existing library! But wait—while the potential exists to realize these benefits someday, the truth is that the technology needs to overcome some issues in order to become mainstream.

One area that needs improvement is accuracy. While AI analysis is getting better all the time, particularly with speech-to-text offerings from players such as Google, Microsoft, Amazon, and IBM, fine-tuning is still needed. For instance, the engine might not be able to distinguish between U.K. and American English, and abbreviations and jargon are likely to generate mistakes. The industry is still working on easy methods to train the AI engine to recognize these language variations and correct mistakes. Also, for image or video analysis, the sophistication of AI tools varies considerably. Some platforms offer only very basic video analysis, meaning the best way to capture metadata for people, places, objects, and sentiments is to make a set of image sequences and analyze them manually.

AI aggregators can help users avoid some of the costs and complexities of setup by making it easier to choose the right AI engine for a specific task. But even so, picking the AI tool that’s best for a given activity is not trivial. At the same time, cost structures across the industry are far from transparent, making it difficult to work out the total expense of applying AI to a media library. It’s a multi-step process. First, you have to figure out how to get your content into the AI engine, which is often in the cloud. That might involve creating a video proxy, separating the audio, creating an image sequence, and other steps, followed by uploading the content and managing its lifecycle. Then come the questions. Should you leave the content on the vendor platform or delete it to save on storage? Is it in the right format for the AI engines to understand? Which AI tool should you run, and is there a separate cost for each style of analysis? There might be different price tiers for different content formats; for instance, 4K assets might cost more. With each vendor having its own price list, it’s difficult to compare apples to apples.
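To make the apples-to-apples problem concrete, here is a minimal Python sketch of how a team might normalize vendor price lists into one comparable estimate. Everything in it is an assumption for illustration: the PriceList fields, the estimate_cost helper, and especially the rates, which are placeholders and not any real vendor's pricing.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PriceList:
    """Hypothetical vendor price list. All rates are placeholders."""
    per_minute: Dict[str, float]   # analysis type -> price per minute of media
    uhd_multiplier: float          # surcharge factor applied to 4K assets
    storage_per_gb_month: float    # cost of leaving content on the platform


def estimate_cost(prices: PriceList, minutes: float, analyses: List[str],
                  is_4k: bool = False, stored_gb: float = 0.0,
                  months: int = 0) -> float:
    """Estimate total cost: per-minute analysis fees plus retained storage."""
    rate = sum(prices.per_minute[a] for a in analyses)
    analysis_cost = minutes * rate * (prices.uhd_multiplier if is_4k else 1.0)
    storage_cost = stored_gb * months * prices.storage_per_gb_month
    return round(analysis_cost + storage_cost, 2)


# Placeholder numbers only, chosen to show the arithmetic.
vendor_a = PriceList(
    per_minute={"speech_to_text": 0.02, "object_detection": 0.10},
    uhd_multiplier=1.5,
    storage_per_gb_month=0.03,
)

# One hour of 4K footage, two analyses, 10 GB kept on the platform for 3 months.
total = estimate_cost(vendor_a, 60, ["speech_to_text", "object_detection"],
                      is_4k=True, stored_gb=10, months=3)
print(total)
```

Running the same function against a second PriceList gives a like-for-like figure, provided each vendor's tiers can be mapped onto these fields, which in practice is the hard part.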

Also, the technology is advancing so quickly that any AI analysis done today may have to be refreshed later, as the tools improve. Managing these refreshed data sets, especially if they have been corrected or updated by a human after the original analysis, adds another layer of complexity. And of course security is a concern, especially if the data is uploaded to cloud providers.
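One way to handle a refresh without losing human work is to treat corrected segments as authoritative and accept new machine output only where it does not collide with them. The Python sketch below assumes a simple Segment record with a human_edited flag; both the record and the merge_refresh policy are hypothetical, and a real MAM would likely need finer-grained rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float               # seconds from the start of the asset
    end: float
    text: str
    human_edited: bool = False


def overlaps(a: Segment, b: Segment) -> bool:
    """True when the two time ranges share any duration."""
    return a.start < b.end and b.start < a.end


def merge_refresh(existing: List[Segment],
                  refreshed: List[Segment]) -> List[Segment]:
    """Keep human-edited segments; take refreshed output everywhere else."""
    kept = [s for s in existing if s.human_edited]
    fresh = [s for s in refreshed
             if not any(overlaps(s, k) for k in kept)]
    return sorted(kept + fresh, key=lambda s: s.start)


# Sample data is invented for illustration.
existing = [
    Segment(0.0, 5.0, "helo and welcome"),
    Segment(5.0, 10.0, "we shot this on a 4K proxy", human_edited=True),
]
refreshed = [
    Segment(0.0, 5.0, "hello and welcome"),
    Segment(5.0, 10.0, "we shot this on a for K proxy"),
]

merged = merge_refresh(existing, refreshed)
for seg in merged:
    print(seg.start, seg.end, seg.text, seg.human_edited)
```

Here the improved machine transcript replaces the uncorrected first segment, while the human fix to the jargon in the second segment survives the refresh.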


As these powerful AI technologies continue to mature, strong media asset management capabilities will become increasingly important. On the metadata side, tools that can store, search, and easily correct a huge volume of time-based metadata are crucial. Good metadata and user interface design are vital to keep the system from overloading users with too much information. And on the workflow and automation side, feeding the AI engines with the right data and automating the analysis, while keeping down costs, will separate the true enterprise offerings from the also-rans.
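As a rough illustration of what searchable time-based metadata can look like, the Python sketch below stores each AI-generated tag with a time range and a confidence score, and filters out low-confidence hits so users are not flooded with noise. The Tag record, the find_moments helper, and the 0.8 threshold are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Tag:
    asset_id: str
    start: float        # seconds
    end: float
    label: str
    confidence: float   # 0.0 to 1.0, as reported by the AI engine


def find_moments(tags: List[Tag], label: str,
                 min_confidence: float = 0.8) -> List[Tuple[str, float, float]]:
    """Return (asset, start, end) for confident matches, ordered by asset and time."""
    hits = [t for t in tags
            if t.label == label and t.confidence >= min_confidence]
    hits.sort(key=lambda t: (t.asset_id, t.start))
    return [(t.asset_id, t.start, t.end) for t in hits]


# Invented sample tags.
tags = [
    Tag("match_01", 62.0, 70.5, "celebration", 0.93),
    Tag("match_01", 10.0, 12.0, "celebration", 0.41),
    Tag("promo_07", 3.0, 8.0, "logo", 0.88),
]

moments = find_moments(tags, "celebration")
print(moments)
```

Lowering min_confidence widens the net at the cost of more false positives, which is exactly the trade-off a good user interface should expose and not hide.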

So what might an AI-powered MAM solution look like? One approach is to supercharge the MAM system’s logging, tagging, and search functions through integrations with leading AI vendors and aggregators, such as Google, Microsoft, Amazon, and IBM. Integrations with best-of-breed AI platforms and cognitive engines could allow the MAM to leverage advanced AI-based speech recognition and video/image analysis, with the flexibility to be deployed either in the cloud or in hybrid on-premises/cloud environments.

Here are a few of the advanced capabilities that could result:

  • Speech-to-text, to automatically create transcripts and time-based metadata
  • Language translation
  • Place analysis, including identification of buildings and locations without using GPS-tagged shots
  • Object and scene detection (e.g., daytime shots or shots of specific animals)
  • Sentiment analysis, for finding and retrieving all content that expresses a certain emotion or sentiment (e.g., “find me celebrations in a sports event”)
  • Logo detection, to identify when certain brands appear in shots
  • Text recognition, to extract on-screen text from video
  • People recognition, for identifying people, including executives and celebrities


Of course, these capabilities are just the start. The MAM system can also be a powerful tool for training and improving AI engines. For example, content in which a corporation’s executives have been manually tagged in the MAM could be used to teach an engine to recognize those executives, so that it does a better job of logging and tagging new content.

The industry is being transformed by AI and by the resulting explosion in metadata, some of it low quality. Only the most powerful, flexible, easy-to-integrate, secure, and scalable MAM platforms are embracing this challenge, and they are the ones that will thrive.

In the right hands, AI becomes the key that unlocks the next generation of MAM technologies.

Dave Clack is CEO of Square Box Systems, makers of the CatDV media asset management solution.