Deep Learning in the Media Supply Chain



No other topic has dominated industry conversation in recent years like AI. But what exactly does it mean when we speak of AI? 

Artificial intelligence is the generic term for a machine simulation of human cognitive abilities. Machine Learning, in turn, describes a series of mathematical methods that can identify certain patterns in data from learned examples. Deep Learning is a subset of machine learning and uses artificial neural networks that enable the system to learn autonomously. 


Deep learning makes it possible to process volumes of data that would be impractical to handle manually. Its strength lies in capturing patterns and structures across different data types, and in tagging and enriching data. With its daily flow of current facts, figures and data, the media sector is ideal for the application of deep learning.

Although many media professionals are skeptical about AI, recent studies find they would be comfortable with AI-generated news like traffic or weather reports. But without the right strategy, more automation can quickly become a nightmare.


How do we integrate our existing systems with the rapidly growing field of AI providers with pre-trained models, frameworks and environments ready to be used as services?

First, we need to look at where we might apply them. There are opportunities throughout the media supply chain. A few examples include: 

  • Ingest—Automatic QC, compliance, deep fake recognition, copyright monitoring  
  • Production—Tagging, entity recognition, topic clustering and (soon) rough cuts, automatic highlight cuts, robot journalism 
  • Planning—Automatic program planning, based on licensing or marketing patterns 
  • Marketing—Rating prediction, imitation of buying patterns 
  • Distribution—Automated playout or packaging


Deep learning helps us gain insights into media objects at a level that was not practical without automation, and moves us toward our vision of knowing "everything about every frame."

To support the multitude of services available and bridge the data and organizational silos that segregate both content and business intelligence, we implement an “AI-specific” intelligence layer that manages all communication with those services and also adds value through:

  • Normalization—Bringing results into a unified format 
  • Cross-media analysis—Video, stills, audio, text 
  • Multicloud—Connect many different providers 
  • Training—Especially in the field of computer vision 
  • Knowledge graph—Build contextual data models from different data silos and query them in real time with dynamic requests  

The layer supports a "best-of-breed" approach in which users choose the combination of services that best fits their requirements. This is made possible by normalizing the different result schemas and exposing them in a uniform metadata model. In this way, a uniform experience is achieved without neglecting the special knowledge or features of the individual services.
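
To make the idea of normalization concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the actual product implementation: the `Detection` model, the two providers and their field names (`Name`, `Confidence`, `StartMs`, `entity`, `score` and so on) are all hypothetical. The point is only that each provider gets a small adapter that maps its own result schema onto one shared model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One recognized concept in a unified (hypothetical) metadata model."""
    provider: str      # which service produced the result
    label: str         # normalized, lower-case concept name
    confidence: float  # always expressed in the range 0.0 to 1.0
    start_s: float     # start of the time span, in seconds
    end_s: float       # end of the time span, in seconds


def from_provider_a(item: dict) -> Detection:
    # Hypothetical provider A: confidence as a percentage, times in milliseconds.
    return Detection(
        provider="provider_a",
        label=item["Name"].strip().lower(),
        confidence=item["Confidence"] / 100.0,
        start_s=item["StartMs"] / 1000.0,
        end_s=item["EndMs"] / 1000.0,
    )


def from_provider_b(item: dict) -> Detection:
    # Hypothetical provider B: score already 0..1, start offset plus duration in seconds.
    start = item["offset"]
    return Detection(
        provider="provider_b",
        label=item["entity"].strip().lower(),
        confidence=item["score"],
        start_s=start,
        end_s=start + item["duration"],
    )


# Two made-up raw results describing the same object in different schemas.
raw_a = {"Name": "Soccer Ball", "Confidence": 87.5, "StartMs": 12000, "EndMs": 15500}
raw_b = {"entity": "soccer ball", "score": 0.91, "offset": 12.4, "duration": 2.6}

detections = [from_provider_a(raw_a), from_provider_b(raw_b)]
```

Once both results share one shape, downstream tools can compare or merge them without knowing which service they came from, while the `provider` field keeps the origin visible.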

Applying a uniform metadata model has further advantages. Concepts recognized by different services can be merged, compared and interchanged. Services can also be combined: for example, a speech-to-text transcript from one operation can be passed through natural language processing in a “cascade” operation.
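
A cascade of this kind can be sketched as a simple chain of steps, each of which sees the metadata produced by the steps before it. The example below is only a sketch: `speech_to_text` and `extract_entities` are hypothetical stand-ins that return fixed or naively computed values rather than calling real services, and `run_cascade` is an illustrative helper name.

```python
from typing import Callable

# A step takes the metadata gathered so far and returns new fields to add.
Step = Callable[[dict], dict]


def speech_to_text(meta: dict) -> dict:
    # Stand-in for a real transcription service (hypothetical, returns fixed text).
    return {"transcript": "Angela Merkel met Emmanuel Macron in Berlin"}


def extract_entities(meta: dict) -> dict:
    # Stand-in for a real NLP service: naively treats capitalized words as entities.
    words = meta["transcript"].split()
    return {"entities": [w for w in words if w[0].isupper()]}


def run_cascade(media_id: str, steps: list[Step]) -> dict:
    meta: dict = {"media_id": media_id}
    for step in steps:
        meta.update(step(meta))  # each step sees the output of earlier steps
    return meta


result = run_cascade("clip-001", [speech_to_text, extract_entities])
```

Because every step reads from and writes to the same metadata record, the transcription and the language analysis could come from different providers without the cascade itself changing.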

A standardized metadata set and version tracking let us reproduce individual results and determine where the data came from and what confidence was predicted for each recognition. Users can therefore optimize quickly, for example by changing threshold values, and see the results immediately without having to re-analyze all media.
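
One way to picture why no re-analysis is needed: if every result is stored together with its confidence and the version of the model that produced it, a threshold becomes a filter applied at query time. The following is a minimal sketch under that assumption; the field names (`label`, `confidence`, `model_version`), the sample values and the `visible` helper are hypothetical.

```python
# Results as they might be stored after a single analysis pass (made-up values).
stored = [
    {"label": "anchor", "confidence": 0.95, "model_version": "v1"},
    {"label": "microphone", "confidence": 0.62, "model_version": "v1"},
    {"label": "flag", "confidence": 0.41, "model_version": "v2"},
]


def visible(results: list, threshold: float) -> list:
    # Filter the stored results; the media itself is never analyzed again.
    return [r for r in results if r["confidence"] >= threshold]


strict = visible(stored, 0.9)   # only very confident recognitions
relaxed = visible(stored, 0.5)  # a lower threshold shows more results
```

Changing the threshold only changes which stored results are shown, which is why the effect can be displayed immediately.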

Organizations using deep learning need to train the algorithms with data appropriate to their needs, and to keep training as those needs evolve. This is especially important in dynamic environments such as news, where the topics, people and objects of interest constantly change. Creating labeled training data is the “tagging” of tomorrow, but because it is not the primary task in a creative process, the effort involved should be minimized. Since the training data consists of media objects, why not do this directly in the MAM, with easy tools for media managers or journalists, integrated into their daily tasks?


"AI" is an interdisciplinary team sport—from idea to validation by means of a prototype, up to the transfer into production, many different roles are required, including:  

  • Business analyst—The domain expert  
  • Data engineer—Provides data sources in sufficient quantity and quality 
  • Data scientist—Implements and verifies the algorithms

The 80/20 rule applies here: practical experience shows that data engineering often takes up the majority of the work, whereas implementing the algorithms accounts for a smaller part.

With roles defined, we recommend following a standardized “go live” process:

  • AI Roadmap—Identify & prioritize relevant use cases  
  • AI Lab—From idea to a verified prototype within a few days 
  • AI Factory—Develop operational AI service fully integrated to the production environment 
  • AI Operation—A stable and permanent operation and ongoing improvement 

As a global leader in the world of IT, the Arvato Systems group as a whole, and Vidispine within it, has long treated AI as an important topic. We have fostered professional and creative exchange in the Arvato Systems AI Competence Cluster, a network of interdisciplinary colleagues that aims to transfer knowledge and drive innovation. As a result, many interesting examples from other businesses are coming to the forefront, such as interactive fashion recognition, extraction of manuscript insights, anomaly detection in infrastructure and data journalism (e.g. through a “crime map”), to name a few. It is clear that in the future, AI will simply be a part of every IT toolbox.

Ralf Jansen is a software architect and product manager at Vidispine, an Arvato Systems brand.