New ISO standard steps towards automatic metadata

The dream of automatic metadata generation from audiovisual (AV) content has come a step closer with the ratification by the International Organization for Standardization (ISO) of a standard of European origin called the MPEG-7 AudioVisual Description Profile (AVDP).

Creating metadata automatically requires the ability to extract information both from the audio, via speech-to-text transcription engines, and from the video, via shot detection and algorithms capable of identifying features and faces within the footage. AVDP sets out to enable this, and "marks a significant step towards making metadata cheaper through the processing of audio and video with minimal human intervention," according to Jean-Pierre Evain, metadata expert at the European Broadcasting Union (EBU).

AVDP was conceived in the EBU MIM/SCAIE project in 2010, with active support and promotion from broadcasters including RAI of Italy and VRT of Belgium, along with Austrian research institution JOANNEUM Research. The ISO standardization effort was then supported by Japanese broadcaster NHK and by AT&T in the U.S.

AVDP is part of the broader movement of broadcasting towards a file-based IT framework that should streamline workflows through greater automation. It fits in with the Framework for Interoperable Media Services (FIMS), in which the EBU is also heavily involved.

FIMS was first publicly demonstrated at this year’s NAB, representing a significant step towards IP-based, file-based workflows in broadcast, production, post-production and archiving. FIMS is a joint project between the EBU and the Advanced Media Workflow Association (AMWA). The AMWA was established to develop standards for broadcasting as it moves closer to the world of IT, with content stored as files and transmitted around IP networks.

FIMS itself was set up to help broadcasters migrate from traditional video-centric technologies, such as tape storage, towards IT-based ones that more closely resemble enterprise data centers. Many broadcasters are struggling to design and manage data centers, with a major hurdle being the lack of standard interfaces between components and systems across the production and archiving chain. This is a serious handicap as broadcasting becomes increasingly multi-vendor in the era of multi-screen services, with more global distribution of content to many different platforms.

The upshot is that broadcasters are having to invest in expensive system integration, developing custom adapters so that components from different vendors can interoperate. This, in turn, creates scalability and maintenance problems, as substituting or upgrading one component can require further spending on adaptation.

The EBU had become convinced that the solution lay in adopting the Service Oriented Architecture (SOA) that evolved in the enterprise IT and Internet world as a framework for interoperable services and components, with applications running in a more flexible, loosely coupled environment. There is mounting evidence that SOA can greatly improve interoperability at lower cost than current system design practices based on proprietary interfaces, whether in broadcasting or any other sphere.

Meanwhile, the AMWA came to similar conclusions and set up its Media Services Architecture Group (MSAG) with much the same objectives as the EBU. So, rather than duplicate effort, the EBU and AMWA came together to develop FIMS as an SOA-based framework designed for the broadcasting industry. With a range of industry partners, the project is building a vendor-neutral common framework that will enable equipment and software from different manufacturers to work together.

FIMS is borrowing from the IT industry, with partners including IBM, which until now has not been a major player in broadcasting other than as a provider of computing and storage capacity.

At the same time, though, FIMS is being built on the realization that broadcasting has unique requirements because video is like no other form of data. This is not so much about the huge volumes involved as about the fact that video is difficult to catalogue and analyze. Both tasks are becoming increasingly important for search, recommendation and navigation, as well as for workflow management during production, post-production and archiving. For this reason, an important part of the FIMS project lies in developing automated tools for creating and manipulating metadata, as well as for retrieving and updating media assets. This is where AVDP comes in.

FIMS also addresses aspects of OTT and multi-screen delivery, such as resource estimation for reservation, and IP stream capture, where the BBC has made important contributions based on its experience with its iPlayer catch-up service. FIMS, then, is about much more than basic interoperability; it also covers the processes that underpin the delivery and management of multi-screen services.