Machine learning and deep learning-based solutions are making a significant impact on media QC thanks to the availability of large GPU computing power and datasets. Using these technologies, media companies can automatically verify if audiovisual content meets compliance requirements. In regions of the world where nudity, adult content, violence, prohibited objects, substance abuse, and strong language are outlawed, media companies can leverage these technologies to increase compliance reliability and streamline their QC workflows, saving both time and money.
This article will explain why the success of a learning-based system depends heavily on the quality and quantity of datasets used. While publicly available datasets are good for general development of learning-based systems, they are not adequate for the specific requirements of content compliance in the media industry. Significant efforts are needed to build well-annotated quality datasets for the specific requirements of content compliance. If the training dataset is not well designed, then it is easy for an object detector to confuse guns with cell phones, for example.
ADVANCEMENTS IN ML/DL TECHNOLOGY
Content compliance can be a rather intricate process that involves analyzing metadata gathered from a variety of fundamental tasks, such as detecting objects inside frames, recognizing actions over several frames, classifying scenery, detecting specific events in audio or video tracks, classifying videos into specific activities or themes, converting speech to text, and detecting and recognizing faces.
In a traditional machine learning system, the features extracted from images for content compliance purposes were designed by hand. Recent advancements in ML and DL have automated this process. A huge breakthrough in deep learning occurred in 2012 with the introduction of AlexNet. AlexNet is a convolutional neural network trained on 1.2 million real-world images from a dataset called ImageNet for classification purposes. It classifies images into 1,000 different categories and has five convolutional layers and about 60 million parameters, which made it one of the most complex and lowest-error networks of its time.
After AlexNet, several further developments followed between 2012 and 2015. One of them was Faster R-CNN, a deep neural network for object detection. While AlexNet addresses image classification, Faster R-CNN addresses object detection, which is a more complex problem because it also involves locating the object inside the image. Faster R-CNN proposes regions of an image that might contain an object and checks whether each proposed region contains an object from the list of supported categories. If it does, the network returns the bounding box of the region containing the object and the name of the object.
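To make that output concrete, here is a minimal sketch of how a QC tool might post-process a detector's raw results for one frame. The `Detection` type, the category names, and the 0.5 score threshold are assumptions made for illustration; they are not part of Faster R-CNN or of any particular library.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical categories a compliance detector might support.
SUPPORTED_CATEGORIES = {"gun", "cigarette", "alcohol_bottle"}


@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
    label: str                              # predicted category name
    score: float                            # detector confidence in [0, 1]


def keep_confident(detections: List[Detection], min_score: float = 0.5) -> List[Detection]:
    """Keep detections in a supported category whose score meets the threshold."""
    return [
        d for d in detections
        if d.label in SUPPORTED_CATEGORIES and d.score >= min_score
    ]


# Made-up raw output for a single frame.
raw = [
    Detection((34.0, 50.0, 120.0, 160.0), "gun", 0.91),
    Detection((200.0, 80.0, 260.0, 140.0), "cell_phone", 0.88),
    Detection((10.0, 10.0, 40.0, 60.0), "cigarette", 0.32),
]

kept = keep_confident(raw)
for d in kept:
    print(d.label, d.box, d.score)
```

In this example only the gun detection survives: the cell phone is not a supported category, and the cigarette falls below the assumed threshold. In practice the threshold would be tuned per category, since a missed detection and a false alarm carry different costs in compliance work.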
There are two key parts involved in constructing an ML network for QC. First, a QC solutions provider has to train the network on datasets so that the network can start recognizing objects of interest (e.g., guns, alcohol, cigarettes, belly buttons, etc.). Transfer learning is a technique that can be useful at this stage: it reuses a trained model as the starting point for training on another dataset. This helps train a network quickly for new types of objects and with fewer examples. Second, the trained network is applied in the media QC environment to make predictions about the presence of these objects in media files.
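The idea behind transfer learning can be shown with a toy sketch in plain Python. It assumes one common form of the technique: the reused model is kept frozen as a feature extractor, and only a small new classifier (a "head") is trained on the new dataset. The weights, samples, and function names below are invented for illustration; a real system would use a deep network and a framework such as PyTorch or TensorFlow.

```python
import math
from typing import List, Sequence, Tuple

# Stand-in for a pretrained backbone: these weights are treated as already
# learned on a large source dataset and are never updated below (frozen).
PRETRAINED_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]


def extract_features(x: Sequence[float]) -> List[float]:
    """Frozen feature extractor: a fixed linear layer followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in PRETRAINED_WEIGHTS]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    """Train only a new logistic-regression head on top of the frozen features."""
    feats = [extract_features(x) for x in samples]
    weights = [0.0] * len(feats[0])
    bias = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            pred = sigmoid(sum(w * fi for w, fi in zip(weights, f)) + bias)
            err = pred - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * fi for w, fi in zip(weights, f)]
            bias -= lr * err
    return weights, bias


def predict(x: Sequence[float], weights: List[float], bias: float) -> int:
    f = extract_features(x)
    return int(sigmoid(sum(w * fi for w, fi in zip(weights, f)) + bias) >= 0.5)


# A small, made-up dataset for the new task (1 = object of interest present).
samples = [(2.0, 0.0), (3.0, 1.0), (0.0, 2.0), (1.0, 3.0)]
labels = [1, 1, 0, 0]

weights, bias = train_head(samples, labels)
print([predict(x, weights, bias) for x in samples])
```

Because the backbone is reused rather than retrained, only three numbers (two head weights and a bias) are learned here, which is why a handful of labeled examples is enough. Fine-tuning some or all of the backbone layers is another common variant, not shown in this sketch.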
A critical factor in the success of deep learning has been the availability of huge, well-labeled datasets. Models trained on well-labeled datasets can outperform the accuracy of human visual recognition. In fact, the top-performing model achieved an accuracy of 96 percent in 2017.
THREE WAYS TO APPLY ML/DL TO MEDIA QC
ML/DL can be used for a range of different quality and compliance purposes in media workflows. Aside from detecting objects, the technology is useful for recognizing activity in video, reading onscreen text, detecting audio events, and checking whether captions are aligned correctly. Let’s look at three key ways that operators can use these techniques to their advantage.
Identifying explicit content is one area where ML/DL technology can be useful in the media environment. Object detection, activity recognition, and audio and visual cues can be used to determine whether a scene contains nudity or minimal covering, mild sexual situations, or explicit sexual situations.
Detecting violence is a second area. Activity recognition and object detection can be used to identify violent content, including the presence of guns, killing, and car crashes.
Alcohol and smoking are a third. In some regions of the world, the presence of alcohol and smoking in video content is prohibited. Operators can use object detection to identify alcohol as well as cigarettes, cigars, and vaping devices. Activity recognition plays a role in detecting the actual physical act of smoking.
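As a rough sketch of how such detections could feed a region-specific compliance check, the example below merges per-frame labels into time segments that a reviewer could jump to. The region names, rule sets, label names, and the `flag_segments` function are all hypothetical; an actual QC product would define its own rules and metadata format.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical per-region rules: labels that must be flagged for review.
REGION_RULES: Dict[str, Set[str]] = {
    "region_a": {"nudity", "gun", "alcohol", "smoking"},
    "region_b": {"nudity", "gun"},
}


def flag_segments(
    frame_labels: List[Set[str]], region: str, fps: float
) -> List[Tuple[float, float, str]]:
    """Merge consecutive flagged frames into (start_sec, end_sec, label) segments.

    end_sec is exclusive: it is the timestamp of the first frame after the run.
    """
    prohibited = REGION_RULES[region]
    segments = []
    open_runs: Dict[str, int] = {}  # label -> index of the first frame in its current run
    # A trailing empty set closes any run still open at the last frame.
    for index, labels in enumerate(frame_labels + [set()]):
        hits = labels & prohibited
        for label in list(open_runs):
            if label not in hits:
                start = open_runs.pop(label)
                segments.append((start / fps, index / fps, label))
        for label in hits:
            open_runs.setdefault(label, index)
    return sorted(segments)


# Made-up labels for six frames sampled at 2 frames per second.
frames = [set(), {"alcohol"}, {"alcohol", "gun"}, {"gun"}, set(), {"smoking"}]

segments_a = flag_segments(frames, "region_a", fps=2.0)
segments_b = flag_segments(frames, "region_b", fps=2.0)
print(segments_a)
print(segments_b)
```

The same detections produce three flagged segments under the stricter rule set and one under the looser set, which illustrates why the rules are kept separate from the detectors: a single analysis pass can serve several regional guidelines.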
Accuracy is crucial when it comes to quality control for media operations. If an operator misses a video scene with violence or alcohol, they run the risk of not adhering to content compliance requirements. Over the years, learning-based systems have evolved to where datasets are richer and higher in quality, improving the automatic identification of content for compliance purposes. With the latest innovations in ML/DL technology, operators can significantly increase the efficiency and accuracy of their media workflows.
Interra Systems’ software-based QC solution has been integrated with the latest advancements in ML and AI technology, allowing operators to deliver exceptional audio-video quality on every device and comply with all regional content guidelines.
Shailesh Kumar is Associate Director of Engineering at Interra Systems.