SMPTE 2017: Q&A—Tom Ohanian, Vitec Group

Shortly before the start of SMPTE 2017 Technical Conference & Exhibition, TV Technology spoke with Tom Ohanian, vice president of product development and marketing at Vitec Group, about his Thursday session “How Artificial Intelligence and Machine Learning Will Change Content Creation Methodologies.”

TV TECHNOLOGY: Is there a specific type of content you see being edited without the aesthetic input of a person? Are there types of content that you think will benefit most from this type of AI editing?

TOM OHANIAN: Most logically, content types that lend themselves to templates will benefit from the automated methodologies of AI. For example, take television promos. A primetime network show may have anywhere from 200 to 2,000 promos for each episode. I know it sounds like a ridiculously high number, but when you factor in the permutations for language, length and distributor, the numbers climb. Once a human has created the initial promo template, the variants can be easily addressed and assembled without further human intervention.
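To make the combinatorics concrete, here is a minimal sketch (the variant axes, values and counts below are hypothetical illustrations, not figures from Ohanian) of how a single promo template fans out into deliverables:

```python
from itertools import product

# Hypothetical variant axes for one episode's promo template.
languages = ["en", "es", "fr", "de", "pt"]
lengths_sec = [6, 15, 30, 60]
distributors = ["broadcast", "cable", "ott", "social", "theatrical"]

# Each combination is one deliverable cut from the same human-made template.
variants = [
    {"language": lang, "length_sec": sec, "distributor": dist}
    for lang, sec, dist in product(languages, lengths_sec, distributors)
]

print(len(variants))  # 5 * 4 * 5 = 100 variants from one template
```

Five languages, four lengths and five distributors already yield 100 deliverables from one template; add axes such as rating cards or regional feeds and the totals reach the hundreds to thousands Ohanian describes.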

TVT: What would you say are the primary advantages to AI editing over human editors?

OHANIAN: This is always the gut-wrenching, face-tightening, teeth-clenching question. But let’s put those emotions aside. It is natural to think of AI as a threat to human editors, but that is not my point of view. We have to consider carefully that there are consumption outlets that require content but do not have the same ad model or revenue support as primetime television programming.

AI editing can also assist the human editor. Editing constructs and the rules of cinematography can provide rulesets for AI systems to utilize. For example, given a scene covered with a master shot, over-the-shoulder shots and close-ups, AI can assemble variants of the scene that the human editor can then finesse, as in the sketch below.
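As a rough illustration of what such a ruleset could look like (a sketch with made-up shot labels and a toy cutting rule, not any shipping system), an assembler might cut a two-person dialogue scene by favoring the current speaker and tightening the coverage as the scene builds:

```python
import random

# Hypothetical coverage for a two-person dialogue scene between A and B.
shots = {
    "master": ["master_wide"],
    "ots_a": ["ots_on_a_take1", "ots_on_a_take2"],  # over B's shoulder, on A
    "ots_b": ["ots_on_b_take1", "ots_on_b_take2"],
    "cu_a": ["cu_a_take1", "cu_a_take2"],
    "cu_b": ["cu_b_take1", "cu_b_take2"],
}

def assemble_variant(dialogue, seed):
    """Toy rule: establish with the master, favor the current speaker,
    then tighten from over-the-shoulders to close-ups as the scene builds."""
    rng = random.Random(seed)
    cut_list = [rng.choice(shots["master"])]
    for i, (speaker, _line) in enumerate(dialogue):
        coverage = "cu" if i >= len(dialogue) // 2 else "ots"
        cut_list.append(rng.choice(shots[f"{coverage}_{speaker}"]))
    return cut_list

dialogue = [("a", "..."), ("b", "..."), ("a", "..."), ("b", "...")]
for seed in range(3):  # three variants for the human editor to finesse
    print(assemble_variant(dialogue, seed))
```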

Why is this necessary? Well, it has to do with the fact that not all content is equal—and therefore, we have to find more streamlined and automated ways of either pre-assembling or assembling that content. 

TVT: Is there work being done this way now? If so, what kind of work is it?

OHANIAN: AI is already being used to reduce the need for human transcriptionists, and we’ve seen great success in that area. We’ve also seen more frequent examples of movie trailers being created via AI. We are very quickly moving to a point where pre-assemblies and work types that can be templated will be done in this fashion.

TVT: Of the points mentioned – image recognition, natural speech processing, language recognition, cognitive metadata extraction, tonal analysis and interactive game engine technologies for creation of content – which is furthest along today and which is in the most rudimentary stages?

OHANIAN: Image recognition and natural speech processing have progressed enormously in the last two years—we’ve experienced this in so many facets of our everyday lives. Making educated and learned extrapolations is just beginning, but it will be enormously powerful.

TVT: Machine learning presumably continues to receive input, whether through developers or crowdsourcing or something similar, to correct mistakes and “teach” the system how to improve. If AI editing is being used for content creation on a professional level, what sorts of information would such a system use to continue to improve? And what would count as a “mistake” or an “improvement” in what is ultimately a creative field?

OHANIAN: You’re absolutely correct. Vocoders depend upon large libraries, and they get better with more data. What can AI and ML routines learn from the human editor who tweaked and adjusted an AI-created pre-assembly? A lot. How does the human editor choose who to emphasize and showcase in a two-person dialogue scene? Isn’t it based on the script, the kernel of the scene, the dialogue and its meaning, and, of course, the looks and beats? These are among the many things the human editor considers, and the why and the how of those decisions can be taught.
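One plausible mechanism, sketched below with hypothetical shot labels (this is an illustration, not a description of any existing product), is to diff the AI pre-assembly against the editor’s final cut and treat each override as a labeled training example. A “mistake” is then a choice the editor consistently replaces, and an “improvement” is a falling override rate over time:

```python
def diff_cuts(pre_assembly, final_cut):
    """Yield one training example per position where the editor overrode the AI."""
    for i, (ai_shot, editor_shot) in enumerate(zip(pre_assembly, final_cut)):
        if ai_shot != editor_shot:
            yield {"position": i, "ai_chose": ai_shot, "editor_chose": editor_shot}

# Hypothetical cut lists for one two-person dialogue scene.
pre_assembly = ["master_wide", "ots_on_a_take1", "cu_b_take1", "cu_a_take2"]
final_cut    = ["master_wide", "cu_a_take1",     "cu_b_take1", "cu_a_take2"]

# The override at position 1 says: here, the editor preferred a close-up on
# speaker A over the AI's over-the-shoulder shot. Aggregated across many
# scenes, these corrections become the supervision signal for the ML routine.
for example in diff_cuts(pre_assembly, final_cut):
    print(example)
```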