The only way of discovering the limits of the possible is to venture a little way past them into the impossible. — Arthur C. Clarke
Sometimes the news is so big, a guy has to stop what he’s doing and write an article for his favorite technology magazine. Such an event happened the other day with Intel’s announcement of the Neural Compute Stick 2. It’s a full-capability artificial intelligence and vision processing device in a USB stick. I’m a pragmatic guy who is always thinking about what the future will bring, but with this announcement, the future is arriving a little sooner than I had planned. It’s time to think about the impact on the broadcast industry and what we can prototype, RIGHT NOW.
To be sure, AI of this horsepower, about 4 teraflops (which I am guessing is smarter than a mosquito, but not as intelligent as a flounder), has been with us for a while and has been available through the cloud (think Amazon Alexa or Siri). This isn’t even Intel’s first neural network stick. But the addition of vision processing with this level of capability and without an internet connection means it can be used in real-time applications. And at just $99 each, every device will soon have the capability to be “smart.”
So what will this change? The short answer is ... just about everything. But readers of this article want to know how this will affect TV production and consumption. So here are some predictions:
Smart Brilliant TVs Will Know Us
It won’t take long for the CES guys to pick up on this. Finding the right content to watch is hard and getting harder. I’ve observed for a while that trying to find a good film to watch across your cable, Netflix and HBO subscriptions is futile. The next step is for TVs to know exactly who is in the room and to recommend choices for us. Perhaps the television has a personality and participates in the debate. The set will get real-time feedback on the choices from all of us based upon our body language — improving the accuracy of recommendations over time. Whoever owns this algorithm is going to have a lot of influence. AI in the set can put this power back in the CE manufacturers’ laps.
Cameras Will See Things We Don’t
All the way on the other end of the signal path, cameras will get a lot smarter. Remember those old analog “skin tone color correction” circuits to smooth out wrinkles? Well, of course they went digital years ago, but what happens when they’re smart and driven by a neural network? Live de-aging correction is possible. Barbara Walters would be soooo jealous. It does not take much imagination to see that live replacement of an actor’s image with an avatar would open new doors for the effects industry to work their magic in live, more interactive ways.
Without a doubt, autofocus will now be able to anticipate the subject. Pan, tilt and zoom controls will follow the action and frame automatically, perhaps even mimicking a particular cinematographer’s style. It’s not a leap to say that robotic cameras will soon become self-driving. Ross, Vitec? Are you paying attention?
There’s no doubt these cost-effective, local, massively parallel processors will be used to create new compression algorithms. With computer vision, we can dedicate bits to only the most important parts of the image and create an architecture totally different from what we use now. I wonder who will do it first, how much latency there will be, and whether we’ll still owe royalties to MPEG LA.
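To make the "dedicate bits to the important parts" idea concrete, here is a minimal sketch of my own, not a description of any shipping codec. It assumes a vision model has already scored each image block with an integer importance weight (the function name, the weights and the per-block floor are all hypothetical), and it simply splits a frame's bit budget in proportion to those scores.

```python
def allocate_bits(importance, total_bits, floor_bits=0):
    """Split a frame's bit budget across blocks in proportion to importance.

    importance: non-negative integer weights, one per block (assumed to come
                from some vision model; hypothetical input).
    total_bits: whole-frame budget.
    floor_bits: minimum every block receives so the background never vanishes.
    """
    n = len(importance)
    if n == 0:
        return []
    if total_bits < floor_bits * n:
        raise ValueError("budget is too small to cover the per-block floor")

    weights = list(importance)
    if sum(weights) == 0:
        # Nothing stands out, so treat every block as equally important.
        weights = [1] * n
    weight_sum = sum(weights)

    remaining = total_bits - floor_bits * n
    # Integer division keeps the result exact; remainders are settled below.
    bits = [floor_bits + (remaining * w) // weight_sum for w in weights]
    remainders = [(remaining * w) % weight_sum for w in weights]

    # Hand the few leftover bits to the blocks with the largest remainders.
    leftover = total_bits - sum(bits)
    order = sorted(range(n), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        bits[i] += 1
    return bits


if __name__ == "__main__":
    # Four blocks; the third one (say, a face) is seven times as important.
    print(allocate_bits([1, 1, 7, 1], total_bits=1000, floor_bits=50))
```

A real encoder would steer quantization rather than hand out raw bit counts, but the proportional split with a floor captures the basic trade-off.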
A Helpful Hand In Editing
Editing is akin to storytelling and therefore a creative art, but AI can help by organizing clips automatically based upon who is in them and by creating automated logs. The user interface may change to support gestures and voice control (no more carpal tunnel or backaches!). Perhaps the editing device itself might even learn the style of the editor and begin a rough cut on their behalf.
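Organizing clips by who appears in them is mostly bookkeeping once the recognition is done. The sketch below is an illustration only: it assumes some face recognizer has already produced a list of names for each clip (the clip names, labels and function names are hypothetical), and it builds a per-person index plus a plain-text log.

```python
from collections import defaultdict


def index_clips_by_person(detections):
    """Group clip names under each person detected in them.

    detections: iterable of (clip_name, [person, ...]) pairs, assumed to be
                the output of a recognizer that is not shown here.
    """
    index = defaultdict(list)
    for clip, people in detections:
        # set() so a person seen several times in one clip is listed once.
        for person in sorted(set(people)):
            index[person].append(clip)
    return dict(index)


def format_log(index):
    """Render the index as one line per person, sorted by name."""
    return "\n".join(
        f"{person}: {', '.join(clips)}" for person, clips in sorted(index.items())
    )


if __name__ == "__main__":
    detections = [
        ("clip_001.mov", ["anchor", "guest"]),
        ("clip_002.mov", ["anchor", "anchor"]),
        ("clip_003.mov", []),  # nobody recognized, so it appears in no bin
    ]
    print(format_log(index_clips_by_person(detections)))
```

The hard part, of course, is the recognizer itself; the point is that its output drops straight into the bins and logs an editor already works with.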
Real-Time Automated Captioning
Silicon-based captioning solutions have been making steady progress in reaching the accuracy levels of their carbon-based competitors for prerecorded files. With localized AI, real-time live closed-captioning with acceptable accuracy should be a slam dunk. While we’re at it, let’s use the video recognition to automate video description (DVS, or Descriptive Video Service). Heck, all of this could move into the set-top box or even the TV for use on demand, and spare the broadcaster the cost of doing it at all. Let’s go one step further and add real-time translation into any language, executed in the TV, at the request of the viewer.
I’m Sorry, Dave, I’m Afraid I Can’t Do That
This exciting news is especially fitting in the week when Douglas Rain passed away. For those who did not catch the news, Rain voiced HAL 9000, the sinister AI computer in “2001: A Space Odyssey,” as well as a much more helpful sidekick in “2010: The Year We Make Contact.” So in one lifetime, we’ve gone from imagining pervasive artificial intelligence to making it available for under $100. Arthur C. Clarke summed it up nicely: “Any sufficiently advanced technology is indistinguishable from magic.”