Adobe and Speechmatics Deliver 'Cloud-Grade' On-Device Speech Recognition for Premiere

Adobe has expanded its work with Speechmatics to bring cloud-grade speech recognition on-device to its Premiere editing software. The company says the new feature transcribes accurately and entirely locally, and is powerful enough for professional work.

Speechmatics has been Adobe's partner since 2021, when Premiere became the first non-linear editing platform to include speech-to-text (STT). The partnership now deepens with a new on-device STT model in Premiere that delivers near-cloud accuracy while keeping all audio local to the device.

When Adobe launched STT for Premiere, large enterprises couldn't always use cloud-based services due to privacy concerns. Speechmatics was one of the few providers with on-device models—a key reason for the partnership.

Five years later, those privacy requirements haven't changed. With the rise of LLMs and data sovereignty concerns, the need for secure deployments has, in fact, increased. What has changed is the performance gap: Speechmatics' new on-device model brings local transcription on par with cloud accuracy and is optimised to run efficiently.

This means, Adobe explained, that studios, agencies, and production companies handling content before it goes public can now work seamlessly from anywhere: on a film set, between client meetings, on a flight—at full accuracy, with no dependency on a connection and no interruption to the work.

The new Speechmatics on-device model has been trained on millions of hours of speech to deliver high accuracy for accented speech, non-native speakers, and noisy environments like field reporting or film sets. As a result, Adobe said that the new on-device model in Premiere:

  • Is within 5% relative to cloud accuracy, evaluated across nearly 10 million words of diverse real-world data
  • Processes 1 hour of audio in about 55 seconds
  • Outperforms its closest competitor, with a 12-16% improvement over Whisper-powered creative solutions
  • Runs on Windows and Mac, using the latest AI acceleration techniques to process efficiently across a broad range of hardware, from the latest M5 Macs, NVIDIA RTX and AMD GPUs to older hardware such as Intel Macs.
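
To put two of those figures in perspective, the short Python sketch below works through the arithmetic. It is illustrative only: the function names are hypothetical, the 10% cloud error rate is a made-up placeholder, and reading "within 5% relative" as a relative difference in word error rate is an assumption, since the announcement does not name the metric.

```python
# Illustrative arithmetic only. The cloud error rate used below is a
# placeholder, not a figure published by Adobe or Speechmatics.


def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio transcribed per second of processing time."""
    return audio_seconds / processing_seconds


def max_on_device_error(cloud_error: float, relative_gap: float) -> float:
    """Worst-case on-device error rate if it stays within `relative_gap`
    (e.g. 0.05 for 5%) of the cloud error rate, measured relatively."""
    return cloud_error * (1.0 + relative_gap)


# 1 hour of audio (3,600 seconds) processed in about 55 seconds.
speedup = real_time_factor(3600, 55)
print(f"Roughly {speedup:.0f}x faster than real time")

# Assumption: a hypothetical cloud word error rate of 10%.
hypothetical_cloud_error = 0.10
bound = max_on_device_error(hypothetical_cloud_error, 0.05)
print(f"On-device error rate would be at most {bound:.1%}")
```

Under that reading, a 5% relative gap is a much smaller difference than 5 percentage points: the on-device model's error rate would rise by only a twentieth of whatever the cloud model's error rate is.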

Speechmatics on-device joins Speechmatics cloud and Speechmatics on-prem as a purpose-built option for ISVs and OEMs where data residency, offline capability, or predictable costs make local execution the right architectural call. It integrates as a C/C++ library on macOS and Windows.

George Winslow is the senior content producer for TV Tech. He has written about the television, media and technology industries for nearly 30 years for such publications as Broadcasting & Cable, Multichannel News and TV Tech. Over the years, he has edited a number of magazines, including Multichannel News International and World Screen, and moderated panels at such major industry events as NAB and MIP TV. He has published two books and dozens of encyclopedia articles on such subjects as the media, New York City history and economics.