Why we invested in Speechmatics

by Nadine Torbey


With the rise of automation, machine learning and AI, humans are increasingly communicating with machines. Since speech is the primary method of communication between people (at least until telepathic technologies are invented…), it is only natural that speech will eventually become the main channel of human-to-machine communication, making automated speech recognition (ASR) a fundamental building block of that progress.

That is why we are excited to announce that AlbionVC has just led a £6.35m Series A investment into Speechmatics. Speechmatics has built one of the most accurate speech recognition engines in the world, with the ability to deploy its language models in the cloud, on-premise and on-device, and to process speech in 29 different languages. It competes head-to-head with cloud ASR providers and is currently the leader in on-premise and on-device deployment.

While ASR has existed in many forms for some time, the market has only recently started to mature, as accuracy (hovering around 90%) and speed are now sufficient to encourage wider adoption. As a result, the market is getting noisier and is almost in a commoditisation phase. However, a few emerging trends point to a shift from the cloud to on-premise and on-device deployment, revealing a more nuanced landscape.

Importance of Privacy

Most speech recognition is currently processed in the cloud, where the main players are large tech providers such as Google, Amazon, Microsoft and IBM competing for market share. However, businesses are becoming increasingly conscious of privacy and data security, especially when handling sensitive customer data or operating in regulated industries. Businesses that use transcription in the legal or healthcare industries, or that employ contact centre agents, may be reluctant to send their data to the cloud, least of all to the tech giants. This creates a massive opportunity for Speechmatics.

Small-footprint ASR Deployed on Device

The rise of IoT and the growing role of devices in everyday life have created a need for better and more efficient human-machine interaction.

Aside from security, two key elements are slowing this progress:

  • latency
  • unreliable network availability

It is therefore important for devices to be able to process speech locally rather than going through the cloud, which is only possible if ASR engines with small-footprint language models are deployed on the devices themselves.

Speechmatics is currently the only company with this capability, although others are starting to recognise its strategic importance. Google has announced that it is developing its own low-footprint ASR engine, targeting a model size of 80MB, and Amazon is also working on an engine aimed at vehicles.

Becoming the front runner in the field is a testament to the vision of the Speechmatics team. It also sheds light on certain advantages that come from competing with tech giants:

  • A large proportion of the market does not wish to be dependent on any one of them within its core technology stack
  • Other big technology players need to catch up in a strategic area

Voice Biometrics

Speech detection is maturing from ‘wake words’ (yes! your relationship with Siri will evolve far beyond saying hello…) to processing conversational speech. Beyond processing long-form speech (where Speechmatics is currently the leader), developing the technology to understand emotions or to detect whether someone is getting sick are examples of exciting prospects that could emerge in the future market.

In what appears at first glance to be a crowded and tough market to crack, Speechmatics has managed to carve out a very attractive position, demonstrating not only technological excellence but also true vision of where the market is heading.