Live Transcribe is an Android app that Google launched in February of this year. Its speech recognition is provided by the Cloud Speech API, Google's most advanced. Relying on the cloud adds complexity, however: ever-changing network connectivity, data costs, and the need to keep latency robust all pose challenges. Google has therefore open-sourced the engine behind it, in the hope that developers will build on the existing work.
The Cloud Speech API does not currently support unlimited audio streaming. The team has taken steps to work around this limitation, such as closing and restarting streaming requests before the timeout is reached, which effectively reduces the amount of text lost during a session.
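The restart-before-timeout idea can be illustrated with a minimal sketch. It does not call the Cloud Speech API and is not the engine's actual code: the function name `split_into_sessions`, the parameters `session_limit_s` and `safety_margin_s`, and the timestamped-chunk input are all hypothetical, chosen only to show how a long audio stream might be cut into consecutive requests that each end before a time limit.

```python
from typing import Iterable, Iterator, List, Tuple


def split_into_sessions(
    chunks: Iterable[Tuple[float, bytes]],
    session_limit_s: float,
    safety_margin_s: float,
) -> Iterator[List[bytes]]:
    """Group (timestamp, audio) chunks into sessions that end before the limit.

    Hypothetical helper: each yielded list stands for one streaming request.
    A session is closed once it has run for (limit - margin) seconds, and the
    chunk that triggered the close becomes the first chunk of the next session,
    so no audio is dropped at the boundary.
    """
    if not 0 <= safety_margin_s < session_limit_s:
        raise ValueError("safety margin must be >= 0 and below the session limit")

    budget_s = session_limit_s - safety_margin_s
    session: List[bytes] = []
    session_start_s = 0.0

    for timestamp_s, chunk in chunks:
        if not session:
            # First chunk of a new session marks its start time.
            session_start_s = timestamp_s
        elif timestamp_s - session_start_s >= budget_s:
            # Close the current request early and start a new one.
            yield session
            session = []
            session_start_s = timestamp_s
        session.append(chunk)

    if session:
        yield session
```

In a real client, each yielded group would presumably map to one streaming request, with the next request opened as the previous one closes. How much margin is appropriate would depend on the service's actual limit, which is not stated here.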
Unlimited audio streaming brings a major challenge of its own. In many countries network data is very expensive, and where connectivity is poor, bandwidth may be limited. The Live Transcribe Speech Engine team experimented extensively with audio codecs and eventually cut data usage by a factor of 10 without compromising accuracy.
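To give a feel for what a tenfold reduction means in practice, here is a rough back-of-the-envelope calculation. The baseline of 16 kHz, 16-bit, mono uncompressed PCM is an assumption made for illustration only; the article does not say what format the team started from, and the helper names are hypothetical.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def megabytes_per_hour(bitrate_bps: float) -> float:
    """Data volume for one hour of continuous streaming, in megabytes (10**6 bytes)."""
    seconds_per_hour = 3600
    return bitrate_bps * seconds_per_hour / 8 / 1_000_000


# Assumed baseline: 16 kHz, 16-bit, mono PCM (not stated in the article).
baseline_bps = pcm_bitrate_bps(sample_rate_hz=16_000, bits_per_sample=16, channels=1)
baseline_mb = megabytes_per_hour(baseline_bps)

# The factor of 10 is the figure quoted in the article.
reduced_mb = baseline_mb / 10
```

Under that assumed baseline, an hour of streaming comes to roughly 115 MB uncompressed and about 11.5 MB after a tenfold reduction, which is a meaningful difference for a user on a metered connection.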
In addition, because transcription happens in real time, the transcribed text changes as speech comes in, so latency must be kept low. The engine reduces latency considerably thanks to its custom Opus encoder.
It is also worth mentioning that Live Transcribe supports more than 70 languages, including Chinese, and automatically recognizes the language being spoken.