Google Cloud's Text-to-Speech and Speech-to-Text APIs both received major updates today. The updates add support for more languages, make it easier to hear automatically generated voices on different speakers, and promise better transcriptions through improved speech recognition tools. With this round of updates, the Cloud Text-to-Speech API is also officially available to all users.
For many, the most important part of this round of updates is the release of 17 new WaveNet-based voices. WaveNet is Google's technology for generating text-to-speech audio with machine learning, and it produces more natural-sounding speech. This round of updates also adds 14 new languages and variants to the Text-to-Speech API, which now offers a total of 30 standard voices and 26 WaveNet voices.
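As a rough illustration of how a developer might select a WaveNet voice, the sketch below builds the JSON body for the Text-to-Speech REST method `text:synthesize`. It only constructs the request and does not send it, since authentication is out of scope here. The helper name `build_synthesize_request` is hypothetical, and the voice name `en-US-Wavenet-A` is used as an example; the voices actually available should be checked against the current voice list.

```python
import json

# REST endpoint for synthesis. Sending the request requires credentials,
# which this sketch deliberately leaves out.
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_request(text, language_code="en-US",
                             voice_name="en-US-Wavenet-A"):
    """Return a JSON-serializable body for a text:synthesize call.

    build_synthesize_request is a hypothetical helper, not part of any
    Google library. The default voice name is an example only.
    """
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {"audioEncoding": "MP3"},
    }


if __name__ == "__main__":
    body = build_synthesize_request("Hello from a WaveNet voice.")
    print(json.dumps(body, indent=2))
```

Switching between a standard voice and a WaveNet voice is then a matter of changing the `name` field, which is why the helper exposes it as a parameter.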
On the speech-to-text side, Google is making it easier for developers to transcribe recordings with multiple speakers. Using machine learning, the service distinguishes between the different speakers (although developers still need to tell it how many speakers to expect) and then labels each one with a number. The new version also supports multiple languages: developers can select up to four languages, and the Speech-to-Text API automatically detects which one is being spoken.
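The two Speech-to-Text features described here can be sketched as a recognition config plus a small post-processing step. This is a minimal sketch under stated assumptions: the field names (`enableSpeakerDiarization`, `diarizationSpeakerCount`, `alternativeLanguageCodes`, `speakerTag`) follow the beta REST API as it stood around this release and may differ in later versions, both helper functions are hypothetical, and the word list in the usage section is made-up sample data rather than real API output.

```python
from collections import defaultdict

MAX_LANGUAGES = 4  # the article states up to four languages can be selected


def build_recognition_config(languages, speaker_count):
    """Build a recognition config dict (hypothetical helper).

    languages: the primary language code first, then any alternatives.
    speaker_count: how many speakers the caller expects in the audio.
    Field names are assumed from the beta API and may have changed.
    """
    if not 1 <= len(languages) <= MAX_LANGUAGES:
        raise ValueError(f"expected 1 to {MAX_LANGUAGES} language codes")
    if speaker_count < 1:
        raise ValueError("speaker_count must be at least 1")

    primary, *alternatives = languages
    config = {
        "languageCode": primary,
        "enableSpeakerDiarization": True,
        "diarizationSpeakerCount": speaker_count,
    }
    if alternatives:
        config["alternativeLanguageCodes"] = alternatives
    return config


def group_words_by_speaker(words):
    """Join recognized words per numbered speaker, keeping word order."""
    grouped = defaultdict(list)
    for item in words:
        grouped[item["speakerTag"]].append(item["word"])
    return {tag: " ".join(parts) for tag, parts in grouped.items()}


if __name__ == "__main__":
    print(build_recognition_config(["en-US", "es-ES", "fr-FR"], 2))
    # Made-up sample shaped like a diarized word list, not real output.
    sample = [
        {"word": "hello", "speakerTag": 1},
        {"word": "there", "speakerTag": 1},
        {"word": "hi", "speakerTag": 2},
    ]
    print(group_words_by_speaker(sample))
```

Treating the first language as primary and the rest as alternatives mirrors the "up to four languages" limit: one primary code plus at most three alternatives.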