A recent post on the official Google AI blog introduces Translatotron, a new experimental system that translates speech directly into speech without any intermediate text. According to the post, “Translatotron is the first end-to-end model that can directly translate speech in one language into speech in another.”
Google said current translation systems are divided into three steps: automatic speech recognition, which converts speech to text; machine translation, which converts that text into another language; and text-to-speech (TTS) synthesis, which generates speech from the translated text. This three-step approach underlies services such as Google Translate, but the tech giant hopes to translate speech with a single model, without the intermediate text step.
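The three-step cascade described above can be pictured as three functions chained together. The following is only a toy sketch: the function names (`recognize_speech`, `translate_text`, `synthesize_speech`, `cascade_translate`) are hypothetical, and each stage is a trivial stand-in rather than a real recognition, translation, or synthesis model. It is meant to show where the intermediate text appears, not how Google's services are implemented.

```python
# Toy illustration of a cascaded speech translation pipeline.
# Every function here is a hypothetical stand-in, not a real model or API.

def recognize_speech(audio: bytes) -> str:
    """Step 1 (ASR): source-language audio -> source-language text.
    Toy assumption: the 'audio' bytes already contain the transcript."""
    return audio.decode("utf-8")


def translate_text(text: str) -> str:
    """Step 2 (MT): source-language text -> target-language text.
    Toy assumption: a two-word Spanish-to-English lookup table."""
    toy_dictionary = {"hola": "hello", "mundo": "world"}
    return " ".join(toy_dictionary.get(word, word) for word in text.split())


def synthesize_speech(text: str) -> bytes:
    """Step 3 (TTS): target-language text -> target-language audio.
    Toy assumption: the 'audio' is just the encoded text."""
    return text.encode("utf-8")


def cascade_translate(audio: bytes) -> bytes:
    """Chain the three steps; text is the intermediate representation twice."""
    source_text = recognize_speech(audio)
    target_text = translate_text(source_text)
    return synthesize_speech(target_text)


translated_audio = cascade_translate(b"hola mundo")
```

In this layout, a mistake in the first stage is passed unchanged into the second and third stages. An end-to-end model replaces the whole chain with a single mapping from input audio to output audio.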
Ye Jia and Ron Weiss, Google AI software engineers, said: “The system is called Translatotron. This system avoids dividing tasks into different phases.” Google said this means faster translation and fewer translation errors. The system takes a spectrogram as input and generates a spectrogram as output, while also relying on a neural vocoder and a speaker encoder, which means that the system can preserve the speaker's voice characteristics after translation.
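For readers unfamiliar with the term, a spectrogram is a time-by-frequency grid showing how much energy a signal has at each frequency in each short time window. The sketch below computes a plain magnitude spectrogram using only the Python standard library. The function name `magnitude_spectrogram` and the `frame_size` and `hop` values are illustrative assumptions; the post does not state which spectrogram variant or parameters Translatotron uses.

```python
import cmath
import math


def magnitude_spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames; each frame holds the magnitudes of
    frequency bins 0..frame_size//2 (a short-time Fourier transform).
    frame_size and hop are illustrative values, not Translatotron's."""
    # Periodic Hann window to reduce leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        bins = []
        for k in range(frame_size // 2 + 1):
            # Direct discrete Fourier transform for bin k (slow but simple).
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


# Usage: a 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spec = magnitude_spectrogram(tone)

# With 64-sample frames, bins are 8000 / 64 = 125 Hz apart,
# so the tone's energy peaks in bin 1000 / 125 = 8.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

A neural vocoder performs roughly the reverse step, turning a generated spectrogram back into an audible waveform.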