Microsoft’s new AI tool, Vall-E, is a text-to-speech (TTS) system that can take a three-second recording of a person and convert written text into speech in that person’s voice with high accuracy.
Vall-E is a ‘neural codec language model’, according to Microsoft. The company says it scaled the TTS training data up to 60,000 hours of English speech, hundreds of times more than existing systems use.
Vall-E does not need to be trained on data from a specific speaker; it only needs a three-second audio recording and a text prompt. The tool can also preserve the speaker’s emotion.
Microsoft has demonstrated the tool’s abilities on its GitHub page. The three-second recording can be in any tone, such as angry, sleepy, or disgusted, and Vall-E will recite the text in the same tone.
Because of its accuracy, the tool could benefit people who have lost their voice or their ability to speak.
The downside is that it could also be misused to impersonate speakers and to spread fake news and misinformation.
Ifunanya Ikueze is an Engineer, Safety Professional, Writer, Investor, Entrepreneur and Educator.