Amazon AI Used to Create Synthesized Voices

Published: 2019-12-30

We’ve heard about AI creating art, having emotions, tending bar, and accomplishing much more than we thought possible only a decade ago. Given how much electronic music is out there, you might assume that AI is already making music, too, but few models have actually been able to synthesize singing or clone a voice. Researchers from Amazon and Cambridge recently collaborated to create synthesized singers.

Their model uses WaveNet, a neural network developed by Google's DeepMind, as a vocoder that turns mel-spectrograms into audio.

More from a recent VentureBeat article:

The system comprises three parts, the first of which is a frontend that takes a musical score as input and produces note embeddings (i.e., numerical representations of notes) to be sent to an encoder.

The second is a model that is modified to accept the aforementioned embeddings, whose decoder produces mel-spectrograms. The third and final component is the WaveNet vocoder, which mimics things like stress and intonation in speech; it synthesizes the spectrograms into song.
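For readers unfamiliar with the term, a mel-spectrogram is a spectrogram whose frequency axis is warped onto the mel scale, which spaces pitches roughly the way listeners perceive them. The sketch below shows one widely used form of that warping. The function names are illustrative, and the article does not say which mel formula or settings the researchers used, so this should not be read as their configuration.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels using a common log-based formula."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


if __name__ == "__main__":
    # Equal steps in Hz shrink on the mel scale as frequency rises.
    for freq in (220.0, 440.0, 880.0, 1760.0):
        print(f"{freq:7.1f} Hz -> {hz_to_mel(freq):7.1f} mel")
```

Under this formula, doubling the frequency adds fewer mels at high pitches than at low ones, which is why mel-spectrograms devote more resolution to the lower range where most vocal detail sits.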

The frontend performs linguistic analysis on the score lyrics, allowing for three possible levels of vowel stress and ignoring punctuation. It then determines which phonemes (perceptually distinct units of sound) correspond to each note of the score using syllabification information specified in the score itself.

It also computes the expected duration in seconds of each note, as well as the tempo and time signature of the score, which it combines into embeddings.
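The duration calculation can be illustrated with basic score arithmetic. This is a minimal sketch, not the researchers' code: the function name and parameters are hypothetical, and it assumes the tempo is given in beats per minute and counts the note value named by the time signature's denominator (for example, quarter notes in 4/4).

```python
from fractions import Fraction


def note_duration_seconds(note_value: Fraction, tempo_bpm: float, beat_unit: int = 4) -> float:
    """Expected duration of a note in seconds.

    note_value: length as a fraction of a whole note (Fraction(1, 4) is a quarter note).
    tempo_bpm:  beats per minute; assumed to count the beat_unit note.
    beat_unit:  denominator of the time signature (4 means a quarter note gets one beat).
    """
    if tempo_bpm <= 0:
        raise ValueError("tempo_bpm must be positive")
    beats = note_value * beat_unit  # how many beats the note spans
    return float(beats) * 60.0 / tempo_bpm


if __name__ == "__main__":
    # A quarter note at 120 BPM in 4/4 lasts half a second.
    print(note_duration_seconds(Fraction(1, 4), 120))
    # A half note at 60 BPM in 4/4 lasts two seconds.
    print(note_duration_seconds(Fraction(1, 2), 60))
```

Real scores complicate this with tempo changes, ties, and compound meters where the felt beat differs from the time signature's denominator, so a production frontend would need more than this one formula.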

The WaveNet model was trained with over 40 hours of real musical a cappella recordings covering multiple genres.

The research also included an audience: 22 human listeners evaluated the synthesized music by listening to short segments and providing a rating for “naturalness.”

The model got an average rating of nearly 60%.

While it was able to sing in tune most of the time, the Amazon AI synthesized singers performed best on simple songs without very high or low notes. It also apparently mastered vibrato and the appropriate places to apply it.

Posted in: Insights
