Nvidia has pulled off an unheard-of trick. The computer chip giant has added to its artificial intelligence portfolio with a new generative AI audio model that it says can “produce sounds never heard before.” Named Fugatto, the model can synthesize “high-quality singing voices” from text inputs. Nvidia describes the technology as “a Swiss Army knife for sound.”
Fugatto stands for Foundational Generative Audio Transformer Opus 1. Nvidia claims it can generate, transform and manipulate sound using text and audio prompts, producing sounds such as a trumpet barking or a saxophone meowing. Its main capabilities include creating music snippets from text prompts. It can also modify existing songs by adding or removing instruments, change voice characteristics such as accent and emotion, and generate entirely novel sounds.
Nvidia demonstrated Fugatto in a video showing how users can generate sounds through prompts such as: “Create a sound where a train passes by and becomes a lush string orchestra.” According to the video, Fugatto also lets users isolate voices from songs, among other features. Fugatto uses ComposableART, a method that lets users combine instructions that were not seen together during training. This means users can request complex audio transformations, such as ‘text spoken with a sad feeling in a French accent’, according to Nvidia.
Nvidia also said that its research team, with members from India, Brazil, China, Jordan and South Korea, spent over a year building a dataset containing millions of audio samples to develop Fugatto. According to Nvidia, the model can be applied in multiple industries, including music production, advertising, language learning and video game development.