Riffusion’s AI generates music from text using visual sonograms

An AI-generated image of musical notes exploding out of a computer monitor. (credit: Ars Technica)

On Thursday, a pair of tech hobbyists released Riffusion, an AI model that generates music from text prompts by creating a visual representation of sound and converting it to audio for playback. It uses a fine-tuned version of the Stable Diffusion 1.5 image synthesis model, applying visual latent diffusion to sound processing in a novel way.

Created as a hobby project by Seth Forsgren and Hayk Martiros, Riffusion works by generating sonograms, which store audio in a two-dimensional image. In a sonogram, the X-axis represents time (the order in which the frequencies get played, from left to right), and the Y-axis represents the frequency of the sounds. Meanwhile, the color of each pixel in the image represents the amplitude of the sound at that given moment in time.
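To make that layout concrete, here is a minimal sketch of how such a sonogram can be computed from raw audio with a short-time Fourier transform. This is illustrative only, not Riffusion's code; the function name, window size, and hop length are assumptions.

    import numpy as np

    def sonogram(samples, win=1024, hop=256):
        """Magnitude spectrogram: rows = frequency bins, columns = time frames."""
        window = np.hanning(win)
        cols = []
        for start in range(0, len(samples) - win, hop):
            frame = samples[start:start + win] * window
            cols.append(np.abs(np.fft.rfft(frame)))
        # Stack frames as columns so the X-axis reads left to right in time;
        # each value is the amplitude of one frequency at one moment.
        return np.stack(cols, axis=1)

    # A pure 440 Hz tone shows up as a single bright horizontal band.
    sr = 22050
    t = np.arange(sr) / sr
    spec = sonogram(np.sin(2 * np.pi * 440 * t))
    print(spec.shape)  # (frequency bins, time frames)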

Since a sonogram is a type of picture, Stable Diffusion can process it. Forsgren and Martiros trained a custom Stable Diffusion model with example sonograms linked to descriptions of the sounds or musical genres they represented. With that knowledge, Riffusion can generate new music on the fly based on text prompts that describe the type of music or sound you want to hear, such as "jazz," "rock," or even typing on a keyboard.
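Because the checkpoint builds on Stable Diffusion, generating a sonogram from a prompt can be sketched with the standard Hugging Face diffusers pipeline. A hedged sketch follows: the model ID "riffusion/riffusion-model-v1" and the prompt are assumptions, and turning the resulting image back into audio requires a separate spectrogram-inversion step (e.g., Griffin-Lim), which Riffusion performs in its playback stage.

    import torch
    from diffusers import StableDiffusionPipeline

    # Load the fine-tuned checkpoint (model ID is an assumption, not
    # confirmed by the article) and run it like any Stable Diffusion model.
    pipe = StableDiffusionPipeline.from_pretrained(
        "riffusion/riffusion-model-v1",
        torch_dtype=torch.float16,
    ).to("cuda")

    # The text prompt describes the desired sound; the output is a
    # sonogram image, not audio.
    result = pipe("jazz saxophone solo", num_inference_steps=50)
    result.images[0].save("sonogram.png")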

