Stable Audio Open: Custom Soundscapes and Sound Design Locally

Stable Audio: Abstract sculpture of audio, vinyl installation, Stable Diffusion image by Daniel Sandner

Stable Audio Open is a latent diffusion model from Stability AI (the company behind Stable Diffusion) that specializes in audio samples, sound effects, and short musical elements. Stable Audio Open 1.0 is not designed for full-scale music creation, but it can generate shorter clips in various styles, such as rhythmic or drum loops. It may not surpass the best models currently available, but with proper setup and configuration it can deliver impressive results.

Installation for ComfyUI

For installation you will need two files: the model itself and the T5 text encoder (for ComfyUI installation and tips, read this article).

Use and Prompting

Stable Audio Open workflows in ComfyUI for sound design, soundscapes, and effects creation
Latent audio offers open possibilities for experimenting with sound design and effects in ComfyUI.

Control works the same way as when you generate Stable Diffusion images. Experiment with various styles of prompting. Stability AI recommends a structured prompt (for Stable Audio 2.0), but your mileage may vary.
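As a rough illustration of structured prompting, the sketch below assembles a prompt from named parts and applies the `(term:weight)` emphasis syntax familiar from ComfyUI image prompts. The helper names `weighted` and `build_prompt`, the field order, and the example values are assumptions made for this example, not a format prescribed by Stability AI.

```python
def weighted(term: str, weight: float = 1.0) -> str:
    """Wrap a term in ComfyUI-style (term:weight) emphasis.

    A weight of 1.0 returns the term unchanged.
    """
    if weight == 1.0:
        return term
    return f"({term}:{weight:g})"


def build_prompt(subject, style="", mood="", bpm=None, extras=()):
    """Join the non-empty parts into one comma-separated prompt.

    The field order here is only one possible structure (hypothetical).
    """
    parts = [subject]
    if style:
        parts.append(style)
    if mood:
        parts.append(mood)
    if bpm is not None:
        parts.append(f"{bpm} BPM")
    parts.extend(extras)
    return ", ".join(parts)


prompt = build_prompt(
    weighted("drum loop", 1.2),
    style="tech house",
    bpm=128,
    extras=["dry", "punchy kick"],
)
print(prompt)
```

Paste the resulting string into the text-encode node of your workflow and compare it against a free-form description of the same sound.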

Stable Audio Open Test: Sound Design and Soundscapes

Tips

  • Use a preview node for waveforms (see the examples). Install the comfyui-audio-processing nodes through the Manager.
  • Experiment with sampler/scheduler combinations.
  • Adjust token weights in a similar manner as in SD image generation.
  • Adjust CFG, steps, and seed to get the best results.
  • If you cannot get the result you want from a single prompt, generate the parts separately and mix them afterwards.
  • SAO generates short (up to 47 s) stereo audio at 44.1 kHz. You can try longer clips, but after about 60-80 s the results seem less precise, and around 120 s the output becomes bland as the model stretches its theme.
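To make the experimentation in the tips above more systematic, you could enumerate the settings you want to compare before queuing them. The following is a minimal sketch: the dictionary keys mirror the usual ComfyUI sampler inputs, while the `sweep` and `seconds_to_samples` helpers and the chosen values are assumptions for illustration only.

```python
import itertools

SAMPLE_RATE = 44_100  # Hz, the output rate mentioned in the tips


def seconds_to_samples(seconds, sample_rate=SAMPLE_RATE):
    """Convert a clip length in seconds to samples per channel."""
    return int(round(seconds * sample_rate))


def sweep(samplers, schedulers, cfgs, steps, seeds):
    """Yield one settings dict per combination of the given values."""
    for sampler, scheduler, cfg, n_steps, seed in itertools.product(
        samplers, schedulers, cfgs, steps, seeds
    ):
        yield {
            "sampler_name": sampler,
            "scheduler": scheduler,
            "cfg": cfg,
            "steps": n_steps,
            "seed": seed,
        }


# Example grid (values are arbitrary starting points, not recommendations).
runs = list(
    sweep(
        samplers=["euler", "dpmpp_2m"],
        schedulers=["normal", "karras"],
        cfgs=[4.0, 7.0],
        steps=[50],
        seeds=[1, 2],
    )
)
print(len(runs))               # 16 combinations
print(seconds_to_samples(47))  # 2072700 samples per channel
```

Keeping the seed fixed while changing one other setting at a time makes it easier to hear what each parameter actually does.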

Conclusion

While other models may excel in music generation, Stable Audio Open 1.0 stands out for its open-ended possibilities for experimentation. Future versions may offer improved adherence to prompts, but it's important to acknowledge the inherent features and limitations of diffusion models and embrace the aleatoric randomness that is part of their nature. The key lies not in using the models out of the box, but in building upon the technology and customizing it to your specific needs, incorporating your own creations and sounds from field recordings or sound design experiments.
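Layering a generated clip with your own field recording can be as simple as a weighted sum of samples. The sketch below assumes both clips are already decoded to mono lists of floats in the range -1 to 1 at the same sample rate; the `mix` function and its gain defaults are hypothetical and stand in for what you would normally do in a DAW or audio editor.

```python
def mix(a, b, gain_a=0.5, gain_b=0.5):
    """Mix two mono sample sequences (floats in [-1, 1]).

    The shorter clip is padded with silence, and the result is scaled
    down only if the summed peak would exceed full scale.
    """
    n = max(len(a), len(b))
    out = []
    for i in range(n):
        sa = a[i] if i < len(a) else 0.0
        sb = b[i] if i < len(b) else 0.0
        out.append(gain_a * sa + gain_b * sb)

    peak = max((abs(s) for s in out), default=0.0)
    if peak > 1.0:
        out = [s / peak for s in out]  # normalize to avoid clipping
    return out


generated = [0.5, -0.5, 0.25]  # stand-in for a generated clip
field_recording = [0.5, 0.5]   # stand-in for your own recording
print(mix(generated, field_recording))
```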
