Stability AI Releases New Audio Models Capable of Generating Six-Minute Songs
Stability AI, best known for its Stable Diffusion text-to-image models, is expanding into audio with a new family of models called Stable Audio 3.0. The top model in the lineup can generate professional-grade music lasting more than six minutes.
The company has released four new models under the Stable Audio 3.0 banner: small SFX (459M parameters), small (459M parameters), medium (1.4B parameters), and large (2.7B parameters). The two small models are suited to on-device sound and music generation of up to two minutes.
Both the medium and large models can create full compositions lasting 6 minutes and 20 seconds while maintaining musical structure and melodic tone. This marks a significant advance over Stable Audio 2.0, released in 2024, which could generate music only up to four minutes long.
Stability AI is releasing the small SFX, small, and medium models with open weights that anyone can use and modify freely. The large model, in contrast, is available through the company's API and paid self-hosted services, with an enterprise license required for companies with more than $1 million in revenue.
The release comes as many companies are shipping tools for music generation. However, as the ongoing legal battles involving Suno and Udio suggest, data licensing and partnerships with major music labels could become crucial to the sustainability of such services.
Last year, Stability AI partnered with Warner Music Group and Universal Music Group to develop models and music creation tools. The company claims that its new audio models are built on fully licensed data.
To support professional musicians, Stability AI is developing a suite of products led by Ethan Kaplan, who previously served as the chief digital officer at Universal Audio and Fender. This move underscores the company’s commitment to providing tools for creators in the music industry.
Other AI companies are also hiring key figures from the music sector to bolster their credentials. For instance, Suno recently hired Jeremy Sirota, the former CEO of Merlin, as its chief commercial officer. ElevenLabs has also brought on Derek Cournoyer from indie music publisher Kobalt as a strategy lead for its music business.
Key Takeaways
- The medium and large Stable Audio 3.0 models can generate songs longer than six minutes, up from the four-minute limit of Stable Audio 2.0.
- Stability AI is offering the small SFX, small, and medium models with open weights for free use, while the large model is available through its API and paid self-hosted services, with an enterprise license required for companies with more than $1 million in revenue.
- The company has partnered with major music labels and claims its new audio models are built on fully licensed data, a critical factor in the music industry.
- Sustainability of AI-generated music services may hinge on partnerships with established music companies like Warner Music Group and Universal Music Group.




