It started to sing...

#60

Born Analog
May 11, 2024

This week, we are back at the creative stuff again. As recently as issue #56, I was banging on about how incredible the music creation capability of Stable Audio v2.0 was. To think that a soundtrack could come together from a simple text prompt seemed ridiculous. Yet true, as the examples showed. How quickly could we make the next quantum leap? Less than a month, it turns out! Let’s dive in…

ElevenLabs have been building a great reputation for all things audio via AI for over a year now (a lifetime in AI terms). This week, they released an early preview of ElevenLabs Music. Same principle as Stable Audio (text to music) but with an additional layer of incredibleness (if that’s even a word!): singing. What a difference a few words make. Let’s be clear: the AI, on the basis of nothing more than a title and a style prompt, has composed the music, crafted the lyrics, put it all together and sung the song. Here’s what they released:

Here’s an early preview of ElevenLabs Music.
All of the songs in this thread were generated from a single text prompt with no edits.
Title: It Started to Sing
Style: “Pop pop-rock, country, top charts song.”
— ElevenLabs (@elevenlabsio)
5:52 PM • May 9, 2024

The human input here was the title and the style prompt; the rest is AI. The AI was also able to take the same song and deliver it in a totally different style. Have a listen:

Title: It Started to Sing (Jazz Version)
Style: “A jazz pop top charts song with emotional vocals, catchy chorus, and trumpet solos.”
— ElevenLabs (@elevenlabsio)
5:52 PM • May 9, 2024

No matter your taste, something can be created that is supremely polished and sounds very much like a top-of-the-charts song. Yet, when all is said and done, where does this leave creativity? Will we mere humans be able to keep up? Who knows, but you just know that there will always be a place for the real thing. The picture above attempts to show an AI robot singing to an audience at a gig. Would you rather go and see that, or be wowed by U2 or Taylor Swift up close as they bring real human energy to the event? U2 every day for me. However, for a playlist of any genre that you want to listen to on the move, this stuff (and please always bear in mind: what you are hearing today is the absolute worst this will ever be!) would do just fine. Here’s a final example to labour the point:

Title: My Love
Style: “Indie Rock with 90s influences, featuring a combination of clean and distorted guitars, driving drum beats, and a prominent bassline, with a moderate tempo around 120 BPM, and a mix of introspective and uplifting moods, evoking a sense of nostalgia and… x.com/i/web/status/1…
— ElevenLabs (@elevenlabsio)
5:52 PM • May 9, 2024

There is no untouchable facet of human creativity, it seems. We have reported recently on the impact of AI on writing, photography, painting, video and now audio. It seems that if we can experience it - and we have enough examples of the ‘thing’ - then we can train an AI to do it better than we will ever be able to. The next decade - hell, year - is going to be completely nuts in all the creative industries.

How long before an AI-generated song tops the charts?