Humans use sound to communicate with each other. Our voices carry subtle cues that let us feel what the other person is feeling. It is eerie that an AI can replicate the intonation and inflection that convey emotion, because those cues can influence how people feel and act.
Anyway…
As growth marketers, we know that including audio in your creatives is important. Audio can transform a losing ad into a winning ad.
Music, sound effects, and voice-overs all typically lift your creative performance.
If you already have a script that you know converts, you can type that text into one of these three AI text2speech tools:
VALL-E
ElevenLabs
Play.ht
I found a new AI text2audio tool and included it at the end. It was released this month and can generate audio like a baby crying or a goat screaming.
VALL-E
Microsoft is well known for integrating ChatGPT into Bing and (eventually) Office. Did you know they also have AI text2speech? You can train VALL-E to sound just like you with a 3-second recording.
VALL-E emerges in-context learning capabilities and can be used to synthesize high-quality personalized speech with only a 3-second enrolled recording of an unseen speaker as an acoustic prompt.
I encourage you to head to their GitHub to listen to their eerie examples. The next two text2speech AI tools need anywhere from a minute to an hour of recorded audio to clone a voice. VALL-E only needs 3 seconds!
ElevenLabs
ElevenLabs is an AI text2speech tool you can use now. It has been making waves lately due to some deep-faked celebrity voices saying naughty things. However, it is game-changing for audiobooks, podcasts, stories, and of course marketing.
You can listen to a sample of their output in the Spotify link above.
ElevenLabs lets you clone your own voice with a 1-minute recording too!
By the way, stop scrolling and take a deep breath:
Play.ht
Play.ht is another tool you can use to generate realistic voices for your ads. You can listen to a sample of their output in the Spotify interview above. They need an hour of your recorded voice to train their model to replicate your voice.
That’s much longer than the 3 seconds VALL-E needs or the 1 minute ElevenLabs needs. Not great.
If I had to choose a tool to clone my voice, I’d go with ElevenLabs because VALL-E isn’t out yet.
Bonus - AudioLDM - Text2Audio
AudioLDM is a new AI text2audio tool that generates music and sound effects you can test in your ads.
They haven’t released their pre-trained model yet, but here are a few examples I pulled from their GitHub (a cappella, a man speaking in a studio, and pop music, respectively):
I know what’s on your mind, but I won’t share the AI-generated fart sound. You’ll have to ask it to generate one yourself. You can try this free with the link above.
This is an emerging space so I’ll keep an eye on what’s going to pop out next!
So what?
At Calm, everyone on our growth team recorded themselves reading our top-performing testimonials. The experiments proved so successful that we kept using those recordings in future ads.
At Cash App, our top-performing UAC ad had a burp sound effect. ;)
AI empowers you to choose the voice you want in your ads without going through the whole process of finding and signing a human voice actor.
You save time and effort that you can use for something else, like pausing burnt-out creatives (or replying to IT about why you installed that app on your laptop).
You can use AI speech to say whatever you want in any voice you want. Ethical problems aside, this is a very powerful element to include in your creatives.
For the sake of humanity, replace that annoying TikTok robot voice that’s everywhere.
You can use AI to generate audio or speech for your:
Testimonials
Winning scripts
Almost winning scripts
Music
Sound effects
See which ones resonate the most with your audience and always follow a creative testing framework:
https://mansonchen.substack.com/p/creative-testing-framework
Hope this gets your brain juices flowing. Reach out if you have questions or feedback!
-Manson