Over the past six years, listening to spoken-word audio (which includes news, sports, talk/personalities, and audiobooks) has increased by 30%, and it rose another 8% in 2020. Those who listen to spoken-word audio average about two hours of listening per day, which makes up nearly half (48%) of their total daily listening time.
Despite this acceleration of listening in our current digital age, audio is one of the media trailing in the race for technological advancement. Although it is used across industries, and the need is certainly there, some particulars of creating audio clips have inhibited its automation.
This is because some key elements of audio creation cannot be easily replicated by technology, especially when it comes to digitally creating speech. These include rhythm, tone, timing, cadence, and contextual inflection in voice, all of which are essential if audio is to truly engage listeners.
One company, Aflorithmic, is leading the way in innovating audio content for the future with the world’s first fully automatable solution for end-to-end audio. Their “Audio-As-A-Service” platform will help audio creation catch up with the rapid technological progress in the rest of content creation and allow audio production to keep pace with its current consumption.
The Transition From Visual Consumption
People are maxed out on screen time: more than 50% of adults increased their screen time on at least two different devices in the past year, leading individuals to look for alternative ways to consume content. This is where audio is stepping up to the plate. Audio also has advantages that visual content lacks, such as letting listeners multitask while working out or cooking a meal.
Yet a rigid, linear process has been the only way to produce audio to date, and being locked into this highly manual production process makes it hard to bring audio clips to listeners quickly and easily. Mistakes can cost significant amounts of money and time, and this has limited the scale of audio production in the digital era.
Aflorithmic is changing this with a platform that covers the full audio production chain. It shows that quick and digestible content is possible in the world of sound, through an intuitive, fully featured AI audio platform that helps producers create quality audio from start to finish. Aflorithmic is especially focused on advancing the trickier component of synthetic voice, which to date has been difficult to reproduce digitally because it captures very few elements of a human speaker’s intonation.
Thus far, most synthetic media companies stop once text-to-speech has been created, producing an audio clip that is simply a screen reader without the elements needed to engage the listener. AI platforms help take audio content to the next level by providing the audio creation tools that make the difference between a screen reader and a podcast.
APIs for Audio
Aflorithmic is an API-first solution, which means that its APIs integrate with your existing system instead of forcing users to learn a whole new graphical interface. The advantage is maximum flexibility: you can run Aflorithmic under the hood, without having to tell anyone, while experienced developers handle the integration.
“This is a choice we’ve made deliberately,” said Matthias Lehmann at Aflorithmic. “We help producers use their own platforms to scale audio within their creator studio of choice.”
Aflorithmic’s APIs consist of three main services that mirror the traditional audio production process. The first is “script”, where you create the text that you want one or more synthetic voices to say. This can be done manually or automatically from a document or database.
The second service is called “speech” and, as the name suggests, it is the part of the process where the speech is rendered. Aflorithmic offers more than 400 voices from the best text-to-speech providers, and dozens more are added every month. The company takes care of the compatibility and maintenance required, removing much of the work for the customer.
“This is the step where you can personalize your audio and create thousands of type versions in seconds,” said Lehmann. “This service acts as the alternative to your traditional voice actor and sound studio set-up.”
The third and final service is called “mastering”, and here is where the real magic happens. Users can select an AI-powered sound design that automatically adjusts to the length of the speech. The sound can also change depending on what is happening in your script, which is why Aflorithmic calls it sound design rather than background music: the service acts as your sound and mastering engineer.
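To make the three stages concrete, here is a minimal Python sketch of how a script, speech, and mastering flow might fit together. It is not Aflorithmic’s actual API: every class, function, voice name, and sound design name below is hypothetical, no audio is rendered, and the speaking rate is an assumed figure used only so the sound design has a length to match.

```python
from dataclasses import dataclass
from string import Template


@dataclass
class Script:
    """Stage 1 (hypothetical): text for the voice, with $placeholders."""
    template: str

    def render(self, **fields: str) -> str:
        # Fill the placeholders for one listener.
        return Template(self.template).substitute(**fields)


@dataclass
class Speech:
    """Stage 2 (hypothetical): stand-in for a rendered voice track."""
    voice: str
    text: str

    @property
    def duration_s(self) -> float:
        # Assumption: about 2.5 spoken words per second, for illustration only.
        return len(self.text.split()) / 2.5


@dataclass
class Master:
    """Stage 3 (hypothetical): speech paired with a sound design."""
    speech: Speech
    sound_design: str
    length_s: float


def produce(script: Script, voice: str, sound_design: str, **fields: str) -> Master:
    """Run script -> speech -> mastering for one set of personal fields."""
    speech = Speech(voice=voice, text=script.render(**fields))
    # The sound design is sized to the speech it accompanies.
    return Master(speech=speech, sound_design=sound_design, length_s=speech.duration_s)


# One script, many personalized versions.
script = Script("Good morning $name, here is today's news from $city.")
listeners = [
    {"name": "Ana", "city": "Kassel"},
    {"name": "Ben", "city": "Berlin"},
]
versions = [
    produce(script, voice="hypothetical-voice-1",
            sound_design="hypothetical-newsroom", **listener)
    for listener in listeners
]
for version in versions:
    print(version.speech.text, round(version.length_s, 1))
```

The point of the sketch is the shape of the pipeline: one script template fans out into as many personalized versions as there are listeners, and each mastered version takes its length from the speech rather than from a fixed background track.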
New Channels for Audio
Thanks to these AI-driven advances, automated audio’s influence is growing, reaching and transforming industries that have yet to use it. In fact, nearly a quarter of all radio ads in the U.S. are expected to be automated in 2022. Automated audio production can help scale and adapt finished content, producing thousands of versions in a matter of seconds.
“Adaptability stands for repurposing existing content and turning it into an audio experience,” said Lehmann. “Audio is a format that can be used to promote your written content by offering a summary or ‘snack content’ as audio on formats such as Instagram stories. Another option is monetizing the new audio content you have created out of your existing one, by placing ads in it or having a company sponsor it.”
It is no wonder that the publishing sector is eager to tap into this new audio momentum. Smaller publishers especially, lacking in-house resources, are looking for a way to create enjoyable content quickly and more cost-effectively, and automated audio provides a huge opportunity. With margins shrinking across the industry as a whole, this now applies to large media houses too.
Twitter has been the exemplary platform for quick content because it is designed for short messages and quick updates, but audio “blurbs” could also be a noteworthy new avenue for delivering rapid updates. If media companies take advantage of this, it puts the power back in publishers’ hands, and content producers won’t have to sell so much of their airtime to advertisers in order to stay afloat.
By offering a cost-efficient way to produce audio content, even given the short lifespan of a news item, synthetic audio makes a place for itself where scalability and speed are vital. This is probably why Aflorithmic was able to collaborate with HNA, a regional German newspaper with over 7M monthly readers, to create automated newscasts. Just this past March 4th, the HNA newscast celebrated its 100th episode, having amassed over 500,000 plays.
Aflorithmic has recognized the full potential of audio for modern times, and with it opened new doors for audio content production. Through their platform, any written piece can be turned into an audio experience tailored to each listener’s individual interests.
Disclosure: This article mentions a client of an Espacio portfolio company.