When recording videos, and especially podcasts, clear audio is very important. Achieving perfect sound is difficult even in a studio, let alone outdoors. Neural networks may not solve this problem completely, but they can make it much easier to deal with. Let's talk about them.
The results from all the services, along with the original recording, can be downloaded from the link at the end of the article. Just in case, I'll note once again that I deliberately chose only tools with AI support. There are, of course, many more ways to improve audio.
Adobe Podcast AI
With the free plan, you can upload audio in .wav, .mp3, .aac, .flac and .ogg formats, up to 30 minutes long and no more than 500 MB in size. You can enhance no more than an hour of audio per day. You can also set up your microphone with the help of AI.
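The free-plan limits above are easy to check before uploading. Below is a minimal sketch, not anything provided by Adobe: the names `PlanLimits`, `ADOBE_FREE` and `plan_violations` are hypothetical, the numbers are the ones quoted in this article, and the extension list assumes the usual spellings .aac and .ogg.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PlanLimits:
    """Upload limits of a plan (hypothetical helper type)."""
    max_minutes: float
    max_megabytes: float
    extensions: frozenset


# Values taken from the article; the .aac/.ogg spellings are an assumption.
ADOBE_FREE = PlanLimits(
    max_minutes=30,
    max_megabytes=500,
    extensions=frozenset({".wav", ".mp3", ".aac", ".flac", ".ogg"}),
)


def plan_violations(filename, size_mb, duration_min, limits=ADOBE_FREE):
    """Return a list of human-readable reasons why the file exceeds the plan."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in limits.extensions:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if duration_min > limits.max_minutes:
        problems.append(f"too long: {duration_min} min > {limits.max_minutes} min")
    if size_mb > limits.max_megabytes:
        problems.append(f"too large: {size_mb} MB > {limits.max_megabytes} MB")
    return problems


if __name__ == "__main__":
    # An empty list means the file fits within the limits.
    print(plan_violations("episode.mp3", size_mb=109, duration_min=28))
```

The duration and size are passed in by hand here to keep the sketch free of third-party audio libraries.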
After uploading the file, you need to wait up to ten minutes while the neural network processes it automatically. There are no settings apart from the speech enhancement switch. The resulting file can only be downloaded.
For some reason, the processed file was half the size of the original, although neither the format nor the duration changed. Even visually, there were fewer peaks on the track. When listening, the improvement was also obvious: words became clearer and easier to make out, and the noise was gone.
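One possible reading of the size difference: if the format and duration stay the same, a file that is half the size has roughly half the average bitrate. The sketch below only illustrates that arithmetic. The function name and the sample numbers are hypothetical, and it does not claim to explain what the service actually does to the file.

```python
def average_bitrate_kbps(size_bytes, duration_seconds):
    """Average bitrate in kilobits per second, from file size and duration."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return size_bytes * 8 / duration_seconds / 1000


if __name__ == "__main__":
    # Hypothetical numbers: same 25-minute duration, half the size.
    original = average_bitrate_kbps(size_bytes=48_000_000, duration_seconds=1500)
    processed = average_bitrate_kbps(size_bytes=24_000_000, duration_seconds=1500)
    print(f"{original:.0f} kbps -> {processed:.0f} kbps")
```

For uncompressed WAV, halving the bitrate usually corresponds to something concrete, such as stereo being turned into mono or a lower sample rate.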
The paid plan, at $9.99 per month, adds batch upload, finer-tuned processing, up to 4 hours of audio per day and files up to 1 GB in size. It also includes access to express.adobe.com.
More fine-tuning looks like this:
Descript
There is an online version and there are desktop applications. In general, this service is more about fast processing using neural networks than about improving audio.
Unfortunately, it will not suit you if the original audio is in Russian. But if your language is supported, you can automatically transcribe audio into text and use text editing to edit the content of the podcast itself.
Descript uses Google Cloud Speech-to-Text technology, so the accuracy is much the same as that of Google Home or voice typing in Google Docs. There is also Rev (another provider) for better transcription. You can level the volume, and there is a built-in equalizer.
There's also Overdub (available on request only, in beta), but at the time of writing I still haven't been given access to it.
As I understand it, this is an AI that processes the recording and builds a speech model from it, which can then generate words and sentences you never actually said. According to those who have experimented with this tool, the results are impressive: the synthesized speech has the same voice and intonation as the original.
The pricing plans can be found below.
This is a service for extracting voices from various types of video and audio files: music, streams, etc. This service has an online version, desktop and mobile applications.
In the free plan, you can process up to 10 minutes of audio (no more than 200 MB), and you can try the application without spending paid minutes. To do this, turn on the Create preview switch, and you will receive a short fragment of the processed track. MP3, OGG, WAV, FLAC, AVI, MP4, MKV, AIFF and AAC formats are supported.
I had problems uploading both a 184 MB file and a 100 MB file. For some reason, the service kept complaining that I had exceeded the free quota, so I was never able to try it in practice.
The desktop version still requires a network connection to work. The audio will be processed online, and you will have to wait for it to be uploaded to the server.
Minute packages are a one-time purchase: 90 minutes and 2 GB will cost you $15.
An online service that supports working with video and audio: asf, wmv, mp4, quicktime, webm, x-matroska, x-msvideo, x-flv, wav, flac, x-wav, x-m4a, m4a, ogg, x-flac, amr.
With the free plan, you can process up to 20 minutes of audio, yet I had no trouble uploading a 30-minute file of 900 MB.
You can equalize the audio and remove background noise (choosing how aggressive the cleaning should be). Reverberation removal is planned for the future. After processing, you can switch between the before and after versions right in the browser and listen to them for comparison.
Visually, for some reason, the processed version looks worse than the original. When listening, I did not notice any difference between the original and the processed version.
A paid plan at $12 per month allows you to process up to 900 minutes of audio per month. Alternatively, for $20 you can buy 600 minutes that stay available until they are used up.
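When the waveform and your ears disagree, a rough numeric comparison can help. The following is a minimal sketch rather than part of any of the services reviewed here: it measures the peak and RMS level of a WAV file with Python's standard `wave` module. The function name `level_stats` is hypothetical, and the sketch assumes 16-bit PCM WAV input, so other formats would need converting first.

```python
import array
import math
import sys
import wave


def level_stats(path):
    """Return (peak_dbfs, rms_dbfs) for a 16-bit PCM WAV file.

    All channels are pooled together; this is a rough comparison tool,
    not a loudness meter.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    if not samples:
        raise ValueError("the file contains no audio frames")

    full_scale = 32768.0
    peak = max(abs(s) for s in samples) / full_scale
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / full_scale

    def to_db(value):
        return 20 * math.log10(value) if value > 0 else float("-inf")

    return to_db(peak), to_db(rms)
```

Running it on the original and on the processed file gives two pairs of numbers to compare. A lower RMS in quiet passages would suggest that background noise was actually reduced.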
Open source online service and offline application for Windows. The application can change volume and dynamic range. All parameters are configured manually. The application could not process my source file.
The online version has many more options for improvement, but it was unable to process my sources.
Crumplepop
Windows and Mac application for removing various noises from an audio track. There's a plugin for every kind of noise, but we're interested in AudioDenoise because it uses AI.
Unfortunately, the plugins cannot be used on their own. After installing Crumplepop and the plugin, you need to launch one of the supported applications (Premiere Pro, Audition, DaVinci Resolve, Audacity, Pro Tools or Media Composer), then configure the plugin and apply the selected effects there, in your usual tool.
In the case of AudioDenoise, you can select one of the processing presets or configure all the necessary parameters manually.
From the settings, it is not clear where exactly artificial intelligence is used. The developers claim that it removes noise that other plugins cannot handle: the sound of an air conditioner, grunting, etc.
You can't tell from the picture below, but the plugin really did remove unwanted noise and made the recording cleaner and clearer.
You can try the plugin for free during the trial period, but after that you will have to pay for a subscription costing $23 per month.
It's an all-in-one online podcast production service with AI-enabled tools at every turn. We are interested in Magic Dust.
Unfortunately, this feature is only available if you pay for the service ($11.99 per month when billed annually). No trial is provided.
I was not committed enough to pay for the service, so I have no idea how well it works.
Cleanvoice
An online service that specializes in removing unwanted sounds from podcasts. You can upload a recording consisting of one track or several. In free mode, 30 minutes are available for processing.
During upload, you can choose to remove noise automatically or configure each setting manually.
I selected automatic mode to test the basic capabilities of the service. When finished, Cleanvoice tells you exactly what was removed from the original track. When downloading, you can decide to export timestamps and markers for Audacity.
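For context on the Audacity export: Audacity label tracks can be imported from a plain text file with one label per line, holding the start time, end time and label text separated by tabs, with times in seconds. The sketch below produces text in that shape. It is an assumption that the export uses this format, and the function name and the sample regions are hypothetical.

```python
def audacity_labels(regions):
    """Format (start_seconds, end_seconds, text) tuples as Audacity label lines."""
    lines = []
    for start, end, text in regions:
        if end < start:
            raise ValueError(f"label {text!r} ends before it starts")
        # One label per line: start, end and text separated by tabs.
        lines.append(f"{start:.6f}\t{end:.6f}\t{text}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical regions that a cleaning tool might have flagged.
    removed = [(12.5, 13.1, "filler word"), (47.0, 52.25, "long silence")]
    print(audacity_labels(removed), end="")
```

With a file like this loaded as a label track, you can see the flagged regions next to the original waveform.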
My final file was 3 MB instead of the original 109 MB, and it looked noisier. It turned out that the service had deleted the fragments without sound, which is why the audio track became much shorter. When listening, though, I didn't notice any noise in it.
Of course, with manual setup the silent fragments could have been kept, so this is not a fault of the service.
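To illustrate why removing silence shortens a track so much, here is a naive threshold-based sketch. It is not the algorithm the service uses, which is not documented here. The function name, the frame size and the threshold are all hypothetical choices.

```python
def strip_silence(samples, frame_size, threshold):
    """Drop every fixed-size frame whose loudest sample is below `threshold`.

    `samples` is a sequence of integer PCM samples. Frames are checked
    independently, so this is far cruder than a real silence remover.
    """
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    kept = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if max(abs(s) for s in frame) >= threshold:
            kept.extend(frame)
    return kept


if __name__ == "__main__":
    # Hypothetical signal: silence, a short burst of sound, silence again.
    signal = [0] * 100 + [1000, -1000] * 50 + [0] * 100
    shorter = strip_silence(signal, frame_size=50, threshold=100)
    print(len(signal), "->", len(shorter), "samples")
```

A recording that is mostly pauses shrinks dramatically under this kind of processing, and the speech that remains is packed closer together.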
Cyberlink AudioDirector 365
This is a complete audio editing application for Windows. The neural network features can be found in the Repair Audio section, where you can remove noise, remove wind noise, improve the sound of your voice, and much more.
For this article, I used the Noise Reduction tool with default settings. Some noise remained in the recording, but it was less audible. The tool's settings probably needed to be tweaked by hand.
There is a trial. The full version costs $4.33 per month.
Another full-fledged audio editor, which has a whole set of tools with support for Deep Learning: Remix, Extract:Dialogue, DeWind:Dialogue, DeRustle:Dialogue, DeBuzz:Dialogue, DeBird, DeClick:Dialogue and DePlosive:Dialogue.
I think it's clear from the names of the options what each of them does. The tool turned out to be very complex for an outsider, and to be honest, I couldn't figure out how to do anything with audio in it. So it is better to evaluate its work yourself.
There is a trial. A license starts at $60.
This service claims right out of the gate that it is AI for podcasters. It has many audio enhancement tools, all of which use AI. There is support for batch processing and for multitrack recordings.
In my case, there was only more noise after processing, although I did use the default settings. I still recommend this tool, but only if you understand what its settings do.
With the free plan, you can process up to two hours of audio per month. Additional hours can be purchased separately if needed.
You can pick up the source and processed files below 👇