I built ClearAudio to make prompt-based audio cleanup practical. Upload audio, describe what you want ("keep the voice, remove the crowd"), and it separates out the sound you asked for. I stripped it down to the essentials with a minimal 1-bit UI: no extra features, just the core separation.
Uses Meta's AudioSeg model (https://github.com/facebookresearch/audioseg) under the hood. Also used this as a chance to learn Modal for GPU inference.
Code is open source: https://github.com/sambarrowclough/clearaudio
Happy to answer questions about the stack or take feedback on the UX!