Best AI audio & voice tools
24 AI audio and voice tools in the Tool Hut catalog. Compare them, find alternatives, and track the ones you actually use.
- ElevenLabs
Best-in-class AI voice cloning, TTS, and dubbing — 3,000+ voices, 30+ languages.
- Suno
AI music generation from text prompts — full songs with vocals and instrumentation.
- Udio
AI music creation with granular control over style, structure, and stems.
- Whisper
OpenAI's open-source speech-to-text model; runs locally and delivers highly accurate transcription.
- Murf
Studio-quality AI voiceovers for videos, e-learning, and ads.
- Play.ht
Realistic AI text-to-speech and voice cloning with a developer API.
- Resemble AI
Voice cloning, real-time TTS, and deepfake audio detection.
- AssemblyAI
Speech-to-text API with summarisation and audio understanding models.
- Deepgram
Fast, accurate speech-to-text and voice agent APIs for developers.
- Krisp
Real-time noise cancellation plus meeting transcription and notes.
- Speechify
Text-to-speech that reads articles, PDFs, and books aloud in natural-sounding voices.
- WellSaid
Enterprise-grade AI voiceover with consistent branded voices.
- LALAL.AI
Split songs into clean vocal, instrument, and drum stems.
- Moises
Musician's app for stem separation, pitch, and tempo control.
- Soundraw
Royalty-free AI music you can customise for videos and content.
- AIVA
AI composer for emotional soundtracks and instrumental scores.
- Stable Audio
Stability AI's tool for generating music and sound effects from text.
- Riffusion
Generate songs and instrumentals from prompts, with a free tier.
- Fish Audio
Open-weight text-to-speech and fast voice cloning.
- Cartesia
Ultra-low-latency voice models for real-time conversational agents.
- Adobe Podcast
Enhance speech to studio quality and remove background noise.
- Voicemod
Real-time voice changer and soundboard for streams and games.
- Hume AI
Emotion-aware voice AI with expressive, empathic speech — API and consumer products.
- Sesame
Conversational voice AI with natural, expressive speech designed for companion and assistant apps.