Whisper

OpenAI's open-source speech-to-text model — runs locally, best-in-class transcription accuracy.

Whisper is OpenAI's open-source speech-to-text model, widely regarded as one of the most accurate general-purpose transcription models available. It supports roughly 100 languages, performs well on accented speech and noisy audio, and produces punctuated, readable transcripts. The weights are freely available under the MIT license, so you can run it locally on your own machine with no per-request cost. Several popular tools (Descript, Krisp, and Otter are often cited) reportedly use Whisper under the hood, and it is a common default when developers build transcription into their own apps.
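
As a rough illustration of running Whisper locally, the sketch below assumes the `openai-whisper` Python package and `ffmpeg` are installed. The helper names (`format_timestamp`, `format_segments`, `transcribe_file`) and the choice of the `base` model are illustrative assumptions, not part of this listing.

```python
def format_timestamp(seconds: float) -> str:
    """Render a position in seconds as MM:SS (fractions are dropped)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def format_segments(segments) -> str:
    """Turn Whisper-style segments (dicts with 'start' and 'text')
    into one timestamped line per segment."""
    return "\n".join(
        f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}"
        for seg in segments
    )


def transcribe_file(path: str, model_name: str = "base") -> str:
    """Transcribe an audio file on the local machine.

    Assumes the openai-whisper package and ffmpeg are available.
    The model weights are downloaded on first use.
    """
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(path)  # dict with "text", "segments", "language"
    return format_segments(result["segments"])
```

Calling `transcribe_file("meeting.mp3")` (a hypothetical file name) would return the transcript with one timestamped line per segment. Larger models generally trade speed for accuracy, so the model name is left as a parameter.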

Self-hosting carries no license or per-request fee; you pay only for your own hardware or compute. The OpenAI API charges $0.006 per minute of audio, which is among the cheapest transcription APIs available. Whisper is most often compared to Deepgram and AssemblyAI for production use. Whisper's edge is accuracy, open-source availability, and the ability to self-host; Deepgram's is lower latency for real-time transcription.
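
To make the pricing concrete, here is a small sketch of the arithmetic, using the $0.006 per minute rate quoted above. The function names and the $300 monthly hosting figure are hypothetical, chosen only to illustrate the comparison.

```python
API_RATE_PER_MINUTE = 0.006  # USD per audio minute, the rate quoted in this listing


def api_cost(audio_minutes: float, rate: float = API_RATE_PER_MINUTE) -> float:
    """Estimated API cost in USD, rounded to cents."""
    return round(audio_minutes * rate, 2)


def breakeven_minutes(monthly_hosting_cost: float,
                      rate: float = API_RATE_PER_MINUTE) -> float:
    """Audio minutes per month at which the API bill would equal a
    fixed self-hosting cost (hardware or cloud GPU, an assumed figure)."""
    return monthly_hosting_cost / rate


print(api_cost(60))                   # one hour of audio
print(round(breakeven_minutes(300)))  # hypothetical $300/month GPU server
```

Under these assumptions an hour of audio costs about $0.36 through the API, and a $300 per month server would only pay for itself at around 50,000 minutes of audio per month. Engineering and maintenance time are not included.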

Made by	OpenAI
Pricing	Open-source (free to self-host) · OpenAI API $0.006/minute
Best for	Transcription, local speech-to-text, multilingual audio, developer API integration

transcription
stt
open-source
local
multilingual

Alternatives to Whisper

ElevenLabs: Best-in-class AI voice cloning, TTS, and dubbing, with 3,000+ voices and 30+ languages.
Murf: Studio-quality AI voiceovers for videos, e-learning, and ads.
Play.ht: Realistic AI text-to-speech and voice cloning with a developer API.
Resemble AI: Voice cloning, real-time TTS, and deepfake audio detection.
AssemblyAI: Speech-to-text API with summarisation and audio understanding models.
Deepgram: Fast, accurate speech-to-text and voice agent APIs for developers.
