Whisper
OpenAI's open-source speech-to-text model — runs locally, best-in-class transcription accuracy.
Whisper is OpenAI's open-source speech-to-text model, widely regarded as one of the most accurate general-purpose transcription models available. It supports roughly 100 languages, performs well on accented speech and noisy audio, and produces punctuated, readable transcripts. The weights are freely available, so you can run it locally on your own machine with no per-request cost. Many popular tools (Descript, Krisp, Otter) use Whisper under the hood, and it's a common default when developers build transcription into their own apps.
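As a rough illustration of running the model locally, here is a minimal Python sketch. It assumes the open-source `openai-whisper` package and `ffmpeg` are installed; the helper names (`format_timestamp`, `format_segments`, `transcribe_locally`) and the choice of the `base` model size are illustrative, not part of Whisper itself.

```python
from typing import Any


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments: list[dict[str, Any]]) -> str:
    """Turn Whisper-style segments (dicts with 'start' and 'text') into
    one timestamped line per segment."""
    lines = []
    for seg in segments:
        lines.append(f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe_locally(audio_path: str, model_name: str = "base") -> str:
    """Transcribe one audio file on this machine and return timestamped text.

    Assumption: `pip install openai-whisper` has been run and ffmpeg is on
    the PATH. The import is deferred so the formatting helpers above can be
    used without the package installed.
    """
    import whisper

    model = whisper.load_model(model_name)  # downloads weights on first use
    result = model.transcribe(audio_path)   # dict with "text" and "segments"
    return format_segments(result["segments"])
```

Calling `transcribe_locally("interview.mp3")` (a hypothetical file name) would return one line per segment. Larger model sizes generally trade speed for accuracy, so the right `model_name` depends on your hardware.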
Self-hosted use carries no licensing or per-request fees; you pay only for your own compute. The OpenAI API charges $0.006 per minute of audio, which is among the cheapest transcription APIs available. For production use, Whisper is most often compared to Deepgram and AssemblyAI: Whisper's edge is accuracy, open-source availability, and the ability to self-host, while Deepgram's is lower latency for real-time transcription.
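To put the API rate in perspective, the arithmetic can be sketched in a few lines of Python. This assumes the $0.006 per minute figure quoted above and ignores any rounding or minimum-charge rules the provider may apply; the function name `api_cost_usd` is illustrative.

```python
API_PRICE_PER_MINUTE_USD = 0.006  # rate quoted on this page; may change


def api_cost_usd(audio_minutes: float) -> float:
    """Estimate the API cost in USD for a given amount of audio.

    Assumption: simple linear pricing with no billing-granularity rounding.
    """
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return round(audio_minutes * API_PRICE_PER_MINUTE_USD, 4)


# Ten hours of audio is 600 minutes.
print(f"${api_cost_usd(600):.2f}")  # prints $3.60
```

At this rate a 90-minute recording comes to about $0.54, so for occasional use the API is likely cheaper than running your own hardware, while heavy workloads may favor self-hosting.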
| Made by | OpenAI |
|---|---|
| Pricing | Open-source (free to self-host) · OpenAI API $0.006/minute |
| Best for | Transcription, local speech-to-text, multilingual audio, developer API integration |
Alternatives to Whisper
- ElevenLabs
Best-in-class AI voice cloning, TTS, and dubbing — 3,000+ voices, 30+ languages.
- Suno
AI music generation from text prompts — full songs with vocals and instrumentation.
- Udio
AI music creation with granular control over style, structure, and stems.
- Murf
Studio-quality AI voiceovers for videos, e-learning, and ads.
- Play.ht
Realistic AI text-to-speech and voice cloning with a developer API.
- Resemble AI
Voice cloning, real-time TTS, and deepfake audio detection.