Whisper

OpenAI's open-source speech-to-text model — the default starting point for transcription.

Whisper is an open-source automatic speech recognition model from OpenAI. It is robust across accents, background noise, and dozens of languages, and it can run entirely on your own hardware.

Because the weights and code are open, Whisper became the base for much of the open transcription ecosystem: faster runtimes and diarization tools both build on it.

Where it's ideally used

Use it when you need solid, multilingual transcription that runs on your own hardware, with no audio leaving your environment.
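As a rough illustration of running it yourself, here is a minimal sketch that assumes the `openai-whisper` Python package and `ffmpeg` are installed locally. The helper names (`transcribe_local`, `render_segments`, `format_timestamp`) and the choice of the `"base"` model size are illustrative assumptions, not something this page prescribes.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def render_segments(segments: List[Dict]) -> str:
    """Turn segment dicts (start, end, text) into one line per segment."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} -> {end}] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe_local(audio_path: str, model_name: str = "base") -> List[Dict]:
    """Transcribe a local audio file on this machine (hypothetical helper)."""
    # Imported lazily so the formatting helpers work without the package.
    import whisper  # assumption: installed via `pip install openai-whisper`

    model = whisper.load_model(model_name)  # downloads weights on first use
    result = model.transcribe(audio_path)   # inference runs locally
    return result["segments"]
```

With a local file (the name `meeting.mp3` is hypothetical), `print(render_segments(transcribe_local("meeting.mp3")))` would print one timestamped line per segment. The audio is read and processed on the local machine; only the one-time weight download touches the network.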

Where it doesn't fit

The base implementation is not tuned for real-time, low-latency streaming. For live captioning, reach for a Whisper derivative built for streaming or a hosted streaming API.
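To make the latency concern concrete, here is a small sketch of what a naive workaround looks like: buffering audio into fixed-size windows and transcribing each one after it fills. The function names, the 16 kHz sample rate, and the window and inference durations are illustrative assumptions, not measurements.

```python
from typing import List, Tuple


def fixed_windows(num_samples: int, sample_rate: int,
                  window_seconds: float) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices for consecutive fixed-size windows."""
    step = int(sample_rate * window_seconds)
    if step <= 0:
        raise ValueError("window must cover at least one sample")
    return [(start, min(start + step, num_samples))
            for start in range(0, num_samples, step)]


def earliest_caption_delay(window_seconds: float,
                           inference_seconds: float) -> float:
    """Worst-case delay for the first word of a window: the window must fill
    before transcription starts, and the transcription itself takes time."""
    return window_seconds + inference_seconds


# 5 seconds of 16 kHz audio split into 2-second windows (assumed values).
windows = fixed_windows(num_samples=80_000, sample_rate=16_000, window_seconds=2.0)
delay = earliest_caption_delay(window_seconds=2.0, inference_seconds=0.5)
```

Shrinking the window lowers the delay but gives the model less context per call and can cut words at window boundaries, which is the trade-off that streaming-oriented tools are designed to manage.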