Whisper
Voice & transcription
OpenAI's open-source speech-to-text model, the default starting point for transcription.
Whisper is an open-source automatic speech recognition model from OpenAI. It is robust to accents and background noise, supports dozens of languages, and can run entirely on your own hardware.
Because the weights and code are open, Whisper became the base that much of the open transcription ecosystem builds on — faster runtimes and diarization tools all start here.
Where it's ideally used
The default when you need solid, multilingual transcription you can run yourself, with no audio leaving your environment.
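As a rough illustration of running it yourself, the sketch below transcribes a local file and turns the result into SRT subtitles. It assumes the `openai-whisper` Python package and `ffmpeg` are installed. The helper names (`format_timestamp`, `segments_to_srt`, `transcribe_file`) and the choice of the "base" model size are illustrative assumptions, not part of Whisper itself.

```python
import sys
from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Convert segments (dicts with 'start', 'end', 'text') into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def transcribe_file(path: str, model_name: str = "base") -> dict:
    """Transcribe a local audio file on this machine.

    Hypothetical wrapper: the import is deferred so the pure helpers above
    can be used without the openai-whisper package installed.
    """
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_name)  # downloads weights on first use
    return model.transcribe(path)  # dict with 'text', 'segments', 'language'


if __name__ == "__main__" and len(sys.argv) > 1:
    result = transcribe_file(sys.argv[1])
    print("Detected language:", result["language"])
    print(segments_to_srt(result["segments"]))
```

Saved under a name of your choosing and run with an audio file path as its only argument, the script does all processing locally; after the one-time download of the model weights, nothing is sent over the network. Larger model sizes generally trade speed for accuracy.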
Where it doesn't fit
The base implementation is not tuned for real-time, low-latency streaming: it processes audio in 30-second windows rather than as a continuous stream. For live captioning, reach for a derivative or a streaming API.