faster-whisper

A fast reimplementation of Whisper on CTranslate2, with much lower latency and memory use.

faster-whisper reimplements Whisper inference on the CTranslate2 engine. It runs the same Whisper models, converted to CTranslate2 format, so accuracy is preserved; the project reports up to 4x faster transcription than the reference openai/whisper implementation, with lower memory use.

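It is the usual production answer to "Whisper is too slow": a replacement runtime that keeps accuracy while making self-hosted transcription practical at scale. The models are the same, but the Python API differs from openai/whisper, so calling code needs small changes.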
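
A minimal sketch of what a call looks like, assuming the faster-whisper Python package is installed (pip install faster-whisper). WhisperModel and its transcribe method are the package's documented entry points; the helper names format_segment and transcribe_file, the "small" model size, and the CPU/int8 settings are illustrative assumptions, not recommendations from this description.

```python
from typing import List, Tuple


def format_segment(start: float, end: float, text: str) -> str:
    """Render one transcript segment as '[start -> end] text'."""
    return f"[{start:.2f}s -> {end:.2f}s] {text.strip()}"


def transcribe_file(
    audio_path: str,
    model_size: str = "small",    # assumption: any published Whisper size works here
    device: str = "cpu",          # assumption: CPU-only host
    compute_type: str = "int8",   # assumption: 8-bit quantization to cut memory use
) -> Tuple[str, List[str]]:
    """Transcribe one audio file; return (detected language, formatted lines)."""
    # Imported lazily so this module still loads where faster-whisper is absent.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device=device, compute_type=compute_type)

    # transcribe() returns a lazy generator of segments plus a metadata object;
    # decoding only runs as the generator is consumed.
    segments, info = model.transcribe(audio_path, beam_size=5)
    lines = [format_segment(s.start, s.end, s.text) for s in segments]
    return info.language, lines
```

Calling transcribe_file("meeting.wav") (a hypothetical file name) would return the detected language code and one formatted line per segment. On a GPU host, device="cuda" with compute_type="float16" is a common starting point.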

Where it's ideally used

The default choice when you want self-hosted, Whisper-quality transcription but the reference implementation is too slow or memory-hungry.

Where it doesn't fit

A faster runtime, not a new capability: it does not add streaming or speaker diarization on its own.