WhisperX
Voice & transcription
An open-source extension of Whisper that adds accurate word-level timestamps and speaker diarization.
WhisperX wraps Whisper to close two of its practical gaps: it aligns the transcript against the audio to produce accurate word-level timestamps, and it adds speaker diarization so you know who said what.
For anything beyond a plain transcript (captions, meeting notes, searchable audio), those two additions are usually what you actually need.
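To make "who said what" concrete, here is a minimal sketch of how word-level timestamps and diarization output can be combined. It is not WhisperX's implementation or API: the `Word` and `Turn` types and the `assign_speakers` and `group_by_speaker` helpers are hypothetical names chosen for illustration, and the rule used (give each word the speaker whose turn overlaps it most) is an assumed, simplified strategy.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    """A transcribed word with start/end times in seconds (hypothetical type)."""
    text: str
    start: float
    end: float
    speaker: Optional[str] = None


@dataclass
class Turn:
    """A diarization segment: one speaker talking from start to end (hypothetical type)."""
    speaker: str
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: List[Word], turns: List[Turn]) -> List[Word]:
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn keep speaker=None.
    """
    labeled = []
    for w in words:
        best_speaker, best_overlap = None, 0.0
        for t in turns:
            ov = overlap(w.start, w.end, t.start, t.end)
            if ov > best_overlap:
                best_speaker, best_overlap = t.speaker, ov
        labeled.append(Word(w.text, w.start, w.end, best_speaker))
    return labeled


def group_by_speaker(words: List[Word]) -> List[Tuple[Optional[str], str]]:
    """Merge consecutive words from the same speaker into one line."""
    lines: List[Tuple[Optional[str], List[str]]] = []
    for w in words:
        if lines and lines[-1][0] == w.speaker:
            lines[-1][1].append(w.text)
        else:
            lines.append((w.speaker, [w.text]))
    return [(speaker, " ".join(texts)) for speaker, texts in lines]


if __name__ == "__main__":
    # Illustrative data only, not real model output.
    words = [Word("hello", 0.0, 0.4), Word("there", 0.5, 0.9), Word("hi", 1.2, 1.5)]
    turns = [Turn("SPEAKER_00", 0.0, 1.0), Turn("SPEAKER_01", 1.0, 2.0)]
    for speaker, line in group_by_speaker(assign_speakers(words, turns)):
        print(f"{speaker}: {line}")
```

The sketch also shows why the two additions depend on each other: speaker labels can only be attached per word if each word has a reliable start and end time.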
Where it's ideally used
A fit when a transcript needs precise timing or speaker labels, as in captions, multi-speaker meeting notes, or audio search.
Where it doesn't fit
More pipeline than needed when a plain, untimed transcript is all the task calls for.