Capability layer

Voice & transcription

Speech-to-text and text-to-speech — the audio edges of an AI system.

10 tools, 10 with full write-ups

OpenAI's open-source speech-to-text model — the default starting point for transcription.

A commercial speech-to-text API pairing transcription with audio-intelligence models.

A commercial provider of very low-latency, realistic text-to-speech for real-time voice apps.

A commercial speech API built for fast, accurate transcription at scale, including real-time.

A leading text-to-speech and voice platform known for highly natural, expressive synthetic speech.

A fast reimplementation of Whisper on CTranslate2, with much lower latency and memory use.

A small, open-weight text-to-speech model that produces natural voices on modest hardware.

A fast, local open-source text-to-speech system designed to run well on small devices.

A platform for building, testing, and deploying real-time voice agents over the phone.

An open-source extension of Whisper adding accurate word-level timestamps and speaker diarization.