Cartesia
Voice & transcription
A commercial provider of very low-latency, realistic text-to-speech for real-time voice apps.
Cartesia builds text-to-speech tuned for real-time conversation, where latency is as important as quality — its models are designed to start speaking almost instantly.
That focus makes it a common choice underneath live voice agents, where a slow first syllable breaks the illusion of a conversation.
Where it's ideally used
A fit when a live voice agent needs natural speech with the lowest possible time-to-first-sound.
Where it doesn't fit
Cartesia is hosted and metered, and its latency tuning is wasted on batch narration, where speed does not matter.
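To make the distinction between time-to-first-sound and total synthesis time concrete, here is a minimal Python sketch that measures both on a stream of audio chunks. It does not call Cartesia or any real provider: `fake_tts_stream` is a hypothetical stand-in with simulated delays, and `measure_stream` is an illustrative helper name, not part of any SDK.

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_tts_stream(n_chunks: int = 5, startup_s: float = 0.05,
                    gap_s: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response (not a real API)."""
    time.sleep(startup_s)        # simulated delay before the first audio chunk
    for _ in range(n_chunks):
        yield b"\x00" * 320      # placeholder audio bytes
        time.sleep(gap_s)        # simulated gap between chunks


def measure_stream(chunks: Iterable[bytes]) -> Tuple[float, float, int]:
    """Return (time_to_first_chunk_s, total_time_s, total_bytes)."""
    start = time.perf_counter()
    first = None
    total_bytes = 0
    for chunk in chunks:
        if first is None:
            # Elapsed time until the first audio arrives: what a live listener feels.
            first = time.perf_counter() - start
        total_bytes += len(chunk)
    total = time.perf_counter() - start
    if first is None:
        raise ValueError("stream produced no audio")
    return first, total, total_bytes


if __name__ == "__main__":
    ttfc, total, size = measure_stream(fake_tts_stream())
    print(f"first chunk after {ttfc * 1000:.0f} ms, "
          f"full audio after {total * 1000:.0f} ms, {size} bytes")
```

For a live agent the first number is the one that matters, since playback can begin as soon as the first chunk arrives. For batch narration only the second number affects the job, which is why latency tuning buys little there.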