SGLang

A fast open-source serving runtime with structured generation and high throughput.

SGLang is a serving engine that pairs high throughput with a structured-generation front-end. It handles constrained output and complex prompting patterns efficiently, and it uses prefix caching so that requests sharing a prefix can reuse earlier work.
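To make the prefix-caching idea concrete, here is a minimal toy sketch of how a shared prefix can be detected and reused across requests. It is not SGLang's actual data structure or API: the `ToyPrefixCache` class and its `match_and_insert` method are hypothetical names, whitespace-split words stand in for token IDs, and a real engine would store attention key/value tensors rather than empty trie nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Maps the next token to the child node reached by that token.
    children: dict = field(default_factory=dict)


class ToyPrefixCache:
    """Token-level trie; a stand-in for a real KV-cache index (illustrative only)."""

    def __init__(self):
        self.root = Node()

    def match_and_insert(self, tokens):
        """Return how many leading tokens were already cached, then cache the rest."""
        node, reused = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                # Cache miss: from here on every token needs fresh computation.
                child = Node()
                node.children[tok] = child
            else:
                # Cache hit: this token's work could be reused.
                reused += 1
            node = child
        return reused


# Two requests that share the same system prompt.
system = "You are a helpful assistant .".split()
cache = ToyPrefixCache()
first = cache.match_and_insert(system + ["Summarize", "this"])
second = cache.match_and_insert(system + ["Translate", "this"])
print(first, second)
```

The first request finds nothing cached, while the second reuses every token of the shared system prompt and only computes its own suffix. That is why workloads with a long common prefix, such as a fixed system prompt or few-shot examples, benefit most.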

It is a strong choice for serving the latest large open models, and is often quick to support new architectures.

Where it's ideally used

A fit for high-throughput self-hosted serving, especially with structured output or heavy shared-prefix workloads.

Where it doesn't fit

Overkill for a single-user local model, where a simpler runtime takes far less effort to set up and run.