SGLang
Model servingA fast open-source serving runtime with structured generation and high throughput.
SGLang is a serving engine that pairs high throughput with a structured-generation front-end — efficient handling of constrained output, complex prompting patterns, and shared prefixes through prefix caching.
It is a strong choice for serving the latest large open models, and is often quick to support new architectures.
Where it's ideally used
A fit for high-throughput self-hosted serving, especially with structured output or heavy shared-prefix workloads.
Where it doesn't fit
Overkill for a single-user local model, where a simple runtime is far less effort.