Text Generation Inference

Hugging Face's production server for high-performance LLM serving.

Text Generation Inference (TGI) is Hugging Face's production-grade serving engine for large language models. It offers continuous batching, optimized kernels, quantization support, and tight integration with the Hugging Face Hub.

It covers much the same ground as other modern inference engines, and is a natural pick for teams already standardized on the Hugging Face ecosystem.

Where it's ideally used

TGI is a good fit for production self-hosted serving, especially for teams already deep in the Hugging Face ecosystem.
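To make the self-hosted case concrete, here is a minimal sketch of a client for a running TGI server, using only the Python standard library. It targets the server's `/generate` route, which accepts a JSON body with `inputs` and `parameters` and returns a `generated_text` field. The address `http://localhost:8080` and the helper names (`build_generate_request`, `parse_generate_response`, `generate`) are assumptions made for illustration; adjust them to your deployment.

```python
import json
import urllib.request

# Assumption: a TGI server is already running and reachable at this address.
TGI_URL = "http://localhost:8080"


def build_generate_request(prompt: str, max_new_tokens: int = 64,
                           base_url: str = TGI_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /generate route."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_generate_response(body: bytes) -> str:
    """Extract the completion text from a /generate JSON response body."""
    return json.loads(body)["generated_text"]


def generate(prompt: str, max_new_tokens: int = 64,
             base_url: str = TGI_URL) -> str:
    """Send the request to a live server and return the generated text."""
    request = build_generate_request(prompt, max_new_tokens, base_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_generate_response(response.read())
```

With a server running, `generate("What is continuous batching?", max_new_tokens=32)` would return the completion as a string. Building the request separately from sending it keeps the payload easy to inspect and test without a live server.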

Where it doesn't fit

For a single user running a model locally, TGI is heavier than a simple local runtime. Among production serving engines, vLLM has broader community momentum.