Model layer

Model serving

Runtimes that turn model weights into a fast, callable endpoint.

13 tools 13 with full write-ups Open this layer in the explorer

Ollama

★

Model serving

The simplest way to download and run open-weight models locally, with one command.

Open sourceSelf-host

Baseten

Model serving

A platform for deploying and serving custom and open models on autoscaling infrastructure.

ProprietaryHosted API

Fireworks AI

Model serving

A hosted inference platform focused on fast, low-cost serving of open models.

ProprietaryHosted API

Groq

Model serving

A hosted inference service running open models on custom hardware for very low latency.

ProprietaryHosted API

llama.cpp

Model serving

An open-source C/C++ runtime for running LLMs efficiently on CPUs and consumer GPUs.

Open sourceSelf-host

LM Studio

Model serving

A desktop app for discovering, downloading, and running local models with a graphical UI.

ProprietarySelf-host

LocalAI

Model serving

An open-source, OpenAI-compatible API you can run locally over many model backends.

Open sourceSelf-host

Modal

Model serving

A serverless platform for running Python — including model inference — on cloud GPUs.

ProprietaryHosted API

Replicate

Model serving

A platform for running and fine-tuning open models behind a simple hosted API.

ProprietaryHosted API

SGLang

Model serving

A fast open-source serving runtime with structured generation and high throughput.

Open sourceSelf-host

Text Generation Inference

Model serving

Hugging Face's production server for high-performance LLM serving.

Open sourceSelf-host

Together AI

Model serving

A cloud for running and fine-tuning open models with fast, hosted inference.

ProprietaryHosted API

vLLM

Model serving

A high-throughput inference engine for serving open models efficiently in production.

Open sourceSelf-host