Modal
Model serving
A serverless platform for running Python, including model inference, on cloud GPUs.
Modal lets you run Python functions on cloud infrastructure, including GPUs, by adding decorators to them. You do not provision or manage containers or clusters yourself. It scales down to zero when idle and scales up under load.
For AI work it is a common way to serve custom models or run GPU jobs without operating GPU infrastructure directly.
Where it's ideally used
A fit when you want to serve custom models or run GPU workloads on demand without managing infrastructure.
Where it doesn't fit
A hosted cloud platform, so not the choice when inference must run inside your own data centre.