Modal

A serverless platform for running Python — including model inference — on cloud GPUs.

Modal lets you run Python functions on cloud infrastructure, including GPUs, by decorating them — no containers or cluster to manage. It scales to zero when idle and up under load.

For AI work it is a common way to serve custom models or run GPU jobs without operating GPU infrastructure directly.

Where it's ideally used

A fit when you want to serve custom models or run GPU workloads on demand without managing infrastructure.

Where it doesn't fit

A hosted cloud platform — not the choice when inference must run inside your own data centre.