promptfoo

An open-source tool for testing, evaluating, and red-teaming prompts and models.

promptfoo evaluates prompts and models from a simple config file: you define test cases and assertions, run them across prompts and providers, and compare the results side by side. It also performs security red-teaming, probing models for unsafe behaviour.
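
The evaluation pattern described above can be sketched in a few lines of Python. This is an illustration of the idea only, not promptfoo's API or config format: `EvalCase`, `Result`, `contains`, `evaluate`, and the two stub providers are all hypothetical names, and the "providers" are plain functions standing in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-ins: a provider maps prompt text to output text,
# and an assertion maps output text to pass/fail.
Provider = Callable[[str], str]
Assertion = Callable[[str], bool]


@dataclass
class EvalCase:
    vars: Dict[str, str]          # values substituted into each prompt template
    assertions: List[Assertion]   # every assertion must hold for the case to pass


@dataclass
class Result:
    prompt: str
    provider: str
    output: str
    passed: bool


def contains(needle: str) -> Assertion:
    """Case-sensitive substring check on the model output."""
    return lambda output: needle in output


def evaluate(prompts: List[str],
             providers: Dict[str, Provider],
             cases: List[EvalCase]) -> List[Result]:
    """Run every case against every prompt/provider pair."""
    results = []
    for template in prompts:
        for name, call in providers.items():
            for case in cases:
                output = call(template.format(**case.vars))
                passed = all(check(output) for check in case.assertions)
                results.append(Result(template, name, output, passed))
    return results


# Stub providers so the sketch runs offline (assumptions, not real models).
def echo_provider(prompt: str) -> str:
    return prompt


def shout_provider(prompt: str) -> str:
    return prompt.upper()


prompts = ["Translate to French: {text}", "French version of: {text}"]
providers = {"echo": echo_provider, "shout": shout_provider}
cases = [EvalCase(vars={"text": "hello"}, assertions=[contains("hello")])]

results = evaluate(prompts, providers, cases)

# Side-by-side view: one row per prompt/provider pair.
for r in results:
    status = "PASS" if r.passed else "FAIL"
    print(f"{status}  {r.provider:<6} {r.prompt}")
```

Each prompt template is run against each provider for each test case, so two prompts, two providers, and one case yield four results. In this sketch the `shout` stub fails the case-sensitive `contains("hello")` check while `echo` passes, which is the kind of difference a side-by-side comparison is meant to surface.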

It runs from the command line and slots into CI, making prompt changes something you measure rather than guess at.
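
As a rough illustration of what "measuring" a prompt change in CI can mean, the sketch below turns a list of pass/fail outcomes into a process exit code. It is a generic pattern written under assumptions, not promptfoo's own behaviour: the function name `exit_code` and the `min_pass_rate` threshold are hypothetical.

```python
from typing import Iterable


def exit_code(passed_flags: Iterable[bool], min_pass_rate: float = 1.0) -> int:
    """Return 0 if the pass rate meets the threshold, otherwise 1.

    An empty run is treated as a failure so that a misconfigured
    job cannot pass silently.
    """
    flags = list(passed_flags)
    if not flags:
        return 1
    pass_rate = sum(flags) / len(flags)
    return 0 if pass_rate >= min_pass_rate else 1


# Example outcomes from a hypothetical evaluation run.
outcomes = [True, True, False, True]
print(exit_code(outcomes))                      # strict: any failure fails the build
print(exit_code(outcomes, min_pass_rate=0.7))   # tolerant: 75% is enough
```

A CI job would pass the returned value to the shell as its exit status, so a prompt change that lowers the pass rate below the threshold blocks the merge.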

Where it's ideally used

A good fit when you want config-driven evaluation across prompts and models, plus security red-teaming, in a single CLI tool.

Where it doesn't fit

It is a testing and red-teaming tool, not a production tracing platform: use it alongside a tracing platform rather than as a substitute for one.