promptfoo
Observability & evals
An open-source tool for testing, evaluating, and red-teaming prompts and models.
promptfoo evaluates prompts and models from a simple config: define test cases and assertions, run them across prompts and providers, and compare results side by side. It also supports security red-teaming, which probes models for unsafe behaviour.
It runs from the command line and slots into CI, making prompt changes something you measure rather than guess at.
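The evaluation loop described above can be sketched in a few lines of Python. This is a conceptual illustration only, not promptfoo's API or config format: the provider functions, `run_eval`, `summarize`, and `exit_code` are all hypothetical names, and the "providers" are local stubs standing in for real model calls.

```python
from itertools import product

# Hypothetical stand-ins for model providers: each maps a rendered prompt to output text.
def echo_provider(prompt: str) -> str:
    return prompt

def upper_provider(prompt: str) -> str:
    return prompt.upper()

PROVIDERS = {"echo": echo_provider, "upper": upper_provider}

# Prompt templates with a variable slot, filled in per test case.
PROMPTS = ["Greet {name} politely.", "Say hello to {name}."]

# Each test case supplies variables plus assertions (callables that check the output).
TESTS = [
    {"vars": {"name": "Ada"}, "assert": [lambda out: "ada" in out.lower()]},
    {"vars": {"name": "Linus"}, "assert": [lambda out: "Linus" in out]},
]

def run_eval(prompts, providers, tests):
    """Run every test case against every prompt/provider pair."""
    rows = []
    for prompt, (provider_name, provider), test in product(
        prompts, providers.items(), tests
    ):
        output = provider(prompt.format(**test["vars"]))
        passed = all(check(output) for check in test["assert"])
        rows.append(
            {
                "prompt": prompt,
                "provider": provider_name,
                "vars": test["vars"],
                "output": output,
                "passed": passed,
            }
        )
    return rows

def summarize(rows):
    """Count (passed, total) per prompt/provider pair for side-by-side comparison."""
    summary = {}
    for row in rows:
        key = (row["prompt"], row["provider"])
        passed, total = summary.get(key, (0, 0))
        summary[key] = (passed + int(row["passed"]), total + 1)
    return summary

def exit_code(rows):
    """Non-zero when any assertion failed, so a CI job can gate on the result."""
    return 0 if all(row["passed"] for row in rows) else 1

rows = run_eval(PROMPTS, PROVIDERS, TESTS)
summary = summarize(rows)
failures = [row for row in rows if not row["passed"]]
```

Here the stub `upper` provider fails the case-sensitive assertion for both prompts, which is the kind of difference a side-by-side comparison surfaces. Passing `exit_code(rows)` to `sys.exit` is one way a CI step could fail the build on a regression.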
Where it's ideally used
A fit when you want config-driven evaluation across prompts and models, plus security red-teaming, in one CLI tool.
Where it doesn't fit
A testing and red-teaming tool, not a production tracing platform; pair it with a tracing platform rather than using it as a substitute for one.