DeepEval
Observability & evals
An open-source "Pytest for LLMs": a unit-testing framework for model outputs.
DeepEval brings the unit-testing model to LLM output. You write test cases, attach metrics (relevance, faithfulness, hallucination, and more), and run them like Pytest tests, so model quality gets a pass/fail gate.
That makes it natural to put model evaluation directly into a CI pipeline.
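To make the pass/fail idea concrete, here is a minimal, dependency-free sketch of the pattern in plain Python. It deliberately does not use DeepEval's own classes or metrics: `OutputCase`, `overlap_score`, and `assert_quality` are hypothetical names invented for illustration, and the word-overlap score is only a toy stand-in for a real faithfulness metric.

```python
from dataclasses import dataclass


@dataclass
class OutputCase:
    """One model output to check (hypothetical stand-in for a test case)."""
    question: str
    answer: str
    context: str


def overlap_score(answer: str, context: str) -> float:
    """Toy faithfulness proxy: fraction of answer words that appear in the context."""
    answer_words = set(answer.lower().split())
    context_words = set(context.lower().split())
    if not answer_words:
        return 0.0
    return len(answer_words & context_words) / len(answer_words)


def assert_quality(case: OutputCase, threshold: float = 0.7) -> None:
    """Fail the test when the score falls below the threshold (the pass/fail gate)."""
    score = overlap_score(case.answer, case.context)
    if score < threshold:
        raise AssertionError(f"score {score:.2f} is below threshold {threshold}")


def test_refund_answer_is_grounded():
    # In a real suite the answer would come from the model under test.
    case = OutputCase(
        question="What is the refund window?",
        answer="refunds are accepted within 30 days",
        context="our policy: refunds are accepted within 30 days of purchase",
    )
    assert_quality(case, threshold=0.7)
```

Saved in a file whose name starts with `test_`, this is collected and run by `pytest` like any other test, and a failing score fails the build. A framework such as DeepEval replaces the toy score with real evaluation metrics while keeping the same test-shaped workflow.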
Where it's ideally used
A fit when you want LLM quality checks written as tests and run automatically in CI.
Where it doesn't fit
A testing framework, not a live observability platform — it does not monitor production traffic.