DeepEval

An open-source "Pytest for LLMs" — a unit-testing framework for model outputs.

DeepEval applies the unit-testing model to LLM output. You write test cases, attach metrics such as relevance, faithfulness, and hallucination, and run them like Pytest tests, so model quality is enforced by a pass/fail gate.

Because the evaluations pass or fail like any other test, they slot directly into a CI pipeline, where a failing check can block a change.
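The pass/fail gate described above can be sketched in plain Python. This is only an illustration of the idea, not DeepEval's API: `ToyTestCase`, `keyword_coverage`, `assert_metric`, and the 0.7 threshold are hypothetical names and values invented for this example, and the keyword-matching metric is a deliberately simple stand-in for a real metric such as relevance or faithfulness.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToyTestCase:
    """Hypothetical container for one model interaction (not a DeepEval class)."""
    prompt: str
    actual_output: str
    expected_keywords: list[str]


def keyword_coverage(case: ToyTestCase) -> float:
    """Toy metric: fraction of expected keywords found in the output (0.0 to 1.0)."""
    if not case.expected_keywords:
        return 1.0
    text = case.actual_output.lower()
    hits = sum(1 for kw in case.expected_keywords if kw.lower() in text)
    return hits / len(case.expected_keywords)


def assert_metric(case: ToyTestCase, threshold: float = 0.7) -> None:
    """Fail like a unit test when the metric score falls below the threshold."""
    score = keyword_coverage(case)
    assert score >= threshold, (
        f"coverage {score:.2f} is below threshold {threshold:.2f}"
    )


def test_refund_answer() -> None:
    # Named test_* so a Pytest-style runner would collect it as a test.
    case = ToyTestCase(
        prompt="What is the refund policy?",
        actual_output="You can request a full refund within 30 days of purchase.",
        expected_keywords=["refund", "30 days"],
    )
    assert_metric(case)
```

In this sketch, a test runner in CI would report `test_refund_answer` as failed whenever the score drops below the threshold, which is the gating behavior the text describes. A real setup would swap the toy metric for the framework's own metrics.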

Where it's ideally used

A good fit when you want LLM quality checks written as tests and run automatically in CI.

Where it doesn't fit

A testing framework, not a live observability platform — it does not monitor production traffic.