Ragas

An open-source framework for evaluating RAG pipelines on faithfulness and relevance.

Ragas measures whether a RAG system is actually working. It scores dimensions such as faithfulness (whether the answer stays grounded in the retrieved context), answer relevance (whether the answer addresses the question), and retrieval quality (whether the retrieved context is relevant to the question), turning "it seems fine" into numbers.
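To make the faithfulness idea concrete, here is a minimal sketch of the underlying ratio: claims in the answer that the context supports, divided by all claims in the answer. This is not Ragas code. Ragas relies on an LLM to extract claims and judge support, whereas the hypothetical faithfulness_score function below assumes the claims are already split out and uses a naive substring match as a stand-in for that judgment.

```python
def faithfulness_score(claims, context):
    """Fraction of claims found in the context.

    Illustration only: a case-insensitive substring match stands in for
    the LLM-based support check a real evaluator would use.
    """
    if not claims:
        return 0.0
    ctx = context.lower()
    supported = sum(1 for claim in claims if claim.lower() in ctx)
    return supported / len(claims)


# Toy example: one claim is grounded in the context, one is not.
context = (
    "Paris is the capital of France. "
    "The Eiffel Tower was completed in 1889."
)
claims = [
    "Paris is the capital of France",
    "The Eiffel Tower was completed in 1925",
]

score = faithfulness_score(claims, context)
print(score)
```

A score of 1.0 would mean every claim is supported by the context; lower values indicate content that the retrieved context does not back up.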

That makes it useful as a regression check: change a chunking strategy or a model, re-run the evaluation on the same test set, and the scores show whether quality moved up or down.
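The regression-check workflow can be sketched as a comparison of per-metric scores from two evaluation runs. The sketch below does not call Ragas; it assumes each run has already produced a dictionary of metric averages. The compare_runs helper, the tolerance value, and all the numbers are hypothetical placeholders chosen for illustration, not real results.

```python
def compare_runs(baseline, candidate, tolerance=0.02):
    """Label each metric as improved, regressed, or unchanged.

    A change smaller than `tolerance` in either direction is treated as
    noise. The 0.02 default is an arbitrary assumption, not a recommendation.
    """
    report = {}
    for metric, base_score in baseline.items():
        delta = candidate[metric] - base_score
        if delta < -tolerance:
            status = "regressed"
        elif delta > tolerance:
            status = "improved"
        else:
            status = "unchanged"
        report[metric] = (round(delta, 4), status)
    return report


# Made-up scores for a pipeline before and after a chunking change.
baseline = {"faithfulness": 0.91, "answer_relevancy": 0.84, "context_precision": 0.78}
candidate = {"faithfulness": 0.86, "answer_relevancy": 0.85, "context_precision": 0.83}

report = compare_runs(baseline, candidate)
for metric, (delta, status) in report.items():
    print(f"{metric}: {delta:+.2f} ({status})")
```

Keeping the test set fixed between runs matters here: if the questions change along with the pipeline, a score difference no longer isolates the effect of the pipeline change.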

Where it's ideally used

Essential when retrieval quality matters and you need to measure changes to a RAG pipeline rather than judge them by eye.

Where it doesn't fit

Built specifically for RAG. For evaluating general agent behaviour or non-retrieval tasks, a broader eval tool is a better fit.