Infrastructure layer

Observability & evals

Tracing, evaluation, and monitoring — knowing whether the AI is actually working.

11 tools, 11 with full write-ups

An open-source platform for tracing, evaluating, and monitoring LLM and agent applications.

A commercial platform for evaluating, logging, and iterating on AI products.

An open-source "Pytest for LLMs" — a unit-testing framework for model outputs.

An open-source observability platform that logs and analyzes LLM calls through a proxy.

LangChain's platform for tracing, testing, and evaluating LLM and agent applications.

An open-source observability and prompt-management platform for LLM applications.

An open-source set of OpenTelemetry extensions for standardized LLM observability.

Comet's open-source platform for tracing, evaluating, and monitoring LLM applications.

Arize's open-source tool for tracing, evaluating, and debugging LLM and agent apps.

An open-source tool for testing, evaluating, and red-teaming prompts and models.

An open-source framework for evaluating RAG pipelines on faithfulness and relevance.