Infrastructure layer
Observability & evals
Tracing, evaluation, and monitoring — knowing whether the AI is actually working.
Langfuse
★ Observability & evals
An open-source platform for tracing, evaluating, and monitoring LLM and agent applications.
Braintrust
Observability & evals
A commercial platform for evaluating, logging, and iterating on AI products.
DeepEval
Observability & evals
An open-source "Pytest for LLMs" — a unit-testing framework for model outputs.
Helicone
Observability & evals
An open-source observability platform that logs and analyzes LLM calls through a proxy.
LangSmith
Observability & evals
LangChain's platform for tracing, testing, and evaluating LLM and agent applications.
Lunary
Observability & evals
An open-source observability and prompt-management platform for LLM applications.
OpenLLMetry
Observability & evals
An open-source set of OpenTelemetry extensions for standardized LLM observability.
Opik
Observability & evals
Comet's open-source platform for tracing, evaluating, and monitoring LLM applications.
Phoenix
Observability & evals
Arize's open-source tool for tracing, evaluating, and debugging LLM and agent apps.
promptfoo
Observability & evals
An open-source tool for testing, evaluating, and red-teaming prompts and models.
Ragas
Observability & evals
An open-source framework for evaluating RAG pipelines on faithfulness and relevance.
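To make the two recurring ideas in this category concrete, here is a minimal sketch of tracing (recording each model call) and evaluation (scoring an output against its context). It uses only the Python standard library and does not reproduce the API of any tool listed above. `Span`, `Tracer`, `fake_model`, and `token_overlap` are hypothetical names introduced for illustration, and the overlap score is a crude stand-in for a faithfulness metric, not the method any of these tools uses.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Span:
    """One recorded step of an LLM call (hypothetical structure)."""
    name: str
    inputs: dict
    output: str = ""
    duration_s: float = 0.0


@dataclass
class Tracer:
    """Collects spans in memory; a real platform would ship them to a backend."""
    spans: list = field(default_factory=list)

    def record(self, name, fn, **inputs):
        # Time the call, store its inputs and output, and pass the output through.
        start = time.perf_counter()
        output = fn(**inputs)
        self.spans.append(Span(name, inputs, output, time.perf_counter() - start))
        return output


def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call so the sketch runs offline.
    return "Paris is the capital of France."


def token_overlap(answer: str, context: str) -> float:
    """Fraction of answer tokens that also appear in the context.

    A crude proxy for faithfulness, assumed here for illustration only.
    """
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)


tracer = Tracer()
context = "France is a country in Europe. Paris is the capital of France."
answer = tracer.record("generate", fake_model, prompt="What is the capital of France?")
score = token_overlap(answer, context)

# A unit-test style check on the model output, in the spirit of "Pytest for LLMs".
assert score >= 0.8, f"answer not grounded in context (score={score:.2f})"
```

In practice the recorded spans would be exported to one of the platforms above and the scoring function replaced by that tool's own metrics; the sketch only shows how the trace and the eval fit together.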