Evals training for your team

Teach your team to evaluate agents properly.

A guided curriculum we run with platform teams shipping LLM agents. From scoring fundamentals to scenario design, run as a workshop or self-paced cohort. Built by the team that ships LangWatch.


curriculum · six modules · v2026.q2

M01 · Why evals, and when they break
M02 · Scoring fundamentals
M03 · Designing scenarios that survive
M04 · LLM-as-judge, done well
M05 · Online vs offline, and CI
M06 · From evaluation to optimization

delivered remote or on-site · 1.5 days

What we cover

From first eval to a production quality habit.

The training distills two years of running quality programs with teams from healthcare, fintech, customer support, and voice platforms. Practical and opinionated, by design.

Stage 01 · Foundations

Why classic test pyramids do not survive contact with a non-deterministic agent. The vocabulary teams need to talk about quality.

Stage 02 · Scoring that scales

Built-in evaluators, LLM-as-judge, bring-your-own metrics. When to use which, and how to stop chasing scores.

Stage 03 · Scenario design

Crafting scenarios that uncover real failure modes. Voice, multi-tool, multi-turn. How few scenarios you actually need.

Stage 04 · Shipping it

CI gates that actually fail builds. Production monitors. Connecting business metrics to agent quality.

Outcomes

Teams leave with a quality habit, not a slide deck.

90% of teams ship a working CI eval gate by the end of day two.

increase in confident releases reported in follow-up surveys

No platform setup required to attend; LangWatch is optional.

Bring evals training to your team.

Open the curriculum or talk to us about running the workshop for your team, remote or on-site.
