Evals training for your team

Teach your team to evaluate agents properly.

A guided curriculum we run with platform teams shipping LLM agents. It spans scoring fundamentals through scenario design, delivered as a workshop or a self-paced cohort. Built by the team that ships LangWatch.

curriculum · six modules
v2026.q2
  M01 · Why evals, and when they break
  M02 · Scoring fundamentals
  M03 · Designing scenarios that survive
  M04 · LLM-as-judge, done well
  M05 · Online vs offline, and CI
  M06 · From evaluation to optimization
delivered remote or on-site · 1.5 days

What we cover

From first eval to a production quality habit.

The training distills two years of running quality programs with teams from healthcare, fintech, customer support, and voice platforms. Practical and opinionated, by design.

Stage 01 · Foundations

Why classic test pyramids do not survive contact with a non-deterministic agent. The vocabulary teams need to talk about quality.

Stage 02 · Scoring that scales

Built-in evaluators, LLM-as-judge, bring-your-own metrics. When to use which, and how to stop chasing scores.

Stage 03 · Scenario design

Crafting scenarios that uncover real failure modes. Voice, multi-tool, multi-turn. How few scenarios you actually need.

Stage 04 · Shipping it

CI gates that actually fail builds. Production monitors. Connecting business metrics to agent quality.

Outcomes

Teams leave with a quality habit, not a slide deck.

90% of teams ship a working CI eval gate by end of day two
5x increase in confident releases, as reported in follow-up surveys
0 platform setup required to attend; LangWatch is optional

Bring evals training to your team.

Open the curriculum or talk to us about running the workshop for your team, remote or on-site.