The LangWatch Blog
Engineering deep-dives, product updates, and field lessons on evaluating, testing, and observing AI agents in production.
Article
Customer Story: How Roojoom automates AI Agent Quality Control with LangWatch Scenario
Using LangWatch Scenario, the Rojoom product team built a daily automation way to ship new AI features with confidence.
Manouk Draisma · June 25, 2026
Article
Introducing: Testing voice agents like you test your chat agents
Test voice agents the way you test text agents - simulated callers, traces, playback, and judge-based evaluation -…
Manouk · June 2, 2026

Article
What happens when two engineering teams just... talk
Manouk Draisma · May 19, 2026
Developers
Eat Sleep Append Repeat…
At LangWatch, we process a not-insignificant number of LLM traces, agentic simulations, evaluations, and experiment…
Alex Forbes-Reed · April 20, 2026
Developers
Four Refactors and a Funeral: Migrating a Live System to Event Sourcing
LangWatch is open source. Every commit hash in this post is real and clickable. You can see exactly how we got from…
Alex Forbes-Reed · April 20, 2026
Developers
Internal Product vs Internalised Trauma: Supporting Event Sourced Systems
Before we built anything custom we added metric reporting to the group queue, sent to Grafana. Group queue overview,…
Alex Forbes-Reed · April 20, 2026
More from the blog
April 15, 2026Article
Every way your AI agent can be broken (and how attackers actually do it)
April 14, 2026Article
Why AI Red teaming is broken (and how we fixed it)
March 27, 2026Article
How we test Agent Skills with Scenario simulations
March 26, 2026Product Features
Getting to value with LangWatch, faster than ever - how to migrate from Langfuse to LangWatch with Skills.
March 25, 2026Article
A Note on the LiteLLM Vulnerability
March 25, 2026Product Features
Product Managers and leaders are running agent simulations now, and it changing how AI ships
March 24, 2026Article
Making your AI Agent reliable: Adding Evaluations to your multi-modal agent with LangWatch Skills
March 23, 2026Product Features
LangWatch Skills: Your coding agent already knows how to test your agent
March 12, 2026Product Features
Introducing LangWatch MCP: Test and evaluate AI Agents without leaving your workflow
March 6, 2026Developers
The Agent Development Lifecycle: Why shipping is the easy part
February 20, 2026Article
New Pricing: AI growth shouldn’t increase your bill
February 10, 2026Article
What is LLM monitoring? (Quality, cost, latency, and drift in production)
February 10, 2026Article
What is Prompt Management? And how to version, control & deploy prompts in productions
February 3, 2026Developers
How OpenClaw / ClawBot works behind the scenes - and why agent observability matter
February 3, 2026Article
Instrumenting Your OpenClaw Agent with LangWatch via OpenTelemetry
February 3, 2026Article
How to Use Clawdbot + LangWatch to Monitor Your Agents in Production
February 2, 2026Article
LLM Evaluations Explained: Experiments, Online Evaluations, Guardrails, and when to use each in 2026
January 30, 2026Article
4 best tools for monitoring LLM & agent applications in 2026
January 30, 2026Article
Arize AI alternatives: Top 5 Arize competitors compared (2026)
January 30, 2026Article
Top 8 LLM Observability Tools: Complete Guide for 2025
January 30, 2026Article
Top 5 AI evaluation tools for AI agents & products in production (2026)
January 29, 2026Article
How to test AI Agents with LangWatch & Mastra / Google ADK and ship them reliably
December 30, 2025Article
Top Tools for Evaluating Voice Agents in 2025
December 29, 2025Article
What are the AI Agent Events in 2026: The must-attend conferences for Agentic AI Builders
December 24, 2025Product Features
Closing the year Strong: December Product Updates
December 23, 2025Article
How to do Tracing, Evaluation, and Observability for Google ADK
December 23, 2025Article
Top 5 AI Prompt Management Tools of 2025
December 23, 2025Article
Writing Effective AI Evaluations, that hold up in production
December 12, 2025Article
Why Agentic AI needs a new layer of testing
November 26, 2025Product Features
Launch Week Day 5: Better Agents CLI: The reliability layer for the next wave of agent development
November 25, 2025Article
Scenario MCP: Automatic Agent Test Generation inside your editor
November 24, 2025Article
Testing Voice Agents with LangWatch Scenario in Real Time
November 20, 2025Article
A Systematic way of Testing of AI Agents
November 20, 2025Product Features
Introducing: LangWatch newest Prompt Playground
October 27, 2025Article
How LangWatch helps enterprises test, evaluate, and trust their AI before release
October 17, 2025Article
Build vs Buy - Should you build your own LLMOps stack or leverage a purpose-built platform designed for enterprise scale?
October 17, 2025Article
The 4 Best LLM Evaluation Platforms in 2025: Why LangWatch redefines the category with Agent Testing (with Simulations)
October 15, 2025Article
Need-based Context Engineering: Let tests tell you what your AI agent actually needs
October 6, 2025Article
The Ultimate RAG Blueprint: Everything you need to know about RAG in 2025/2026

September 26, 2025Article
From Scenario to Finished: How to Test AI Agents with Domain-Driven TDD
September 25, 2025Article
Building Reliable AI Applications: Why Evals (and Scenarios) Are the backbone of trustworthy AI
September 7, 2025Article
Are evals dead?
September 3, 2025Article
Essential LLM evaluation metrics for AI quality control: From error analysis to binary checks
August 22, 2025Article
Trace IDs in AI: LLM Observability and Distributed Tracing
August 19, 2025Article
The 6 context engineering challenges stopping AI from scaling in production
August 18, 2025Article
LLMOps is the new DevOps, here’s what every developer must know
August 14, 2025Article
LLM observability: What is it and why it matters
August 8, 2025Product Features
GPT-5 Release: From Benchmarks to production reality
August 7, 2025Article
LLM-as-a-Judge: Using the Panel of Judges Approach to Approximate Human Preference
August 1, 2025Product Features
Observability Framework Design for LLM Apps - The Complete LangWatch Guide
July 18, 2025Developers
Top 4 Humanloop Alternatives in 2025
July 7, 2025Article
Why Agent Simulations are the new Unit Tests for AI
June 27, 2025Article
Real-time simulation visualization and debug mode
June 26, 2025Article
Scripted simulations, evaluations, and guardrails
June 25, 2025Article
Test agents on Mastra, Agno, and 10+ other frameworks
June 24, 2025Article
Introducing simulation-based agent testing
June 24, 2025Article
Why LangWatch Scenarios represents the future of AI agent testing
June 21, 2025Article
Best AI Agent Frameworks in 2025: Comparing LangGraph, DSPy, CrewAI, Agno, and More
June 20, 2025Article
Multilingual AI Agent Testing: Using Scenario to Simulate, Break, and Improve LLMs
June 18, 2025Article
LangSmith Alternatives: What to use if you need more security and control
June 13, 2025Article
Intro to Scenario (Testing AI agents)
June 12, 2025Article
Simulations from First Principles (How to test your agents)
June 11, 2025Article
Agent Evaluation: Framework for Testing AI Agents
June 6, 2025Article
Simulation Based Eval Framework
May 30, 2025Article
Introduction: The Real Issue isn’t RL
May 28, 2025Article
Simulations to Test My Agent
May 15, 2025Developers
New Python SDK Brings Native OpenTelemetry to GenAI Observability
May 5, 2025Product Features
April Product Recap: Selene Integration, Eval Wizard Upgrades, Prompt Studio & More
May 5, 2025Product Features
LLM Monitoring & Evaluation for Real-World Production Use
April 24, 2025Article
Systematically Improving RAG Agents
April 22, 2025Product Features
Introducing the Evaluations Wizard: How to evaluate your LLM: Building an LLM evaluation framework that actually works
April 18, 2025Article
Function Calling vs. MCP: Why You Need Both - and How LangWatch Makes It Click
April 18, 2025Article
Why LLM Observability is Now Table Stakes
April 17, 2025Article
LangWatch vs. LangSmith vs. Braintrust vs. Langfuse: Choosing the Best LLM Evaluation & Monitoring Tool in 2025
April 8, 2025Article
Introducing Scenario: Use an Agent to Test Your Agent
April 4, 2025Article
Tackling LLM Hallucinations with LangWatch: Why Monitoring and Evaluation Matter
April 3, 2025Article
LLM evaluations at Swis for Dutch government projects by LangWatch
April 2, 2025Article
Why Your AI Team Needs an AI PM (Quality) Lead
March 27, 2025Article
LangWatch and adesso join forces: Accelerating Secure LLM Adoption for Enterprises
March 25, 2025Article
LLMOps Is Still About People: How to Build AI Teams That Don’t Implode
March 20, 2025Article
Practical LLM Evaluation Framework for AI Development Teams
March 16, 2025Article
What is Model Context Protocol (MCP)? And how's LangWatch involved?
March 14, 2025Article
How PHWL.ai uses LLM Observability and Optimization to Improve AI Coaching with LangWatch
February 25, 2025Article
LangWatch.ai - Announcing - €1M funding round to bring the power of Evaluations and Auto-Optimizations to AI teams.

February 20, 2025Article
OpenAI, Anthropic, Deepseek and other LLM Providers keep dropping prices: Should you host your own model?
January 1, 2025Article
7 Predictions for AI in 2025: A CTO's, Rogerio Chaves Perspective
December 20, 2024Article
Customer Stories: HolidayHero AI start-up <> LangWatch
December 10, 2024Article
LangWatch Optimization Studio - Built for AI Engineers, by AI Engineers
November 10, 2024Article
The power of MIPROv2 (DSPy) in a Low-Code environment with LangWatch’s Optimization Studio
November 7, 2024Article
What is Prompt Optimization? An Introduction to DSPy and Optimization Studio
July 27, 2024Article
Deploying an OpenAI RAG Application to AWS ElasticBeanstalk
July 3, 2024Article
The complete guide for TDD with LLMs
June 27, 2024Article
Data Flywheel: Using your production data to build better LLM products
June 11, 2024Article
How Algomo reduced AI hallucinations with LangWatch
June 10, 2024Article
The AI Team: Integrating User and Domain Expert Feedback to Enhance LLM-Powered Applications
June 10, 2024Article
Unit Testing Your LLM: The Power of Datasets
June 3, 2024Product Features
Introducing DSPy Visualizer
May 20, 2024Article
New Dutch Startup, LangWatch, brings much-needed quality control to GenAI

May 14, 2024Article
How to build a RAG application from scratch with the least possible AI Hallucinations
May 13, 2024Article
LLM Reliability with Retrieval-Augmented Generation
May 13, 2024Article
Safeguarding Your First LLM-Powered Innovation: Essential Practices for Security
May 10, 2024Article
What is User Analytics for LLMs, The Difference With Traditional Analytics, And Why is it Important?
May 8, 2024Article
Unlocking the Potential of Large Language Models: The LLM's Beyond the Hype
May 6, 2024Article
The 8 Types of LLM Hallucinations
May 1, 2024Article
5 Things You Must Consider Before Putting Your Chatbot Live in Production
May 1, 2024Article
Navigating the Complexities of AI-Powered Products
April 29, 2024Article
Understanding Hallucinations: What are they?
April 18, 2024Article
Mastering the GenAI Wave: Strategies for Success in AI Adoption
April 18, 2024Article
Successfully building an AI Startup in the current booming industry
April 17, 2024Article
How Struck.build improved AI Performance with LangWatch
April 8, 2024Article
Journey Through Innovation: The LLM Adventure
Article