The next generation of Simulations: testing that keeps pace with your agents

Posted on June 30, 2026

Eric Hermann

Product Manager

Today, we're introducing the next generation of Simulations, Decagon's testing suite for AI agents.

Most AI platforms include a testing solution, but testing powered by AI still comes with challenges. Results can feel like a black box where tests pass or fail without clear explanations, and when something goes wrong, it's not always obvious whether the issue lies in the agent or in the setup of that test. Without that clarity, even a passing test suite is hard to fully trust.

As teams have scaled their agents across more workflows and channels, three things have become clear: building and maintaining a comprehensive test suite needed to be far less manual, results needed to be granular and interpretable enough to act on with confidence, and testing needed to live inside the deployment process rather than run alongside it. The latest Simulations are built to deliver all three.

A self-maintaining golden test set

A golden test set is a comprehensive, always-relevant suite of tests that covers every workflow and edge case. It continuously validates that your agents are performing at their best, and the stronger it is, the fewer regressions you'll experience over time.

Building one has historically been out of reach for most teams. Writing tests that cover diverse paths is time-consuming work, and as agents evolve, those tests go stale and need to be refreshed. Maintaining a test suite at scale has required a level of ongoing effort that compounds quickly.

With the next generation of Simulations, that changes. Using Duet, teams can auto-generate tests directly from AOPs and production conversations, removing the manual work of building test cases from scratch. When agent changes make existing tests outdated, the platform detects this automatically and updates or removes the affected tests. The result is a test suite that stays current without requiring teams to maintain it by hand.

“We built 50+ Simulations with Duet in minutes to QA a brand new AOP — happy paths, escalation paths, edge cases, all of it. It flagged exactly what was broken and why, and we went from testing to production in hours instead of days. I've never felt that confident in an AOP at first launch. It's a game-changer for how fast we can build and ship.”

─ Julia Alexander, SVP of Customer Experience, Spot & Tango

Control and transparency on results

Rigid, checklist-based testing has a well-known limitation: a conversation can satisfy every criterion and still be a fundamentally poor interaction. But testing that swings too far in the other direction, using AI broadly without clear structure, creates a different problem: results that are directionally useful but hard to interpret or act on.

Our latest Simulations are built for the space in between.

Checkpoints give each simulation a structured conversational arc, stage by stage, so teams can see exactly where in a conversation something went right or wrong, along with how severe each issue is. Assertions guide that evaluation without reducing it to a checklist, leaving room for the dynamic nature of agent behavior. Simulations evaluate agent behavior across multiple dimensions, from whether the test itself was set up correctly to how the agent actually performs in the conversation and whether the interaction feels natural on every channel.

Together, these capabilities give teams results they can trust and act on with confidence.

Production-grade testing

Testing has largely been a manual pre-deployment step, separate from the deployment process itself. That gap between catching issues and acting on them slows down the teams building agents and creates unnecessary risk as deployments scale.

Simulations now integrate natively with CI/CD pipelines. Every agent version is tested before it merges, from branch to staging to production, and the simulation suite manages its own test lifecycle automatically alongside agent versioning. It behaves like a proper software test suite, not a separate QA layer that has to be maintained in parallel.

Testing that evolves with your agents

The goal of Simulations has always been to give teams confidence in their agents. The next generation raises the bar on what that confidence can look like: a golden test suite that builds itself, results transparent enough to act on immediately, and testing woven into every stage of the deployment process.

If you’d like a deeper look into Simulations and how they work with the rest of the platform, get a demo now.