The next generation of Simulations: testing that keeps pace with your agents

June 30, 2026

Today, we're introducing the next generation of Simulations, Decagon's testing suite for AI agents.

Most AI platforms include a testing solution, but testing powered by AI still comes with challenges. Results can feel like a black box where tests pass or fail without clear explanations, and when something goes wrong, it's not always obvious whether the issue lies in the agent or in the setup of that test. Without that clarity, even a passing test suite is hard to fully trust.

As teams have scaled their agents across more workflows and channels, three things have become clear: building and maintaining a comprehensive test suite needed to be far less manual, results needed to be granular and interpretable enough to act on with confidence, and testing needed to live inside the deployment process rather than run alongside it. The latest Simulations are built to deliver all three.

A self-maintaining golden test set

A golden test set is a comprehensive, always-relevant suite of tests that covers every workflow and edge case. It continuously validates that your agents are performing at their best, and the stronger it is, the fewer regressions you'll experience over time.

Building one has historically been out of reach for most teams. Writing tests that cover diverse paths is time-consuming work, and as agents evolve, those tests go stale and need to be refreshed. Maintaining a test suite at scale has required a level of ongoing effort that compounds quickly.

With the next generation of Simulations, that changes. Using Duet, teams can auto-generate tests directly from AOPs and production conversations, removing the manual work of building test cases from scratch. And when agent changes make existing tests outdated, the platform detects it automatically and updates or removes them. The result is a test suite that stays current without requiring teams to maintain it manually.

“We built 50+ Simulations with Duet in minutes to QA a brand new AOP — happy paths, escalation paths, edge cases, all of it. It flagged exactly what was broken and why, and we went from testing to production in hours instead of days. I've never felt that confident in an AOP at first launch. It's a game-changer for how fast we can build and ship.”
- Julia Alexander, SVP of Customer Experience, Spot & Tango

Control and transparency on results

Rigid, checklist-based testing has a well-known limitation: it can satisfy every criterion and still miss a fundamentally poor interaction. But testing that swings too far in the other direction, using AI broadly without clear structure, creates a different problem: results that are directionally useful but hard to interpret or act on.

Our latest Simulations are built for the space in between.

Checkpoints give each simulation a structured conversational arc, stage by stage, so teams can see exactly where in a conversation something went right or wrong, including how severe each issue is. Assertions guide that evaluation without reducing it to a checklist, with flexibility built in for the dynamic nature of agent behavior. Simulations evaluate agent behavior across multiple dimensions: whether the test was set up correctly, how the agent actually performs in the conversation, and whether the interaction feels natural across every channel.
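To make the checkpoint idea concrete, here is a minimal Python sketch of how a staged result could be modeled and read. Every name in it (Checkpoint, Severity, first_failure, worst_severity) and the sample data are hypothetical illustrations, not Decagon's actual API or result schema.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    # Hypothetical severity scale; the real product may use different levels.
    NONE = 0
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


@dataclass
class Checkpoint:
    # One stage in a simulation's conversational arc (illustrative only).
    stage: str
    passed: bool
    severity: Severity = Severity.NONE
    note: str = ""


def first_failure(checkpoints: list[Checkpoint]) -> Optional[Checkpoint]:
    """Return the earliest checkpoint that failed, or None if all passed."""
    for cp in checkpoints:
        if not cp.passed:
            return cp
    return None


def worst_severity(checkpoints: list[Checkpoint]) -> Severity:
    """Return the highest severity among failed checkpoints."""
    failed = (cp.severity for cp in checkpoints if not cp.passed)
    return max(failed, default=Severity.NONE)


# Made-up example result for a three-stage conversation.
result = [
    Checkpoint("greeting", passed=True),
    Checkpoint("identity verification", passed=False,
               severity=Severity.MAJOR, note="skipped order lookup"),
    Checkpoint("resolution", passed=False,
               severity=Severity.MINOR, note="tone felt abrupt"),
]
```

Modeling results per stage rather than as a single pass/fail flag is what lets a reader answer both "where did it go wrong?" (first_failure) and "how bad was it?" (worst_severity).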

Together, these capabilities give teams results they can trust and act on with confidence.

Production-grade testing

Testing has largely been a manual pre-deployment step, separate from the deployment process itself. That gap between catching issues and acting on them slows down the teams building agents and creates unnecessary risk as deployments scale.

Simulations now integrate natively with CI/CD pipelines. Every agent version is tested before it merges, from branch to staging to production, and the simulation suite manages its own test lifecycle automatically alongside agent versioning. It behaves like a proper software test suite, not a separate QA layer that has to be maintained in parallel.
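As a rough illustration of what gating a merge on simulation results can look like, the sketch below computes a process exit code from a list of results. The record shape, the severity names, and the "major" threshold are all assumptions made for this example; they do not describe Decagon's CI/CD integration.

```python
# Hypothetical severity ranking used only for this sketch.
SEVERITY_ORDER = {"none": 0, "minor": 1, "major": 2, "critical": 3}


def should_block_merge(results, threshold="major"):
    """Return True if any failed result is at or above the threshold severity."""
    limit = SEVERITY_ORDER[threshold]
    return any(
        not r["passed"] and SEVERITY_ORDER[r["severity"]] >= limit
        for r in results
    )


def exit_code(results, threshold="major"):
    """Map results to a conventional CI exit code: 0 allows the merge, 1 blocks it."""
    return 1 if should_block_merge(results, threshold) else 0


# Made-up results; a real pipeline step would load these from the test run.
sample_results = [
    {"name": "refund happy path", "passed": True, "severity": "none"},
    {"name": "escalation path", "passed": False, "severity": "minor"},
]
```

A pipeline step would typically pass the value of exit_code(...) to sys.exit so that a non-zero status stops the merge. With the sample above, the minor failure is reported but does not block under the assumed "major" threshold.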

Testing that evolves with your agents

The goal of Simulations has always been to give teams confidence in their agents. The next generation raises the bar on what that confidence can look like: a golden test suite that builds itself, results transparent enough to act on immediately, and testing woven into every stage of the deployment process.

If you’d like a deeper look into Simulations and how they work with the rest of the platform, get a demo now.
