



The next generation of Simulations: testing that keeps pace with your agents
June 30, 2026
Today, we're introducing the next generation of Simulations, Decagon's testing suite for AI agents.
Most AI platforms include a testing solution, but testing powered by AI still comes with challenges. Results can feel like a black box where tests pass or fail without clear explanations, and when something goes wrong, it's not always obvious whether the issue lies in the agent or in the setup of that test. Without that clarity, even a passing test suite is hard to fully trust.
As teams have scaled their agents across more workflows and channels, three things have become clear: building and maintaining a comprehensive test suite needs to be far less manual, results need to be granular and interpretable enough to act on with confidence, and testing needs to live inside the deployment process rather than run alongside it. The latest Simulations are built to deliver all three.
A self-maintaining golden test set
A golden test set is a comprehensive, always-relevant suite of tests that covers every workflow and edge case. It continuously validates that your agents are performing at their best, and the stronger it is, the fewer regressions you'll experience over time.
Building one has historically been out of reach for most teams. Writing tests that cover diverse paths is time-consuming work, and as agents evolve, those tests go stale and need to be refreshed. Maintaining a test suite at scale has required a level of ongoing effort that compounds quickly.
With the next generation of Simulations, that changes. Using Duet, teams can auto-generate tests directly from Agent Operating Procedures (AOPs) and production conversations, removing the manual work of building test cases from scratch. And when changes to an agent make existing tests outdated, the platform detects this automatically and updates or removes them. The result is a test suite that stays current without requiring teams to maintain it manually.
Control and transparency on results
Rigid, checklist-based testing has a well-known limitation: it can satisfy every criterion and still miss a fundamentally poor interaction. But testing that swings too far in the other direction, using AI broadly without clear structure, creates a different problem: results that are directionally useful but hard to interpret or act on.
Our latest Simulations are built for the space in between.
Checkpoints give each simulation a structured conversational arc, stage by stage, so teams can see exactly where in a conversation something went right or wrong, including how severe each issue is. Assertions guide that evaluation without reducing it to a checklist, with flexibility built in for the dynamic nature of agent behavior. Simulations evaluate agent behavior across multiple dimensions, from whether the test was set up correctly to how the agent actually performs in the conversation and whether the interaction feels natural across every channel.
Together, these capabilities give teams results they can trust and act on with confidence.
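To make the idea of checkpoints with severity concrete, here is a minimal sketch in Python. It is not Decagon's data model or API: the names Severity, Checkpoint, Finding, and first_failure are hypothetical, chosen only to illustrate how stage-by-stage results might let a team locate the earliest point where a conversation went wrong.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical severity scale; the real product may use different levels.
    INFO = 0
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


@dataclass
class Checkpoint:
    # One stage in the conversational arc, with free-form assertions
    # that guide evaluation rather than act as a strict checklist.
    name: str
    assertions: list[str] = field(default_factory=list)


@dataclass
class Finding:
    # An issue observed at a given checkpoint during a simulation run.
    checkpoint: str
    severity: Severity
    note: str


def first_failure(checkpoints, findings, threshold=Severity.MAJOR):
    """Return the name of the earliest checkpoint (in arc order) that has
    a finding at or above the threshold, or None if there is none."""
    failing = {f.checkpoint for f in findings if f.severity >= threshold}
    for cp in checkpoints:
        if cp.name in failing:
            return cp.name
    return None


# Example arc: greet the customer, verify identity, then resolve the issue.
arc = [
    Checkpoint("greet", ["Agent opens politely"]),
    Checkpoint("verify_identity", ["Agent confirms the account holder"]),
    Checkpoint("resolve", ["Agent issues the refund or explains why not"]),
]
observed = [
    Finding("resolve", Severity.CRITICAL, "Refund promised without eligibility check"),
    Finding("verify_identity", Severity.MAJOR, "Skipped verification step"),
]
earliest = first_failure(arc, observed)
```

In this sketch, the earliest stage with a major or worse issue is reported first, which mirrors the goal of seeing where in a conversation things went wrong rather than only whether the test passed.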
Production-grade testing
Testing has largely been a manual pre-deployment step, separate from the deployment process itself. That gap between catching issues and acting on them slows down the teams building agents and creates unnecessary risk as deployments scale.
Simulations now integrate natively with CI/CD pipelines. Every agent version is tested before it merges, from branch to staging to production, and the simulation suite manages its own test lifecycle automatically alongside agent versioning. It behaves like a proper software test suite, not a separate QA layer that has to be maintained in parallel.
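As a rough illustration of what gating a merge on simulation results can look like, here is a small Python sketch. Everything in it is an assumption made for the example: SimulationResult, gate, and the numeric severity scale are hypothetical and do not describe Decagon's actual CI/CD integration.

```python
from dataclasses import dataclass


@dataclass
class SimulationResult:
    # Hypothetical summary of one simulation run for an agent version.
    name: str
    passed: bool
    worst_severity: int  # assumed scale: 0 = info ... 3 = critical


def gate(results, max_allowed_severity=1):
    """Decide whether a pipeline stage should block.

    Returns (exit_code, blocking_results). A result blocks only if it
    failed and its worst issue exceeds the allowed severity, so minor
    failures can surface as warnings without stopping the merge.
    """
    blocking = [
        r for r in results
        if not r.passed and r.worst_severity > max_allowed_severity
    ]
    return (1 if blocking else 0), blocking


# Example: one passing run, one minor failure, one critical failure.
runs = [
    SimulationResult("refund_happy_path", True, 0),
    SimulationResult("address_change", False, 1),
    SimulationResult("refund_without_eligibility", False, 3),
]
exit_code, blockers = gate(runs)
```

A CI job could pass the returned exit code to its runner so that a branch with blocking failures never reaches staging or production. The severity threshold is a design choice: a stricter pipeline might set it to zero for the production stage.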
Testing that evolves with your agents
The goal of Simulations has always been to give teams confidence in their agents. The next generation raises the bar on what that confidence can look like: a golden test set that builds itself, results transparent enough to act on immediately, and testing woven into every stage of the deployment process.
If you’d like a deeper look into Simulations and how they work with the rest of the platform, get a demo now.





