
Always-on agents for the holiday season and beyond

January 10, 2026

Written by Hao Liu

Every year, the holiday season pushes customer support systems to their limits. Traffic surges unpredictably, customers expect instant responses, and downtime becomes especially costly. At Decagon, we treat this period as the ultimate stress test for our AI agents.

This post walks through how our platform stayed fast, reliable, and resilient from Black Friday through the entire holiday season by focusing on three pillars:

  1. A scalable, multi-tenant architecture built for consistent performance
  2. Resilient disaster recovery systems with multiple layers of redundancy
  3. Operational best practices that eliminate risk during critical periods

Architecture built for predictable, consistent performance

Our engineering team is relentlessly focused on ensuring consistent agent performance, even when traffic surges 10x from one minute to the next. We follow several key principles when designing our architecture.

Elastic capacity that scales within seconds

Our infrastructure footprint spans multiple cloud regions to minimize latency and strengthen redundancy. The system continuously monitors CPU, memory, and concurrent requests, using both real-time signals and historical seasonality to proactively scale capacity. Autoscaling completes within seconds, keeping tail latency stable even during sharp spikes in demand.
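
To make that concrete, here is a minimal sketch of how a scaler might combine a reactive utilization signal with a seasonal forecast when choosing a replica count. The capacity and utilization numbers are illustrative assumptions, not Decagon's production values.

```python
# A minimal sketch (not Decagon's actual autoscaler) of combining real-time
# utilization with a seasonal forecast to pick a replica count.
import math
from dataclasses import dataclass

@dataclass
class Signals:
    cpu_utilization: float      # 0.0-1.0, cluster-wide average
    concurrent_requests: int    # in-flight requests right now

def desired_replicas(signals: Signals,
                     current_replicas: int,
                     seasonal_forecast_rps: float,
                     capacity_per_replica_rps: float = 50.0,   # assumed capacity
                     target_utilization: float = 0.6) -> int:
    """Return a replica count that satisfies both live load and the forecast."""
    # Reactive term: scale so average CPU returns to the target utilization.
    reactive = current_replicas * (signals.cpu_utilization / target_utilization)
    # Proactive term: enough replicas to absorb the forecast with headroom.
    proactive = seasonal_forecast_rps / (capacity_per_replica_rps * target_utilization)
    # Honor whichever signal demands more; round up to whole replicas.
    return max(1, math.ceil(max(reactive, proactive)))

# Example: 40 replicas at 80% CPU, with a forecast of 3,000 requests/second.
print(desired_replicas(Signals(0.8, 1800), 40, 3000.0))  # -> 100
```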

Isolation of real-time, customer-facing tasks from everything else

Not all workloads require immediate processing. Our analytics suite (Insights, Watchtower, Ask AI, and more) runs asynchronously via an elastic queue system backed by independent job clusters. This separation ensures that heavy offline workloads never compete with real-time traffic for CPU, memory, or I/O. Even during peak hours, the agent stays fast and responsive when interfacing with customers.
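
The pattern looks roughly like the sketch below: the customer-facing path only enqueues analytics work and returns, while a separate worker pool (standing in here for independent job clusters) drains the queue. Function names like handle_customer_message are illustrative, not Decagon APIs.

```python
# A minimal sketch of the isolation pattern: the real-time path only enqueues
# analytics work; a separate worker drains it on its own capacity.
import queue
import threading
import time

analytics_queue: "queue.Queue[dict]" = queue.Queue(maxsize=10_000)

def handle_customer_message(message: str) -> str:
    """Real-time path: respond immediately, defer analytics."""
    reply = f"agent reply to: {message}"          # latency-critical work
    try:
        analytics_queue.put_nowait({"msg": message, "ts": time.time()})
    except queue.Full:
        pass  # shed analytics rather than slow the customer-facing reply
    return reply

def analytics_worker() -> None:
    """Offline path: runs on separate capacity, never blocks the agent."""
    while True:
        event = analytics_queue.get()
        time.sleep(0.01)  # stand-in for heavy aggregation work
        analytics_queue.task_done()

threading.Thread(target=analytics_worker, daemon=True).start()
print(handle_customer_message("where is my order?"))
```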

Tiered storage for predictable low latency

Agent metadata, configs, and other frequently accessed state sit on low-latency datastores and caches to ensure consistent response times. Colder or read-heavy datasets, including historical conversations and analytics workloads, live on scalable storage services that are optimized for throughput rather than latency. We aggressively tune TTLs, cache keys, compaction, and replication to keep p95 and p99 latencies flat, even when the platform handles multiples of normal traffic.
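
A simplified version of that hot/cold split is a read-through cache with a TTL, sketched below. The load_from_cold_storage function is a hypothetical placeholder for a throughput-optimized store, and the 30-second TTL is an illustrative value.

```python
# A minimal read-through TTL cache sketch illustrating the hot/cold split.
import time

class TTLCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str, loader):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry and now - entry[0] < self.ttl:
            return entry[1]                 # hot path: low-latency cache hit
        value = loader(key)                 # cold path: throughput-optimized store
        self._store[key] = (now, value)
        return value

def load_from_cold_storage(key: str) -> dict:
    # Placeholder for a read from the cold, throughput-optimized tier.
    return {"config": f"value for {key}"}

agent_config_cache = TTLCache(ttl_seconds=30.0)
print(agent_config_cache.get("tenant-42/agent-config", load_from_cold_storage))
```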

Throttling that guarantees fairness and resilience at every layer

Decagon supports both dedicated and shared deployments, but most customers run in a multi-tenant environment where fairness and isolation are critical. To ensure that no customer is impacted by another’s traffic burst, every request flows through per-tenant throttling. Datastores enforce rate limits to prevent resource starvation, and queue workers apply backpressure when downstream systems approach saturation.
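
Per-tenant throttling of this kind is commonly implemented as a token bucket keyed by tenant, sketched minimally below. The rates and burst sizes are illustrative assumptions; in practice this enforcement lives at the gateway and datastore layers.

```python
# A minimal per-tenant token-bucket sketch. One bucket per tenant means a
# burst from one tenant cannot starve another.
import time

class TokenBucket:
    def __init__(self, rate_per_sec: float, burst: float):
        self.rate, self.capacity = rate_per_sec, burst
        self.tokens, self.last = burst, time.monotonic()

    def allow(self) -> bool:
        # Refill tokens based on elapsed time, capped at the burst capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False        # caller would return HTTP 429 or queue the request

buckets: dict[str, TokenBucket] = {}

def admit(tenant_id: str) -> bool:
    """Look up (or create) the tenant's bucket and ask it for admission."""
    bucket = buckets.setdefault(tenant_id, TokenBucket(rate_per_sec=100.0, burst=200.0))
    return bucket.allow()

print(admit("tenant-a"), admit("tenant-b"))  # -> True True
```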

These same controls also act as a defensive layer. The system continuously monitors for DDoS attempts, malformed traffic, or abnormal spikes. When necessary, it automatically sheds non-critical load first to preserve real-time call performance and keep agents responsive even under extreme or adversarial conditions.
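
Priority-based shedding can be as simple as the sketch below: as pressure rises, batch work is dropped first, then analytics, while real-time traffic is never shed. The utilization thresholds are illustrative, not production values.

```python
# A minimal load-shedding sketch: drop lower-priority work first so
# real-time conversations keep their capacity.
REALTIME, ANALYTICS, BATCH = 0, 1, 2   # lower number = higher priority

def should_shed(priority: int, utilization: float) -> bool:
    """Shed batch work first, then analytics; never shed real-time traffic."""
    if utilization > 0.95:
        return priority >= ANALYTICS   # only real-time traffic survives
    if utilization > 0.85:
        return priority >= BATCH       # batch jobs are deferred first
    return False

for pri, name in [(REALTIME, "realtime"), (ANALYTICS, "analytics"), (BATCH, "batch")]:
    print(name, "shed at 90% load:", should_shed(pri, 0.90))
```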

Redundancy and disaster recovery designed for the worst case

We assume every dependency can degrade or fail at some point, and we engineer our systems so customers never feel it.

Instant failover across multiple LLM and voice providers

AI developers know that major LLM providers can experience latency spikes or outages without warning. To protect our customers, Decagon partners with multiple leading frontier labs while also hosting in-house models on GPU clusters. Our system continuously evaluates response times, error rates, and saturation, automatically routing traffic away from the primary services when they degrade.
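
A simplified version of this health-based routing is sketched below. The provider names and thresholds are illustrative assumptions; the production router also weighs saturation, as noted above.

```python
# A minimal health-based router sketch across LLM providers.
from collections import deque

class ProviderHealth:
    def __init__(self, window: int = 100):
        self.errors: deque = deque(maxlen=window)
        self.latencies: deque = deque(maxlen=window)

    def record(self, latency_s: float, ok: bool) -> None:
        self.latencies.append(latency_s)
        self.errors.append(not ok)

    def healthy(self, max_error_rate=0.05, max_p50_latency=2.0) -> bool:
        if not self.latencies:
            return True                      # no data yet: assume healthy
        error_rate = sum(self.errors) / len(self.errors)
        p50 = sorted(self.latencies)[len(self.latencies) // 2]
        return error_rate <= max_error_rate and p50 <= max_p50_latency

health = {name: ProviderHealth() for name in ("primary", "secondary", "in_house")}

def pick_provider(preference=("primary", "secondary", "in_house")) -> str:
    """Route to the first provider whose rolling health window looks good."""
    for name in preference:
        if health[name].healthy():
            return name
    return preference[-1]                    # all degraded: fall back to in-house

health["primary"].record(latency_s=8.0, ok=False)  # simulate a primary outage
print(pick_provider())                              # -> secondary
```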

Defense in depth across the entire stack

Each subsystem, from API gateways to vector stores to Kafka clusters, can fail independently without taking the platform down. Components recover gracefully without manual intervention, and critical paths have redundant instances by default. During the holiday season, this architecture allowed our voice agents to remain fully available even as major infrastructure partners experienced incidents.
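
One common pattern behind this kind of automatic recovery is a circuit breaker. The sketch below illustrates the idea with hypothetical thresholds; it is not our exact implementation.

```python
# A minimal circuit-breaker sketch: after repeated failures, stop calling the
# dependency for a cool-off window and serve a fallback instead.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_after_s: float = 30.0):
        self.failures, self.threshold = 0, failure_threshold
        self.opened_at, self.reset_after = 0.0, reset_after_s

    def call(self, fn, fallback):
        # While open, skip the dependency entirely until the cool-off elapses.
        if self.failures >= self.threshold:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()
            self.failures = 0                # half-open: probe the dependency again
        try:
            result = fn()
            self.failures = 0                # success closes the circuit
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            return fallback()

breaker = CircuitBreaker()
print(breaker.call(lambda: "vector store result", lambda: "cached fallback"))
```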

Inherited SLAs from best-in-class infrastructure providers

Our platform relies on partners with proven uptime SLAs and robust disaster-recovery frameworks, such as GCP and Twilio. Their reliability practices directly support the stability of Decagon’s core infrastructure, allowing us to deliver a dependable experience even during periods of peak demand.

Operational discipline that eliminates risk

The entire holiday season is a critical time for our enterprise customers, so reliability is key to supporting their business outcomes. Our engineering discipline has to match the urgency and volume of end-user requests they face.

Proactive capacity planning

Well before the holidays, we modeled projected traffic, replayed peak-hour workloads, and identified potential bottlenecks across compute, storage, and network layers. For components where autoscaling alone wasn’t sufficient, we intentionally over-provisioned capacity to guarantee meaningful headroom. This preparation eliminated uncertainty and ensured our customers’ agents could absorb sudden demand spikes without degraded performance.
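
The over-provisioning step reduces to a simple headroom calculation, sketched below. All numbers, including per-instance capacity and the 50% margin, are illustrative assumptions.

```python
# A minimal headroom sketch: size a component for projected peak traffic
# plus an explicit safety margin.
import math

def provisioned_instances(projected_peak_rps: float,
                          rps_per_instance: float,
                          headroom: float = 0.5) -> int:
    """Size for the projected peak plus a headroom margin (50% here)."""
    required = projected_peak_rps * (1.0 + headroom) / rps_per_instance
    return math.ceil(required)

# Example: a projected peak of 12,000 rps on instances rated at 200 rps.
print(provisioned_instances(projected_peak_rps=12_000, rps_per_instance=200))  # -> 90
```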

A full code freeze during peak season

We implemented a full freeze on deploys, configuration changes, and experiments during key time windows. This removed the risk of regressions or unexpected behavior during the highest-stakes weeks of the year. Enterprise customers saw this as a signal of operational maturity, trusting that the platform would remain stable and predictable when they needed it most.

24x7 monitoring across all regions

Our observability stack was fully audited ahead of time to ensure clear, actionable signals, with every alert threshold, dashboard panel, and runbook reviewed by senior engineers. Throughout the holiday season, our global on-call rotation monitored the platform continuously to minimize response time. Ultimately, the system ran smoothly: no major failures, no unexpected anomalies, and no interruptions for customers.

Reliability as a key engineering tenet

Reliability and high availability are our top priorities. As more businesses rely on Decagon agents for mission-critical workflows, new challenges will emerge. Our engineering team continues to refine autoscaling, failover logic, observability, and operational playbooks to stay ahead of customer needs.

Customers expect their agents to be available 24x7, especially when demand is highest. After this holiday season, we’re proud to say they were.
