
Wake word

A wake word (also called a trigger word or hotword) is a short, specific phrase that activates a voice-enabled device or application. "Hey Siri," "Alexa," and "OK Google" are the most recognized examples. The device listens continuously for the wake word using a compact, always-on model, then activates the full voice processing pipeline — including automatic speech recognition and natural language understanding — only after the wake word is detected. This two-stage architecture conserves compute resources and limits unintended activations.
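The two-stage gating described above can be sketched as a small loop. This is a minimal illustration, not any vendor's implementation: `TwoStageListener`, `detector`, and `full_pipeline` are hypothetical names. The detector is passed in as a plain function standing in for a trained model, and the window and command lengths are illustrative assumptions.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Sequence

Frame = Sequence[float]  # one short chunk of audio samples, e.g. 10-30 ms


class TwoStageListener:
    """Hypothetical sketch: a cheap always-on detector gates an expensive pipeline."""

    def __init__(
        self,
        detector: Callable[[List[Frame]], float],
        full_pipeline: Callable[[List[Frame]], str],
        threshold: float = 0.8,
        window_frames: int = 50,
    ) -> None:
        self.detector = detector            # stage 1: small model scoring the recent window
        self.full_pipeline = full_pipeline  # stage 2: ASR + NLU, run only after a detection
        self.threshold = threshold
        # Ring buffer holding only the most recent frames; older audio is discarded.
        self.window: Deque[Frame] = deque(maxlen=window_frames)

    def process(self, frames: Iterable[Frame], command_frames: int) -> List[str]:
        """Consume a frame stream; return one full-pipeline result per activation."""
        results: List[str] = []
        stream = iter(frames)
        for frame in stream:
            self.window.append(frame)
            if len(self.window) < self.window.maxlen:
                continue  # not enough context to score yet
            if self.detector(list(self.window)) >= self.threshold:
                # Wake word detected: pull the next command_frames frames from the
                # same stream and hand only those to the expensive stage.
                command = [f for _, f in zip(range(command_frames), stream)]
                results.append(self.full_pipeline(command))
                self.window.clear()  # avoid re-triggering on the same utterance
        return results
```

The design point is that `full_pipeline` is never called on audio the detector has not flagged; everything before a detection lives only in the fixed-size ring buffer and is overwritten.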

The requirement that drives wake word design is deceptively demanding: the system must run continuously, often on battery-powered hardware, while achieving near-zero false rejections (failing to activate when the user says the word) and near-zero false accepts (activating on words that merely sound similar). A home speaker that triggers on "electra" or "placebo" when "Alexa" was never spoken will quickly lose user trust.

How wake word detection works

Wake word detectors are small discriminative classifiers — typically convolutional neural networks or recurrent neural networks trained to distinguish a specific acoustic pattern from background audio. Unlike a full speech recognizer, the model does not need to understand arbitrary speech; it only needs to distinguish the wake word from all other sounds. This narrow scope allows the model to be small enough (often under 1 MB) to run continuously on a dedicated low-power processor without activating the main application CPU.
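A back-of-the-envelope parameter count shows why such a narrow classifier can fit well under 1 MB. The three-layer convolutional architecture below is a hypothetical example chosen for illustration, not taken from any particular product. The count covers weights and biases only; activation memory and runtime code are not included. Because the sketch assumes global average pooling before the classifier, the input spectrogram size does not affect the parameter count.

```python
from typing import List, Tuple

# Hypothetical small CNN keyword spotter: (in_channels, out_channels, kernel_h, kernel_w)
CONV_LAYERS: List[Tuple[int, int, int, int]] = [
    (1, 32, 3, 3),
    (32, 64, 3, 3),
    (64, 64, 3, 3),
]
NUM_CLASSES = 2  # "wake word" vs. "everything else"


def conv_params(c_in: int, c_out: int, k_h: int, k_w: int) -> int:
    """Convolution weights plus one bias per output channel."""
    return c_out * (c_in * k_h * k_w) + c_out


def dense_params(n_in: int, n_out: int) -> int:
    """Fully connected weights plus one bias per output."""
    return n_in * n_out + n_out


def model_params() -> int:
    total = sum(conv_params(*layer) for layer in CONV_LAYERS)
    # Global average pooling leaves one value per channel of the last conv layer.
    last_channels = CONV_LAYERS[-1][1]
    total += dense_params(last_channels, NUM_CLASSES)
    return total


params = model_params()
print(f"parameters:      {params}")
print(f"float32 weights: {params * 4 / 1024:.1f} KiB")
print(f"int8 weights:    {params / 1024:.1f} KiB")
```

This hypothetical model has 55,874 parameters: about 218.3 KiB of weights at 32-bit floating point and about 54.6 KiB if quantized to 8-bit integers.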

Training requires two classes of audio: positive examples (recordings of the wake word from diverse speakers, accents, microphone distances, and acoustic environments) and negative examples (audio that does not contain the wake word, including acoustically similar words such as "electra" and "relax" as well as background speech, music, and TV audio). The false accept rate and false reject rate are tuned by adjusting the classification threshold: a lower threshold catches more true wake words but also produces more false accepts; a higher threshold reduces false accepts at the cost of more missed detections.
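How a single threshold maps to the two error rates can be sketched in a few lines. The sketch assumes the detector has already been run over a labeled set of positive utterances and over a known number of hours of background audio. The names `evaluate` and `OperatingPoint` are hypothetical, and counting one false accept per above-threshold background peak is a simplifying assumption; a real evaluation also has to merge nearby peaks into single activation events.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class OperatingPoint:
    threshold: float
    false_reject_rate: float       # fraction of true wake words missed
    false_accepts_per_hour: float  # activations on background audio, per hour


def evaluate(
    threshold: float,
    positive_scores: Sequence[float],
    negative_peak_scores: Sequence[float],
    background_hours: float,
) -> OperatingPoint:
    """Score both classes at one threshold.

    positive_scores: detector score for each utterance that really contains the wake word.
    negative_peak_scores: detector score at each candidate trigger found in background
        audio (assumed already de-duplicated, so one spurious activation = one entry).
    """
    misses = sum(1 for s in positive_scores if s < threshold)
    false_accepts = sum(1 for s in negative_peak_scores if s >= threshold)
    return OperatingPoint(
        threshold=threshold,
        false_reject_rate=misses / len(positive_scores),
        false_accepts_per_hour=false_accepts / background_hours,
    )
```

With toy positive scores `[0.95, 0.9, 0.7, 0.85, 0.6]`, background peaks `[0.2, 0.5, 0.65, 0.75, 0.3, 0.9]`, and 2 hours of background audio, raising the threshold from 0.5 to 0.8 moves the false reject rate from 0.0 to 0.4 while false accepts drop from 2.0 to 0.5 per hour.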

False accepts, false rejects, and the tradeoff

The operating point on the false accept/false reject curve is a design choice driven by use case. A hands-free device in a noisy factory environment prioritizes low false reject rate — workers cannot repeat the wake word multiple times. A smart speaker in a shared living space prioritizes low false accept rate — random activations are intrusive. Consumer products typically publish false accept rates in units of false accepts per hour of background audio, though measurement conditions vary widely across vendors, making direct comparisons unreliable.
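Choosing an operating point can be framed as a constrained search over candidate thresholds. The two policies below mirror the shared-space and noisy-factory examples: one minimizes missed wake words within a false-accept budget, the other minimizes false accepts within a miss-rate budget. The function names, budgets, and toy scores are hypothetical, and the sweep repeats the counting logic so this block stands on its own.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]  # (threshold, false_reject_rate, false_accepts_per_hour)


def sweep(
    positive_scores: Sequence[float],
    negative_peak_scores: Sequence[float],
    background_hours: float,
    thresholds: Sequence[float],
) -> List[Point]:
    """Evaluate every candidate threshold on the same labeled data."""
    points: List[Point] = []
    for t in thresholds:
        frr = sum(s < t for s in positive_scores) / len(positive_scores)
        fa_per_hour = sum(s >= t for s in negative_peak_scores) / background_hours
        points.append((t, frr, fa_per_hour))
    return points


def pick_for_shared_space(points: List[Point], max_fa_per_hour: float) -> Optional[float]:
    """Smart-speaker policy: fewest missed wake words within a false-accept budget."""
    allowed = [p for p in points if p[2] <= max_fa_per_hour]
    if not allowed:
        return None  # no threshold meets the budget; the model itself needs work
    return min(allowed, key=lambda p: (p[1], p[2]))[0]


def pick_for_noisy_floor(points: List[Point], max_frr: float) -> Optional[float]:
    """Hands-free factory policy: fewest false accepts within a missed-wake-word budget."""
    allowed = [p for p in points if p[1] <= max_frr]
    if not allowed:
        return None
    return min(allowed, key=lambda p: (p[2], p[1]))[0]
```

On the same toy scores with candidate thresholds 0.5 to 0.9, a 1.0 false-accepts-per-hour budget selects 0.7, while a zero-miss budget selects 0.6. Returning `None` rather than the closest point keeps an unmet budget visible instead of silently shipping a worse operating point.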

Custom wake words — phrases chosen by enterprises or product teams rather than the device manufacturer — introduce additional challenges. A unique phrase with distinctive phonetics and multiple syllables (e.g., "Hey Aria") is easier to distinguish from ambient speech than a short, common phrase (e.g., "Go"). Wake word selection is therefore part of the design process, not a post-hoc configuration choice.

Wake words in enterprise voice AI

In enterprise AI voice agent deployments, wake word detection is less prominent than in consumer devices because enterprise interactions are typically initiated through a phone call, a dedicated button, or a software UI rather than continuous ambient listening. However, in physical retail, warehouse, or healthcare environments where hands-free operation matters, custom wake words can replace push-to-talk. In these contexts, wake word accuracy directly affects downstream transcription quality: a false accept that fires in the middle of background conversation sends speech that was never addressed to the agent through ASR, producing a meaningless transcript. Pairing a well-tuned wake word model with Word Error Rate monitoring on the triggered audio segments is a common way to measure the combined accuracy of the wake word stage and ASR in production.
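Word Error Rate on triggered segments can be sketched as a word-level edit distance pooled across segments. This assumes human reference transcripts are available for a sample of triggered audio; the function names are hypothetical, and the whitespace split stands in for the case and punctuation normalization that production WER tooling usually applies. A segment with an empty reference models a false accept: every word ASR emits counts as an insertion.

```python
from typing import List, Sequence, Tuple


def word_edit_distance(reference: Sequence[str], hypothesis: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning reference into hypothesis."""
    prev = list(range(len(hypothesis) + 1))  # distance from empty reference prefix
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            curr[j] = min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            )
        prev = curr
    return prev[-1]


def corpus_wer(segments: List[Tuple[str, str]]) -> float:
    """Pooled WER over (reference, hypothesis) pairs from triggered segments."""
    edits = 0
    ref_words = 0
    for reference, hypothesis in segments:
        ref, hyp = reference.split(), hypothesis.split()
        edits += word_edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("WER is undefined when no reference words are present")
    return edits / ref_words
```

For example, two correctly triggered segments with one substitution across 8 reference words give a WER of 0.125; adding a single false-accept segment whose transcript is 5 words of background chatter raises the pooled WER to 0.75, which is how first-stage errors surface in the combined metric.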
