AI tokens
An AI token is the basic unit of text that a large language model reads and produces. Before a model can process a sentence, the sentence is broken into tokens — short fragments of characters, words, or punctuation — and the model works with those fragments rather than raw text. Understanding tokens is essential for anyone working with AI, because tokens determine cost, latency, and how much context a model can hold at once.
A simple rule of thumb: in English, one token is roughly four characters, or about three-quarters of a word. The phrase Hello, how can I help you? tokenizes to about seven tokens in most modern models. Short common words like the are usually a single token; longer or rarer words get split into multiple tokens.
How tokenization works
Tokenization is the step that converts raw text into the numeric IDs a model can process. Most modern large language models use a method called byte-pair encoding (BPE) or a close variant like WordPiece or SentencePiece. These algorithms learn, from a large training corpus, which character sequences appear together often enough to be worth treating as a single token. Common words become single tokens, while rare words get split into smaller pieces — which is why a made-up name like Zylphora might cost five or six tokens, while customer costs one.
Different models use different tokenizers, so the same sentence can have different token counts in GPT, Claude, and Gemini. This matters when you're estimating cost or comparing context-window sizes across providers.
Why token counts matter
Tokens are the unit of billing and the unit of memory for AI systems. Three practical implications:
- Cost: Nearly every commercial LLM API prices per token, with separate rates for input (prompt) tokens and output (completion) tokens. A verbose system prompt or a long conversation history can quietly dominate cost.
- Latency: Generation time scales roughly linearly with output tokens. A 1,000-token answer takes about ten times as long to stream as a 100-token answer.
- Context window: Every model has a maximum number of tokens it can consider in one call — its context window. Exceeding it forces truncation or summarization.
Token limits across major models
Context windows have grown rapidly. As of 2026, GPT-4o supports a 128,000-token context, Claude 3.5 Sonnet supports 200,000, and Gemini 1.5 Pro supports up to 2,000,000 tokens for select customers. For reference, 100,000 tokens is roughly 75,000 English words — about the length of a short novel. Larger context windows let an AI agent reason over longer documents, longer conversations, and richer retrieved evidence without losing earlier detail.
How tokens connect to prompt engineering and retrieval
Token economics shape almost every design decision in a production AI system. Good prompt engineering is partly the art of conveying instructions in the fewest tokens that still get the desired behavior. Retrieval-augmented generation (RAG) exists in part because it's cheaper and more accurate to retrieve the most relevant passages from a knowledge base than to stuff every possible document into the prompt. And techniques for managing long-running conversational AI sessions — summarization, sliding windows, memory stores — exist precisely to keep token counts under control.
Tokens in customer support AI
For AI agents handling customer conversations, token usage is the hidden driver of unit economics. A single resolved ticket may involve a system prompt, a customer message, several retrieved knowledge-base passages, a chain of tool calls, and a final response — each step consuming tokens. Teams that monitor token usage per resolved conversation can spot inefficient prompts, oversized retrievals, or runaway tool loops before they show up as a surprise bill. AI observability tooling typically tracks tokens-per-conversation as a core metric.
Frequently asked questions
What is a token in AI? A token is the smallest unit of text that a language model processes. It's typically a short sequence of characters — often a word, part of a word, or a punctuation mark — that the model treats as a single symbol when reading or generating text.
How are AI tokens counted? Each provider uses its own tokenizer, but the typical English ratio is about one token per four characters, or roughly 0.75 tokens per word. Code, non-English text, and unusual symbols tend to use more tokens.
How much does an AI token cost? Costs vary widely by model and tier, ranging from fractions of a cent per thousand tokens for small open models to several cents per thousand for frontier models. Input and output tokens are usually priced separately, with output tokens costing more.
What is the token limit of a model? The token limit, or context window, is the maximum number of tokens a model can consider in one request. Frontier models in 2026 range from roughly 128,000 to over 2,000,000 tokens.
Are AI tokens the same as cryptocurrency tokens? No. AI tokens are units of text inside a language model. Crypto tokens are tradable digital assets on a blockchain. The two are unrelated despite the shared word.
For a deeper dive, download Decagon's guide to agentic AI for customer experience.

