Tokens
The basic units that language models use to process text — typically words, word pieces, or characters that the model reads and generates.
Language models do not read text one character or one whole word at a time. They first break it into tokens, which are typically whole words, common word parts, or individual characters for rare words, and they generate their output one token at a time as well.
How tokenization works: "Hello, how are you?" becomes roughly 6 tokens: ["Hello", ",", " how", " are", " you", "?"]. "Unbelievable" might become 3 tokens: ["Un", "believ", "able"]. Numbers, punctuation, and special characters each consume tokens. The exact split depends on the tokenizer each model uses.
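The splits above can be reproduced with a toy sketch. It assumes a tiny hand-written vocabulary (TOY_VOCAB is hypothetical) and uses greedy longest-match lookup, which is a simplification: production tokenizers learn their vocabulary from data and usually apply algorithms such as byte-pair encoding.

```python
# Hypothetical toy vocabulary; real tokenizers learn tens of thousands of entries.
TOY_VOCAB = {"Hello", ",", " how", " are", " you", "?", "Un", "believ", "able"}


def toy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text by repeatedly taking the longest vocabulary entry that matches."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # No vocabulary entry matched: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


greeting = toy_tokenize("Hello, how are you?", TOY_VOCAB)
long_word = toy_tokenize("Unbelievable", TOY_VOCAB)
print(greeting)
print(long_word)
```

Note how the leading space travels with the word that follows it (" how", " are"), which is why whitespace rarely costs a token of its own in this kind of scheme.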
Key token facts: in English, 1 token is approximately 4 characters or 0.75 words, so 100 tokens is approximately 75 words; a typical page of text is roughly 300-400 tokens; languages such as Chinese, Japanese, and Korean often require more tokens for the same content; and code tends to be token-heavy due to syntax characters.
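These rules of thumb translate directly into a rough estimator. The sketch below is only an approximation for English text, and the function names are illustrative; for exact counts, use the tokenizer that belongs to the model in question.

```python
import math

CHARS_PER_TOKEN = 4      # rule of thumb for English text
WORDS_PER_TOKEN = 0.75   # rule of thumb for English text


def estimate_tokens_from_chars(text: str) -> int:
    """Approximate token count from the character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Approximate token count from the word count."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


print(estimate_tokens_from_words(75))          # 75 words
print(estimate_tokens_from_chars("a" * 400))   # 400 characters
```

Both calls land on about 100 tokens, matching the "100 tokens is approximately 75 words" figure. Expect the estimate to undercount for code and for non-English text.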
Tokens matter for AI agent builders because: pricing is per token (input tokens + output tokens), context windows have token limits (not word limits), response length is measured in tokens, and knowledge base chunks are sized by tokens.
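A minimal sketch of the first two points, assuming prices are quoted per million tokens and that input and output are priced separately. The prices and the context window size below are placeholders, not any provider's actual figures.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost of one request when input and output tokens are priced separately."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


def fits_in_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """The context window has to hold the input and the generated output together."""
    return input_tokens + max_output_tokens <= context_window


# Placeholder prices: 3.00 per million input tokens, 10.00 per million output tokens.
cost = estimate_cost(1_000_000, 500_000, 3.0, 10.0)
print(cost)
print(fits_in_context(7_000, 1_000, context_window=8_000))
```

Output tokens are often priced higher than input tokens, which is why the two counts are kept apart instead of being summed before pricing.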
Token economics: each conversation turn consumes tokens for the system prompt, conversation history, retrieved knowledge, user message, and AI response. A typical AI agent conversation of 10 turns might consume 5,000-20,000 tokens, depending on system prompt length, knowledge retrieval, and response verbosity.
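The arithmetic behind that range can be sketched as follows. The sketch assumes the full conversation history is resent on every turn and that retrieved knowledge is used for one turn only and not stored in the history; all token counts are made-up illustrative values.

```python
def conversation_tokens(
    turns: int,
    system_prompt: int,
    retrieved_per_turn: int,
    user_message: int,
    response: int,
) -> int:
    """Total tokens consumed over a conversation whose history grows each turn."""
    total = 0
    history = 0  # tokens of earlier user messages and responses
    for _ in range(turns):
        input_tokens = system_prompt + history + retrieved_per_turn + user_message
        total += input_tokens + response
        # This turn's exchange becomes part of the history for the next turn.
        history += user_message + response
    return total


total = conversation_tokens(
    turns=10,
    system_prompt=300,
    retrieved_per_turn=400,
    user_message=50,
    response=150,
)
print(total)
```

With these numbers the fixed parts account for 9,000 tokens and the resent history for another 9,000, for 18,000 in total. The history share grows quadratically with the number of turns, which is why long conversations get expensive quickly.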
Builders can manage token costs by: keeping system prompts concise, retrieving relevant (not excessive) knowledge, summarizing long conversation histories, and choosing appropriate model sizes for different tasks.
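As one illustration of summarizing long histories, the sketch below keeps the most recent messages that fit a token budget and replaces everything older with a single summary. The summarize callback is hypothetical (in practice it would usually be a model call), token counts use the 4-characters-per-token approximation, and the summary's own tokens are not counted against the budget.

```python
import math
from typing import Callable


def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token in English."""
    return math.ceil(len(text) / 4)


def compact_history(
    messages: list[str],
    budget: int,
    summarize: Callable[[list[str]], str],
) -> list[str]:
    """Keep the newest messages within budget; summarize the older ones."""
    kept = []
    used = 0
    # Walk backwards so the most recent messages are kept first.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    older = messages[: len(messages) - len(kept)]
    if older:
        return [summarize(older)] + kept
    return kept


def placeholder_summary(older: list[str]) -> str:
    # Stand-in for a real summarization step.
    return f"[summary of {len(older)} earlier messages]"


history = ["a" * 40, "b" * 40, "c" * 40]  # about 10 tokens each
compacted = compact_history(history, budget=20, summarize=placeholder_summary)
print(compacted)
```

Here the two newest messages fit the 20-token budget and the oldest one is folded into the summary line.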
Related Terms
Context Window
Fundamentals: The maximum amount of text (measured in tokens) that a language model can process in a single interaction, including both input and output.
Large Language Model (LLM)
Fundamentals: A neural network trained on massive text datasets that can understand and generate human-like language, powering modern AI assistants and agents.
Token Optimization
Architecture: Strategies and techniques for reducing the number of tokens consumed when interacting with AI models, lowering costs and improving performance.
Inference
Infrastructure: The process of using a trained AI model to generate predictions, answers, or content based on new input data.