Understanding Tokens
Learn exactly what tokens are, how they're counted, and why they matter for AI pricing
If you’ve used AI tools, you’ve probably heard about “tokens.” But what exactly are they? This guide explains tokens in depth, with interactive tools to help you understand how your text gets tokenized.
Try It Yourself
Use this tool to see exactly how your text gets broken into tokens:
Token Visualizer
What Is a Token?
A token is the basic unit that AI models use to process text. When you send a message to an AI, it doesn’t read characters or words—it reads tokens.
Think of tokens as the “vocabulary” the AI uses. Just like how humans break down sentences into words, AI models break down text into tokens. But tokens don’t always match our intuition about words.
Tokens Are Not Words
Here’s where it gets interesting:
| Text | Tokens | Count |
|---|---|---|
| “hello” | hello | 1 |
| “Hello” | Hello | 1 |
| “HELLO” | HE, LLO | 2 |
| “tokenization” | token, ization | 2 |
| “ChatGPT” | Chat, G, PT | 3 |
Notice how:
- Common words are often 1 token
- Capitalization can change tokenization
- Long or uncommon words get split into pieces
- Technical terms may become multiple tokens
The 4-Character Rule (and Why It’s Wrong)
You’ll often hear “1 token ≈ 4 characters” as a quick estimate. It’s a usable rule of thumb for plain English, but it can be badly wrong elsewhere:
| Text | Characters | Actual Tokens | 4-Char Estimate |
|---|---|---|---|
| “The quick brown fox” | 19 | 4 | 5 |
| “supercalifragilisticexpialidocious” | 34 | 9 | 9 |
| “🎉🎊🎁” | 3 | 6 | 1 |
| “日本語” | 3 | 3 | 1 |
| `console.log("hello")` | 20 | 7 | 5 |
The approximation works reasonably well for plain English text, but breaks down with:
- Emojis: Each emoji can be 2-3 tokens
- Non-English languages: Often more tokens per character
- Code: Varies widely based on syntax and naming
- Numbers: Can be tokenized digit-by-digit
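To make the rule concrete, here is a minimal sketch of the 4-character estimate (rounding half up, to match the estimates in the table above). This is only the approximation itself; real counts come from a model-specific tokenizer, which this does not replace:

```python
import math

def estimate_tokens(text: str) -> int:
    """Rough token estimate via the '1 token ~ 4 characters' rule,
    rounding half up. Real counts require the model's own tokenizer."""
    return math.floor(len(text) / 4 + 0.5)

print(estimate_tokens("The quick brown fox"))  # 5 (actual: 4)
print(estimate_tokens("🎉🎊🎁"))                # 1 (actual: 6)
```

Note how the emoji string illustrates the failure mode: three characters estimate as one token, while real tokenizers spend several tokens on them.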
Input vs Output Tokens
When you interact with AI, there are two types of tokens:
Input Tokens (Prompt Tokens)
Everything you send to the AI:
- Your message
- System instructions (set by the app builder)
- Conversation history
- Any context or documents
Output Tokens (Completion Tokens)
Everything the AI generates:
- The response text
- Any formatted content (markdown, code, etc.)
Why This Matters for Pricing
Output tokens typically cost 2-4x more than input tokens. Why?
- Input: The model reads your tokens (relatively fast)
- Output: The model generates tokens one-by-one (computationally intensive)
This is why shorter, more concise AI responses cost less than long, verbose ones.
How Tokenization Works
AI models use a technique called Byte Pair Encoding (BPE) to create their vocabulary. Here’s a simplified explanation:
- Start with characters: Begin with individual characters as the base vocabulary
- Find common pairs: Look for character pairs that appear frequently together
- Merge pairs: Combine the most common pairs into single tokens
- Repeat: Continue merging until the vocabulary reaches a target size (roughly 100,000–250,000 tokens for modern models)
The result is a vocabulary where:
- Very common words (“the”, “is”, “and”) are single tokens
- Common word parts (“-ing”, “-tion”, “un-”) are single tokens
- Rare words get split into known pieces
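The merge loop above can be sketched in a few lines. This is a toy illustration of the BPE idea on a three-word corpus, not any production tokenizer:

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words; return the top pair."""
    pairs = Counter()
    for symbols in words:
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Merge every occurrence of `pair` into a single symbol."""
    merged = []
    for symbols in words:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged.append(out)
    return merged

# Step 1: start from individual characters.
corpus = [list(w) for w in ["low", "lower", "lowest"]]
# Steps 2-4: repeatedly merge the most common pair ("l"+"o", then "lo"+"w").
for _ in range(2):
    corpus = merge_pair(corpus, most_frequent_pair(corpus))
print(corpus)  # [['low'], ['low', 'e', 'r'], ['low', 'e', 's', 't']]
```

After two merges, the shared stem “low” has become a single token while the rarer suffixes remain split into known pieces, exactly the behavior described above.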
Different Models, Different Tokenizers
Not all AI models tokenize text the same way:
| Model Family | Tokenizer | Vocabulary Size |
|---|---|---|
| GPT-4.1, GPT-5.x | o200k_base | ~200,000 |
| Claude 4.x | Claude tokenizer | ~100,000 |
| Gemini 2.5/3 | SentencePiece | ~256,000 |
This means the same text may have different token counts across models, though the differences are usually small for English text.
Practical Tips for Managing Tokens
1. Be Concise in Your Instructions
Shorter system prompts = fewer input tokens on every message.
Verbose (45 tokens):
“I would like you to act as a helpful assistant that can answer questions about various topics. Please provide detailed and informative responses to any questions that users might ask you.”
Concise (15 tokens):
“You’re a helpful assistant. Provide detailed, informative answers.”
2. Consider Response Length
If you don’t need long responses, tell the AI:
- “Answer in 2-3 sentences”
- “Be brief”
- “Summarize in one paragraph”
3. Watch Conversation Length
Every message in your conversation history counts as input tokens. Longer conversations = more tokens.
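To see how history compounds, here is a small sketch (with hypothetical message sizes) of a chat where each turn resends the full conversation as input:

```python
def total_input_tokens(turns: int, user_tokens: int, reply_tokens: int,
                       system_tokens: int = 0) -> int:
    """Total input tokens billed across a conversation where every turn
    resends the system prompt plus all prior messages (hypothetical sizes)."""
    total, history = 0, system_tokens
    for _ in range(turns):
        history += user_tokens   # your new message joins the history
        total += history         # the whole history is sent as input
        history += reply_tokens  # the reply joins the history too
    return total

# 10 turns of 50-token questions and 150-token answers:
print(total_input_tokens(10, 50, 150))  # 9500, not 10 * 50 = 500
```

The quadratic growth is why long conversations cost far more than the sum of their individual messages.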
4. Code and Technical Content
Code often tokenizes efficiently because:
- Common keywords (`function`, `return`, `if`) are single tokens
- But variable names and strings add up quickly
Token Limits (Context Windows)
Every AI model has a maximum number of tokens it can process at once—this is called the context window. Models also have a max output limit on how many tokens they can generate in a single response.
| Model | Context Window | Max Output |
|---|---|---|
| GPT-5.4 | 1,000,000 tokens | 32,768 tokens |
| GPT-5.4 Mini | 1,000,000 tokens | 32,768 tokens |
| GPT-4.1 | 1,000,000 tokens | 32,768 tokens |
| Claude Opus 4.6 | 1,000,000 tokens | 128,000 tokens |
| Claude Sonnet 4.6 | 1,000,000 tokens | 64,000 tokens |
| Claude Haiku 4.5 | 200,000 tokens | 64,000 tokens |
| Gemini 2.5 Pro | 1,000,000 tokens | 65,536 tokens |
| Gemini 2.5 Flash | 1,000,000 tokens | 65,536 tokens |
| o3 | 200,000 tokens | 65,536 tokens |
| o4-mini | 200,000 tokens | 65,536 tokens |
The context window must fit:
- System prompt
- Conversation history
- Your current message
- Space for the response
If you exceed the limit, older messages get truncated from the conversation.
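A minimal sketch of how an app might drop the oldest messages to stay inside the window. The budgeting logic here is an assumption for illustration; real products use varying strategies (truncation, summarization, etc.):

```python
def fit_to_context(messages, system_tokens, max_output, context_window):
    """Drop oldest messages until system prompt + history + response
    headroom fits the context window. `messages` is a list of
    (text, token_count) pairs, oldest first. Sketch only."""
    budget = context_window - system_tokens - max_output
    kept = list(messages)
    while kept and sum(n for _, n in kept) > budget:
        kept.pop(0)  # truncate the oldest message first
    return kept

# Hypothetical 200K-window model with a 64K max output:
history = [("msg1", 60_000), ("msg2", 80_000), ("msg3", 50_000)]
kept = fit_to_context(history, system_tokens=2_000,
                      max_output=64_000, context_window=200_000)
print([m for m, _ in kept])  # ['msg2', 'msg3'] -- msg1 was truncated
```

Reserving the max-output headroom up front is the key detail: the window must hold the response as well as everything you send.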
Model Pricing Comparison
Prices are per 1 million tokens (including Chipp platform markup):
| Model | Input $/M | Output $/M | Best For |
|---|---|---|---|
| GPT-5.4 | $3.25 | $19.50 | General purpose, latest capabilities |
| GPT-5.4 Mini | $0.98 | $5.85 | Fast, cost-effective |
| GPT-5.4 Nano | $0.26 | $1.63 | Ultra-fast, simple tasks |
| GPT-4.1 | $2.60 | $10.40 | Long context, reliable coding |
| Claude Opus 4.6 | $6.50 | $32.50 | Deep reasoning, complex tasks |
| Claude Sonnet 4.6 | $3.90 | $19.50 | Writing, analysis, balanced |
| Claude Haiku 4.5 | $1.30 | $6.50 | Fast, affordable |
| Gemini 2.5 Pro | $1.63 | $6.50 | Long context, multimodal |
| Gemini 2.5 Flash | $0.10 | $0.39 | Fast, very affordable |
| Gemini 2.5 Flash Lite | $0.05 | $0.20 | Ultra-cheap, simple tasks |
| o3 | $13.00 | $52.00 | Advanced reasoning |
| o4-mini | $3.90 | $15.60 | Reasoning on a budget |
Cost Examples
Let’s look at real token costs using GPT-5.4 ($3.25/M input, $19.50/M output):
Example 1: Quick Q&A
- Your question: 50 tokens ($0.000163)
- AI response: 100 tokens ($0.00195)
- Total: $0.002 (less than 1 cent)
Example 2: Long Document Analysis
- Document + question: 5,000 tokens ($0.01625)
- Detailed response: 1,000 tokens ($0.0195)
- Total: $0.04 (4 cents)
Example 3: Extended Conversation
- 20 back-and-forth messages
- Average input per turn: 2,000 tokens (includes history)
- Average output: 300 tokens
- Total: ~$0.25 (25 cents)
Budget-Friendly Alternative (Gemini 2.5 Flash)
The same Extended Conversation example with Gemini 2.5 Flash ($0.10/M input, $0.39/M output):
- Total: ~$0.006 (less than 1 cent)
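The arithmetic behind these examples is simple per-million-token math. A sketch that reproduces the Extended Conversation totals using the rates from the pricing table above:

```python
def conversation_cost(turns, input_per_turn, output_per_turn,
                      input_rate, output_rate):
    """Cost in dollars; rates are $ per 1 million tokens."""
    input_tokens = turns * input_per_turn
    output_tokens = turns * output_per_turn
    return (input_tokens * input_rate
            + output_tokens * output_rate) / 1_000_000

# Extended Conversation: 20 turns, 2,000 input / 300 output tokens per turn
gpt = conversation_cost(20, 2_000, 300, 3.25, 19.50)   # GPT-5.4
flash = conversation_cost(20, 2_000, 300, 0.10, 0.39)  # Gemini 2.5 Flash
print(f"GPT-5.4: ${gpt:.3f}, Flash: ${flash:.4f}")
```

Running this gives roughly $0.25 for GPT-5.4 and under a penny for Flash, matching the examples above; note that input tokens dominate the count (40,000 vs 6,000), but the pricier output rate keeps output a large share of the bill.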