Understanding Tokens for AI Agents - Chipp Docs

If you've used AI tools, you've probably heard about "tokens." But what exactly are they? This guide explains tokens in depth, with interactive tools to help you understand how your text gets tokenized.

Try It Yourself

Use this tool to see exactly how your text gets broken into tokens:

Token Visualizer (interactive tool)

Example: "The quick brown fox jumps over the lazy dog." is 44 characters and 10 tokens, or 4.40 characters per token.

The visualizer uses the cl100k_base tokenizer (GPT-4). Newer OpenAI models such as GPT-5 use o200k_base, and other models (Claude, Gemini) use their own tokenizers, with similar results.

What Is a Token?

A token is the basic unit that AI models use to process text. When you send a message to an AI, it doesn't read characters or words—it reads tokens.

Think of tokens as the "vocabulary" the AI uses. Just like how humans break down sentences into words, AI models break down text into tokens. But tokens don't always match our intuition about words.

Tokens Are Not Words

Here's where it gets interesting:

Text	Tokens	Count
"hello"	`hello`	1
"Hello"	`Hello`	1
"HELLO"	`HE`, `LLO`	2
"tokenization"	`token`, `ization`	2
"ChatGPT"	`Chat`, `G`, `PT`	3

Notice how:

Common words are often 1 token
Capitalization can change tokenization
Long or uncommon words get split into pieces
Technical terms may become multiple tokens

The 4-Character Rule (and Why It's Wrong)

You'll often hear "1 token ≈ 4 characters" as a quick estimate. This is a useful approximation, but it's not accurate:

Text	Characters	Actual Tokens	4-Char Estimate
"The quick brown fox"	19	4	5
"supercalifragilisticexpialidocious"	34	9	9
"🎉🎊🎁"	3	6	1
"日本語"	3	3	1
`console.log("hello")`	20	7	5

The approximation works reasonably well for plain English text, but breaks down with:

Emojis: Each emoji can be 2-3 tokens
Non-English languages: Often more tokens per character
Code: Varies widely based on syntax and naming
Numbers: Can be tokenized digit-by-digit
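
To make the gap concrete, here is a minimal sketch that applies the 4-character rule to the sample strings from the table above and compares the result with the token counts listed there. The helper name `estimate_tokens` is hypothetical, the estimate is rounded up, and characters are counted as Unicode code points; the "actual" values are copied from the table rather than computed by a real tokenizer.

```python
import math

def estimate_tokens(text: str) -> int:
    """Rule-of-thumb estimate: one token per 4 characters, rounded up."""
    return math.ceil(len(text) / 4)

# (text, actual token count as listed in the table above)
samples = [
    ("The quick brown fox", 4),
    ("supercalifragilisticexpialidocious", 9),
    ("🎉🎊🎁", 6),
    ("日本語", 3),
    ('console.log("hello")', 7),
]

for text, actual in samples:
    guess = estimate_tokens(text)
    # A positive error means the rule underestimates the real token count.
    print(f"{text!r}: estimate={guess}, actual={actual}, error={actual - guess:+d}")
```

The estimate is close for the English samples and badly low for emojis and Japanese text, which is why it should only be used for rough budgeting.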

Input vs Output Tokens

When you interact with AI, there are two types of tokens:

Input Tokens (Prompt Tokens)

Everything you send to the AI:

Your message
System instructions (set by the app builder)
Conversation history
Any context or documents

Output Tokens (Completion Tokens)

Everything the AI generates:

The response text
Any formatted content (markdown, code, etc.)

Why This Matters for Pricing

Output tokens typically cost 4-8x as much as input tokens. Why?

Input: The model reads your tokens (relatively fast)
Output: The model generates tokens one-by-one (computationally intensive)

This is why shorter, more concise AI responses cost less than long, verbose ones.

How Tokenization Works

Many AI models build their vocabulary with a technique called Byte Pair Encoding (BPE). Here's a simplified explanation:

Start with characters: Begin with individual characters as the base vocabulary
Find common pairs: Look for character pairs that appear frequently together
Merge pairs: Combine the most common pairs into single tokens
Repeat: Continue merging until the vocabulary reaches its target size (roughly 100,000 to 250,000 tokens in current models)

The result is a vocabulary where:

Very common words ("the", "is", "and") are single tokens
Common word parts ("-ing", "-tion", "un-") are single tokens
Rare words get split into known pieces
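
The merge loop described above can be sketched in a few lines. This is a toy, character-level illustration under simplifying assumptions, not the training code of any real tokenizer: production tokenizers typically work on bytes, pre-split text before merging, and learn far more merges. The function name `train_bpe` and the tiny corpus are hypothetical.

```python
from collections import Counter

def train_bpe(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merge rules from a toy corpus of words."""
    # Step 1: every word starts as a sequence of single characters.
    corpus = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Step 2: count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # nothing left to merge
        # Ties are broken in favor of the pair that was seen first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Step 3: rewrite the corpus with the best pair merged into one symbol.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_corpus[tuple(merged)] += freq
        corpus = new_corpus  # Step 4: repeat with the updated corpus
    return merges

print(train_bpe(["low", "low", "lower", "lowest"], 3))
# [('l', 'o'), ('lo', 'w'), ('low', 'e')]
```

After three merges the frequent stem "low" has become a single symbol, while the rarer endings are still spelled out piece by piece. The same effect at scale is what makes common words one token and rare words several.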

Different Models, Different Tokenizers

Not all AI models tokenize text the same way:

Model Family	Tokenizer	Vocabulary Size
GPT-4.1, GPT-5, GPT-5.2	o200k_base	~200,000
Claude (Opus 4.6, Sonnet 4.6, Haiku 4.5)	Claude tokenizer	~100,000
Gemini (2.5, 3)	SentencePiece	~256,000

This means the same text may have different token counts across models, though the differences are usually small for English text.

Practical Tips for Managing Tokens

1. Be Concise in Your Instructions

Shorter system prompts = fewer input tokens on every message.

Verbose (45 tokens):

"I would like you to act as a helpful assistant that can answer questions about various topics. Please provide detailed and informative responses to any questions that users might ask you."

Concise (15 tokens):

"You're a helpful assistant. Provide detailed, informative answers."

2. Consider Response Length

If you don't need long responses, tell the AI:

"Answer in 2-3 sentences"
"Be brief"
"Summarize in one paragraph"

3. Watch Conversation Length

Every message in your conversation history counts as input tokens. Longer conversations = more tokens.
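
A rough sketch of how input tokens grow when the full history is resent on every turn. The function name and the token counts (a 200-token system prompt, 50-token user messages, 150-token replies) are made-up illustrative values, and the sketch assumes nothing is trimmed or cached.

```python
def cumulative_input_tokens(turns: int, system_tokens: int,
                            user_tokens: int, reply_tokens: int) -> list[int]:
    """Input tokens sent on each turn when the full history is resent."""
    sent = []
    history = 0  # tokens in all earlier messages and replies
    for _ in range(turns):
        # Each request carries the system prompt, the history, and the new message.
        sent.append(system_tokens + history + user_tokens)
        history += user_tokens + reply_tokens
    return sent

per_turn = cumulative_input_tokens(turns=5, system_tokens=200,
                                   user_tokens=50, reply_tokens=150)
print(per_turn)       # [250, 450, 650, 850, 1050]
print(sum(per_turn))  # 3250
```

Each turn adds the same amount of new text, but the input bill for the whole conversation grows faster than linearly because earlier messages are paid for again on every request.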

4. Code and Technical Content

Code often tokenizes efficiently, with one caveat:

Common keywords (function, return, if) are single tokens
Variable names and strings add up quickly

Token Limits (Context Windows)

Every AI model has a maximum number of tokens it can process at once—this is called the context window.

Model	Context Window	Max Output
GPT-5.2	400,000 tokens	32,000 tokens
GPT-5 mini	400,000 tokens	32,000 tokens
GPT-4.1	1,000,000 tokens	32,000 tokens
Claude Opus 4.6	200,000 tokens (1M beta)	128,000 tokens
Claude Sonnet 4.6	200,000 tokens (1M beta)	64,000 tokens
Claude Haiku 4.5	200,000 tokens	64,000 tokens
Gemini 3 Pro	1,000,000 tokens	64,000 tokens
Gemini 2.5 Pro	1,000,000 tokens	64,000 tokens
Gemini 2.5 Flash	1,000,000 tokens	64,000 tokens

The context window must fit:

System prompt
Conversation history
Your current message
Space for the response

If you exceed the limit, the oldest messages are dropped from the conversation first.
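
One way an application might enforce this budget is sketched below. It is an assumption about a typical approach, not a description of how any particular product trims history; the function name `fit_history` and all numbers are hypothetical.

```python
def fit_history(history: list[int], context_window: int, system_tokens: int,
                message_tokens: int, reserved_output: int) -> list[int]:
    """Drop the oldest messages until everything fits in the context window.

    `history` holds per-message token counts, oldest first.
    """
    # Tokens left for history after the fixed parts are accounted for.
    budget = context_window - system_tokens - message_tokens - reserved_output
    if budget < 0:
        raise ValueError("system prompt, message, and reserved output exceed the window")
    kept = list(history)
    while kept and sum(kept) > budget:
        kept.pop(0)  # remove the oldest message first
    return kept

kept = fit_history([300, 400, 500, 200], context_window=2000,
                   system_tokens=200, message_tokens=100, reserved_output=500)
print(kept)  # [400, 500, 200]
```

Reserving space for the response up front matters: without it, a conversation can fit on the input side and still leave the model too little room to answer.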

Model Pricing Comparison

Here's how the major models compare on price (per 1M tokens):

Model	Input	Output	Best For
GPT-5.2	$1.75	$14.00	Complex coding, agentic tasks
GPT-5 mini	$0.25	$2.00	Fast, affordable general tasks
GPT-4.1	$3.00	$12.00	Fine-tuning, long context
GPT-4.1 mini	$0.80	$3.20	Budget-friendly fine-tuning
GPT-4.1 nano	$0.20	$0.80	Ultra-cheap simple tasks
Claude Opus 4.6	$5.00	$25.00	Most intelligent, agents & coding
Claude Sonnet 4.6	$3.00	$15.00	Best speed/intelligence balance
Claude Haiku 4.5	$1.00	$5.00	Fastest near-frontier model
Gemini 3 Pro	$2.00	$12.00	Multimodal, long context
Gemini 2.5 Pro	$1.25	$10.00	Strong reasoning, 1M context
Gemini 2.5 Flash	$0.30	$2.50	Fast and affordable
Gemini 2.5 Flash Lite	$0.10	$0.40	Cheapest option available

Notice the pattern: in this table, output tokens always cost more than input tokens, roughly 4-8x as much. That's because generating text is computationally harder than reading it.
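
The output-to-input ratio can be checked directly from the table. The snippet below is a small sketch that copies the listed prices into a dictionary (the name `PRICES` is just for illustration) and divides output price by input price for each model.

```python
# (input, output) prices in USD per 1M tokens, copied from the table above.
PRICES = {
    "GPT-5.2": (1.75, 14.00),
    "GPT-5 mini": (0.25, 2.00),
    "GPT-4.1": (3.00, 12.00),
    "GPT-4.1 mini": (0.80, 3.20),
    "GPT-4.1 nano": (0.20, 0.80),
    "Claude Opus 4.6": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Gemini 3 Pro": (2.00, 12.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Gemini 2.5 Flash Lite": (0.10, 0.40),
}

ratios = {model: out / inp for model, (inp, out) in PRICES.items()}
# Print from the smallest ratio to the largest.
for model, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{model}: output costs {ratio:.1f}x input")
```

With these prices the ratios run from 4x to a little over 8x.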

Cost Examples

Let's look at real token costs across different price points:

Example 1: Quick Q&A (GPT-5 mini — cheapest mainstream)

Your question: 50 tokens ($0.0000125)
AI response: 100 tokens ($0.0002)
Total: ~$0.0002 (about 1/50 of a cent)

Example 2: Long Document Analysis (Claude Sonnet 4.6 — balanced)

Document + question: 5,000 tokens ($0.015)
Detailed response: 1,000 tokens ($0.015)
Total: $0.03 (3 cents)

Example 3: Extended Conversation (GPT-5.2 — flagship)

20 back-and-forth turns (one message and one reply each)
Average input per turn: 2,000 tokens (includes history)
Average output: 300 tokens
Total: ~$0.15 (15 cents)

Example 4: The Budget Option (Gemini 2.5 Flash Lite)

Same extended conversation as above
Total: ~$0.006 (less than 1 cent)

The takeaway: model choice has a massive impact on cost. A conversation that costs 15 cents on a flagship model costs less than a penny on a budget one.
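
The arithmetic behind these examples can be reproduced with a short sketch. Prices are copied from the pricing table above; the helper name `cost_usd` is hypothetical, and the extended conversation is assumed to be 20 turns, each with 2,000 input tokens and 300 output tokens.

```python
# (input, output) prices in USD per 1M tokens, from the pricing table above.
PRICES = {
    "GPT-5 mini": (0.25, 2.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-5.2": (1.75, 14.00),
    "Gemini 2.5 Flash Lite": (0.10, 0.40),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request (or a whole conversation) in US dollars."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Example 1: quick Q&A
print(f"{cost_usd('GPT-5 mini', 50, 100):.7f}")            # 0.0002125
# Example 2: long document analysis
print(f"{cost_usd('Claude Sonnet 4.6', 5_000, 1_000):.4f}")  # 0.0300
# Examples 3 and 4: 20 turns at 2,000 input and 300 output tokens each (assumed)
turns = 20
print(f"{cost_usd('GPT-5.2', turns * 2_000, turns * 300):.4f}")                # 0.1540
print(f"{cost_usd('Gemini 2.5 Flash Lite', turns * 2_000, turns * 300):.4f}")  # 0.0064
```

Under these assumptions the same conversation costs about 24 times as much on the flagship model as on the budget one.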
