Tokens
The basic units that language models use to process text, typically representing parts of words, whole words, or punctuation.
What are tokens?
Tokens are the basic units that language models use to process text. Rather than reading character by character or word by word, LLMs break text into tokens—chunks that balance efficiency and meaning.
A token might be:
- A whole word: "hello" → 1 token
- Part of a word: "tokenization" → "token" + "ization" → 2 tokens
- Punctuation: "!" → 1 token
- A space: " " → often combined with the next word
Rule of thumb for English:
- 1 token ≈ 4 characters
- 1 token ≈ 0.75 words
- 100 tokens ≈ 75 words
This varies by language. Chinese, Japanese, and Korean typically use more tokens per character than English.
How does tokenization work?
Tokenization algorithms break text into tokens using learned patterns. The most common approach is Byte Pair Encoding (BPE):
- Start with individual characters as tokens
- Find the most frequent pair of tokens
- Merge that pair into a new token
- Repeat until reaching a target vocabulary size
This creates a vocabulary where:
- Common words are single tokens: "the", "is", "and"
- Less common words are split: "tokenization" → "token" + "ization"
- Rare words might be character-level: "qxyz" → "q" + "x" + "y" + "z"
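The merge loop is simple enough to sketch in a few lines of Python. This is a toy illustration of the algorithm above, not a real tokenizer; production BPE implementations (such as tiktoken's) operate on bytes and are trained on huge corpora:

from collections import Counter

def bpe_merge_step(tokens):
    # Count every adjacent pair, then fuse the most frequent one
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return tokens, None
    best = max(pairs, key=pairs.get)
    merged, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == best:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged, best

tokens = list("low lower lowest")  # step 1: individual characters
for _ in range(3):                 # steps 2-4: merge the top pair, repeat
    tokens, pair = bpe_merge_step(tokens)
    print(pair, tokens)

After a few merges, frequent fragments like "low" fuse into single tokens, which is exactly how common words end up as one token each.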
Different models use different tokenizers:
- GPT-4 uses cl100k_base (~100K token vocabulary)
- Claude uses its own tokenizer
- Llama uses SentencePiece
Important: To count tokens accurately, you must use the same tokenizer the model uses. Text that is 100 tokens under GPT-4's tokenizer might be 110 tokens under another.
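You can observe this divergence directly with tiktoken, which ships several OpenAI encodings (exact counts depend on the text and encoding versions):

import tiktoken

text = "Tokenization varies between models."
for name in ("cl100k_base", "p50k_base"):  # GPT-4's encoding vs. an older one
    enc = tiktoken.get_encoding(name)
    print(name, len(enc.encode(text)))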
Why do tokens matter?
Pricing
Most AI APIs charge per token:
- Input tokens (your prompt)
- Output tokens (the response)
GPT-4 might cost $0.01 per 1K input tokens and $0.03 per 1K output tokens. At those rates, a 10K-token prompt with a 2K-token response costs $0.16.
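That arithmetic is worth wrapping in a helper if you do it often. A minimal sketch, using the illustrative rates above (check your provider's current pricing):

def estimate_cost(input_tokens, output_tokens,
                  input_rate=0.01, output_rate=0.03):
    # Rates are in dollars per 1K tokens
    return (input_tokens / 1000) * input_rate + (output_tokens / 1000) * output_rate

print(f"${estimate_cost(10_000, 2_000):.2f}")  # $0.16, matching the example above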
Context limits
Every model has a maximum context window measured in tokens:
- GPT-3.5: 4K or 16K tokens
- GPT-4: 8K or 32K tokens (128K for GPT-4 Turbo)
- Claude 3: 200K tokens
- Gemini 1.5: 1M+ tokens
Your prompt + desired response must fit within this limit.
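A sketch of a pre-flight check along these lines, counting the prompt with tiktoken and reserving a budget for the response (the window size and reserve are assumptions you would tune per model):

import tiktoken

def fits_context(prompt, context_window, reserve_for_response=1_000):
    # True if the prompt plus a response budget fits within the model's window
    enc = tiktoken.encoding_for_model("gpt-4")
    return len(enc.encode(prompt)) + reserve_for_response <= context_window

print(fits_context("Summarize this report: ...", context_window=8_192))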
Processing speed
More tokens mean longer processing time. A 100K-token input takes longer to process than a 1K-token input, even on models that support contexts that large.
Quality implications
Very long contexts can degrade response quality. Models may "lose" information in the middle of long documents (the "lost in the middle" problem).
How to count tokens
Online tools
- OpenAI Tokenizer: platform.openai.com/tokenizer
- Anthropic Console shows token counts
- Various third-party tools
Programmatically
For OpenAI models:
import tiktoken

# Look up the encoding used by the model you're targeting
encoder = tiktoken.encoding_for_model("gpt-4")
tokens = encoder.encode("Hello, world!")
print(len(tokens))  # Output: 4
Quick estimates
- Characters ÷ 4 ≈ tokens
- Words × 1.3 ≈ tokens
- 1 page of text ≈ 500 tokens
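Those heuristics as a throwaway helper (rough English-only estimates; use a real tokenizer when precision matters):

def rough_token_estimate(text):
    # Two rules of thumb for English text; they usually land in the same ballpark
    return {"by_chars": len(text) / 4, "by_words": len(text.split()) * 1.3}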
Accurate counting matters when:
- Operating near context limits
- Optimizing for cost
- Building production applications
- Working with structured prompts
Managing token usage
Reduce input tokens:
- Summarize long documents before including them
- Include only relevant context, not everything
- Use concise prompts—every word counts
- Remove redundant instructions
Reduce output tokens:
- Ask for concise responses: "Answer in 2-3 sentences"
- Request specific formats: "Respond with only the answer"
- Use structured output to eliminate fluff
- Set the max_tokens parameter to limit response length (see the sketch below)
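For example, with the OpenAI Python client (assuming the openai package is installed and OPENAI_API_KEY is set in your environment):

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize tokenization in one sentence."}],
    max_tokens=60,  # hard cap on output tokens; the reply is truncated if it runs longer
)
print(response.choices[0].message.content)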
Optimize for cost:
- Use cheaper models for simple tasks
- Cache common responses
- Batch similar requests
- Monitor usage and set alerts
Handle context limits:
- Implement chunking for long documents
- Use RAG to retrieve only relevant sections
- Summarize conversation history for long chats
- Consider models with larger context windows
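A minimal chunking sketch along those lines, splitting on characters with overlap so text cut at a boundary appears in both chunks (the sizes are illustrative; divide characters by ~4 for a rough token budget):

def chunk_text(text, chunk_size=8_000, overlap=400):
    # Character-based chunks with overlap; ~8,000 chars is roughly 2,000 tokens
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks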
Token tips and tricks
Whitespace matters
" hello" (with a leading space) and "hello" are different tokens. Inconsistent spacing can affect outputs.
Numbers are expensive
"123456789" might be 3+ tokens, while writing the same number out in words takes many more. Choose representations wisely.
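Both of the previous two tips are easy to verify with tiktoken (token IDs and counts vary by encoding):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
print(enc.encode("hello"), enc.encode(" hello"))  # different token IDs
print(len(enc.encode("123456789")))               # a handful of tokens
print(len(enc.encode("one hundred twenty-three million")))  # noticeably more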
Code is token-hungry
Code with long variable names, comments, and whitespace uses many tokens. Minified code uses fewer but is harder to process.
JSON structure
{"name":"John","age":30}
Uses fewer tokens than:
{
  "name": "John",
  "age": 30
}
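You can measure the difference directly (exact counts depend on the encoding):

import json
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
data = {"name": "John", "age": 30}
compact = json.dumps(data, separators=(",", ":"))
pretty = json.dumps(data, indent=2)
print(len(enc.encode(compact)), len(enc.encode(pretty)))  # compact is smaller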
Prompt caching
Some APIs (like Anthropic's) cache common prompt prefixes, reducing cost for repeated prompts with the same system instructions.
Special tokens
Models use special tokens for structure: <|system|>, <|user|>, etc. These count toward limits and aren't always visible to you.
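tiktoken treats these markers specially: by default, encode() raises an error if the input text contains one, and you must opt in to have it encoded as a special token:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
# Opting in maps the marker to a single reserved token ID
print(enc.encode("<|endoftext|>", allowed_special={"<|endoftext|>"}))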
Related Terms
Large Language Model (LLM)
A neural network trained on massive text datasets that can understand and generate human-like language.
Context Window
The maximum amount of text (measured in tokens) that a language model can process in a single interaction.
Embeddings
Numerical representations of text, images, or other data that capture semantic meaning in a format AI models can process.