Token Optimization

**Token optimization** refers to strategies and techniques for reducing the number of tokens consumed when interacting with large language models (LLMs), directly impacting both cost and performance.

Why It Matters

LLM APIs charge per token for both input and output. A single Claude Opus 4 request with a 100k-token context can cost a few dollars, and careful token usage can often reduce costs by 10-100x.
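As a rough illustration of why this adds up, the cost of a request scales linearly with token counts. The prices below are assumptions for a large frontier model; always check the provider's current rate card.

```python
# Back-of-the-envelope cost estimate for a single LLM request.
# Prices are illustrative assumptions (USD per million tokens).
INPUT_PRICE_PER_M = 15.00   # input-token rate (assumed)
OUTPUT_PRICE_PER_M = 75.00  # output tokens are typically priced higher (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

print(f"${request_cost(100_000, 2_000):.2f}")  # 100k context, 2k response -> $1.65
print(f"${request_cost(5_000, 2_000):.2f}")    # trimmed to 5k context -> ~$0.22
```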

Common Strategies

1. Targeted Retrieval (vs. Context Stuffing)

Instead of loading entire documents into context, use semantic search to retrieve only relevant snippets.

  • Tools: QMD, RAG pipelines, vector databases
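A minimal sketch of the idea: rank document chunks by similarity to the query and put only the top-k into the prompt. The `embed` function below is a toy hashing-trick bag-of-words stand-in for a real embedding model or API.

```python
# Targeted retrieval sketch: only the most relevant chunks enter the context.
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    # Toy embedding (hashed bag-of-words); swap in a real embedding model/API.
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec

def top_k_chunks(query: str, chunks: list[str], k: int = 3) -> list[str]:
    q = embed(query)
    def score(chunk: str) -> float:
        c = embed(chunk)
        return float(np.dot(q, c) / (np.linalg.norm(q) * np.linalg.norm(c) + 1e-9))
    return sorted(chunks, key=score, reverse=True)[:k]

docs = [
    "Refunds are processed within five business days.",
    "The API rate limit is 60 requests per minute.",
    "Support is available Monday through Friday.",
]
print(top_k_chunks("How long do refunds take?", docs, k=1))
```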

2. Prompt Compression

Remove unnecessary words, whitespace, and formatting from prompts without losing meaning.
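A trivial version of this is collapsing redundant whitespace and blank lines before sending the prompt; more aggressive schemes also drop filler words or summarize context. The snippet below is only a sketch of the cheapest case.

```python
import re

def compress_prompt(prompt: str) -> str:
    """Collapse runs of spaces/tabs and drop blank lines without changing meaning."""
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in prompt.splitlines()]
    return "\n".join(line for line in lines if line)

raw = """
    Please   summarize the following    document.


    Keep it    short.
"""
print(compress_prompt(raw))
# Please summarize the following document.
# Keep it short.
```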

3. Caching

Cache LLM responses for repeated queries. Many providers also offer prompt caching, which bills repeated context at a reduced rate.
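A minimal local response cache, keyed on a hash of the model and prompt, might look like the sketch below; `call_llm` is a placeholder for whatever client function actually hits the API. (Provider-side prompt caching is configured separately, per API.)

```python
import hashlib
import json

_cache: dict[str, str] = {}

def cached_completion(model: str, prompt: str, call_llm) -> str:
    """Return a cached response for identical (model, prompt) pairs."""
    key = hashlib.sha256(json.dumps([model, prompt]).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm(model, prompt)  # only pay for the first call
    return _cache[key]
```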

4. Model Selection

Use smaller, cheaper models for routine tasks and reserve expensive models for complex reasoning (a simple routing sketch follows the list below).

  • Routine: Sonnet, Kimi K2, local models
  • Complex: Opus, o1-preview
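One common pattern is a small router that defaults to a cheap model and escalates only when a heuristic flags the task as complex. The keyword/length heuristic and the model names below are placeholders, not recommendations.

```python
# Illustrative model router: cheap by default, expensive only when needed.
CHEAP_MODEL = "small-model"       # placeholder name
EXPENSIVE_MODEL = "frontier-model"  # placeholder name

COMPLEX_HINTS = ("prove", "derive", "multi-step", "plan", "debug")

def pick_model(task: str) -> str:
    looks_complex = len(task) > 2000 or any(h in task.lower() for h in COMPLEX_HINTS)
    return EXPENSIVE_MODEL if looks_complex else CHEAP_MODEL

print(pick_model("Summarize this paragraph."))         # small-model
print(pick_model("Derive the closed-form solution."))  # frontier-model
```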

5. Output Length Control

Explicitly request concise responses when brevity suffices.
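In practice this means both asking for brevity in the prompt and capping output tokens in the request. The payload shape below mirrors common chat-completion APIs but is generic and illustrative; field names vary by provider.

```python
def concise_request(question: str, max_tokens: int = 150) -> dict:
    """Build a request that asks for brevity and caps billed output tokens."""
    return {
        "messages": [
            {"role": "system", "content": "Answer in at most three sentences."},
            {"role": "user", "content": question},
        ],
        "max_tokens": max_tokens,  # hard ceiling on output length
    }

payload = concise_request("What is prompt caching?")
```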
