Understanding tokens is key to getting the most out of AI models and your budget. This guide breaks down what tokens are, how they work on Chipp, and practical tips for choosing the right model.
What Are AI Tokens?
Tokens are how AI models read and process text. Think of them as word chunks.
Rule of thumb: 1,000 tokens ≈ 750 words
So a 128,000-token limit means roughly 96,000 words of conversation history.
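The rule of thumb above can be sketched as a pair of helper functions. This is a rough estimate only (real token counts depend on the tokenizer and the language of the text), and the function names are just illustrative:

```python
def words_to_tokens(words: int) -> int:
    """Rough estimate using the rule of thumb: 1,000 tokens ~= 750 words."""
    return round(words * 1000 / 750)

def tokens_to_words(tokens: int) -> int:
    """Inverse estimate: how many words fit in a given token budget."""
    return round(tokens * 750 / 1000)

print(tokens_to_words(128_000))  # 96000 -- a 128k context holds ~96,000 words
print(words_to_tokens(750))      # 1000
```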
What counts as tokens:
- Your instructions (system prompt)
- Knowledge sources you've uploaded
- Conversation history (all previous messages)
- The AI's responses
- Any images or files you upload
Everything adds up.
Why Tokens Matter
Tokens determine three things:
- How much context the model remembers - Can it see your entire conversation or just the last few messages?
- How much you can upload - Can you paste a 100-page document or just a few paragraphs?
- How much it costs - More tokens = higher cost
Context Windows Explained
The context window is the maximum tokens a model can process at once.
Think of it like the model's working memory. Everything you give it—your instructions, uploaded files, conversation history—has to fit in this window.
Example:
- Model has 128,000 token context window (≈96,000 words)
- Your instructions: 500 tokens
- Uploaded PDF: 50,000 tokens
- Conversation so far: 20,000 tokens
- Remaining space: 57,500 tokens for the AI's response and future messages
When you hit the limit, older messages get dropped to make room.
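The budget arithmetic in the example above can be written out directly. The numbers mirror the example; everything that goes into the window is subtracted from the context limit, and what is left is available for responses and future messages:

```python
CONTEXT_WINDOW = 128_000  # tokens (~96,000 words)

# Everything that must fit in the window, from the example above:
used = {
    "instructions": 500,
    "uploaded_pdf": 50_000,
    "conversation": 20_000,
}

remaining = CONTEXT_WINDOW - sum(used.values())
print(remaining)  # 57500 tokens left for the AI's response and future messages
```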
Why it matters:
- Larger context = model remembers more
- Larger context = you can upload bigger documents
- Larger context = usually more expensive
How Tokens Work on Chipp
All About Usage
Chipp pays for all AI costs on your account.
That means every time you ask a question, we pay whichever model provider you choose (OpenAI, Google, etc.).
To make things easy for you, initial usage is included with your plan.
If you exceed that usage in a given month, you are charged for the overage: Chipp bills the direct cost of the model you choose, plus a 30% service fee.
Quick Reference: $10/mo provides roughly 10 million words on the top models, more on the smaller models.
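The overage math described above can be sketched as follows. This is an illustrative function (the name and exact rounding are assumptions, not Chipp's billing code); it assumes the 30% fee applies only to usage beyond the included amount:

```python
def overage_charge(model_cost_usd: float, included_usd: float = 10.00,
                   service_fee: float = 0.30) -> float:
    """Overage = direct model cost beyond the included amount, plus a 30% fee."""
    over = max(0.0, model_cost_usd - included_usd)
    return round(over * (1 + service_fee), 2)

print(overage_charge(14.00))  # $4 over x 1.30 = 5.2
print(overage_charge(8.00))   # within included usage: 0.0
```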
What This Means
- Pro plan: $29/month with $10 usage included
- Team plan: $99/month with $30 usage included
- Business plan: $299/month with $100 usage included
After you use your included amount, you only pay for what you use. Fast models cost less. Powerful models cost more. You're in control.
Plus, with our built-in top-up and usage notifications, you will always know what's coming and only be charged for what you want.
Token Costs and Pricing
Models charge per million tokens. There are two costs:
- Input tokens (what you send)
- Output tokens (what the model generates)
Output tokens are typically more expensive.
Example costs per 1M tokens (prices change frequently, and usually downward):
| Model | Input Cost | Output Cost |
|---|---|---|
| Gemini 2.5 Flash Lite | $0.05 | $0.20 |
| GPT-4o Mini | $0.20 | $0.78 |
| Claude 3 Haiku | $1.04 | $5.20 |
| Claude Sonnet 4.5 | $3.90 | $19.50 |
| GPT-5 | $6.50 | $26.00 |
Remember: $10 of usage = roughly 10 million words on top models.
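Putting the table above into code makes the input/output split concrete. The prices are the example figures from the table (they change over time), and the function name is illustrative:

```python
PRICES = {  # USD per 1M tokens: (input, output), from the table above
    "gemini-2.5-flash-lite": (0.05, 0.20),
    "gpt-4o-mini": (0.20, 0.78),
    "claude-sonnet-4.5": (3.90, 19.50),
    "gpt-5": (6.50, 26.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single request: input and output are billed at different rates."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Same request size, very different bills:
print(f"${request_cost('gpt-4o-mini', 10_000, 1_000):.5f}")  # $0.00278
print(f"${request_cost('gpt-5', 10_000, 1_000):.5f}")        # $0.09100
```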
Model-Specific Guides
Here's a breakdown of every model available on Chipp, organized by capability.
Ultra-Fast Models (Lite Tier)
Gemini 2.5 Flash Lite
- Context: 1M tokens (≈750k words)
- Best for: Low-latency applications requiring instant responses
- Speed: Ultra-fast
- Cost: Ultra-low
- Input: $0.05/1M tokens
- Output: $0.20/1M tokens
- Ultra-fast Gemini model optimized for low-latency applications.
Gemini 2.0 Flash Lite
- Context: 1M tokens (≈750k words)
- Best for: Fast, efficient tasks with large context
- Speed: Ultra-fast
- Cost: Ultra-low
- Input: $0.05/1M tokens
- Output: $0.20/1M tokens
- Lightweight Gemini model for fast, efficient tasks.
Fast & Efficient Models
Claude 3 Haiku
- Context: 200k tokens (≈150k words)
- Best for: Near-instant responses, high-volume interactive workloads
- Speed: Very fast
- Cost: Low
- Input: $1.04/1M tokens
- Output: $5.20/1M tokens
- Known for its speed and affordability, Claude 3 Haiku is designed for near-instant responsiveness, ideal for interactive workloads.
GPT-4o Mini
- Context: 128k tokens (≈96k words)
- Best for: High-volume tasks, chat, quick responses
- Speed: Very fast
- Cost: Very low
- Input: $0.20/1M tokens
- Output: $0.78/1M tokens
- A smaller, more energy-efficient version of GPT-4o that offers high-quality responses with reduced resource usage.
GPT-5 Nano
- Context: 400k tokens (≈300k words)
- Best for: Maximum speed with minimal cost
- Speed: Ultra-fast
- Cost: Very low
- Input: $0.65/1M tokens
- Output: $2.60/1M tokens
- Ultra-lightweight GPT-5 model optimized for maximum speed and minimal cost.
GPT-4.1 Nano
- Context: 1M tokens (≈750k words)
- Best for: Fast, cost-efficient tasks with long context
- Speed: Ultra-fast
- Cost: Very low
- Input: $0.65/1M tokens
- Output: $2.60/1M tokens
- Ultra-lightweight GPT-4.1 model for fast, cost-efficient tasks.
Balanced Performance Models
Gemini 2.0 Flash
- Context: 1M tokens (≈750k words)
- Best for: Next-gen speed with enhanced capabilities
- Speed: Very fast
- Cost: Very low
- Input: $0.10/1M tokens
- Output: $0.39/1M tokens
- Next-generation Gemini Flash model with enhanced capabilities.
GPT-5 Mini
- Context: 400k tokens (≈300k words)
- Best for: Lighter reasoning tasks with reduced latency
- Speed: Very fast
- Cost: Low
- Input: $1.30/1M tokens
- Output: $5.20/1M tokens
- Compact version of GPT-5 for lighter reasoning tasks with reduced latency and cost.
GPT-4.1 Mini
- Context: 1M tokens (≈750k words)
- Best for: Strong performance at lower cost with long context
- Speed: Very fast
- Cost: Low
- Input: $1.30/1M tokens
- Output: $5.20/1M tokens
- Compact version of GPT-4.1 offering strong performance at lower cost.
Claude 3.5 Haiku Latest
- Context: 200k tokens (≈150k words)
- Best for: Always up-to-date fast responses
- Speed: Very fast
- Cost: Low
- Input: $1.04/1M tokens
- Output: $5.20/1M tokens
- Always up-to-date version of Claude 3.5 Haiku optimized for speed.
Advanced Reasoning Models
OpenAI o1
- Context: 200k tokens (≈150k words)
- Best for: Complex problem-solving and deep analysis
- Speed: Moderate (deep thinking)
- Cost: Premium
- Input: $19.50/1M tokens
- Output: $78.00/1M tokens
- Advanced reasoning model designed for complex problem-solving and deep analysis.
OpenAI o4 Mini
- Context: 128k tokens (≈96k words)
- Best for: Balanced reasoning performance with efficiency
- Speed: Moderate
- Cost: Moderate
- Input: $3.90/1M tokens
- Output: $15.60/1M tokens
- Compact reasoning model balancing performance with efficiency.
OpenAI o3-mini
- Context: 200k tokens (≈150k words)
- Best for: Complex reasoning workloads efficiently
- Speed: Moderate (deep thinking)
- Cost: Low
- Input: $1.43/1M tokens
- Output: $5.72/1M tokens
- Designed to handle complex reasoning workloads efficiently, offering faster performance and responsiveness.
Gemini 2.5 Flash
- Context: 1M tokens (≈750k words)
- Best for: Fast reasoning with massive context
- Speed: Very fast
- Cost: Very low
- Input: $0.10/1M tokens
- Output: $0.39/1M tokens
- Fast and efficient Gemini model with 1M+ context, optimized for speed.
High Performance Models
Claude Sonnet 4.5
- Context: 200k tokens (≈150k words)
- Best for: Enhanced reasoning and coding capabilities
- Speed: Fast
- Cost: Moderate
- Input: $3.90/1M tokens
- Output: $19.50/1M tokens
- Enhanced Claude Sonnet with improved reasoning and coding capabilities.
Claude Sonnet 4
- Context: 200k tokens (≈150k words)
- Best for: Balanced performance at lower cost than Opus
- Speed: Fast
- Cost: Moderate
- Input: $3.90/1M tokens
- Output: $19.50/1M tokens
- Balanced Claude model offering strong performance at lower cost than Opus.
Claude 3.7 Sonnet Latest
- Context: 200k tokens (≈150k words)
- Best for: Always up-to-date Claude performance
- Speed: Fast
- Cost: Moderate
- Input: $3.90/1M tokens
- Output: $19.50/1M tokens
- Always up-to-date version of Claude 3.7 Sonnet with the latest improvements.
OpenAI o3
- Context: 200k tokens (≈150k words)
- Best for: Next-gen complex problem-solving
- Speed: Moderate (deep thinking)
- Cost: High
- Input: $13.00/1M tokens
- Output: $52.00/1M tokens
- Next-generation reasoning model with enhanced capabilities for complex problem-solving.
Flagship Models (Best Overall)
Claude Opus 4
- Context: 200k tokens (≈150k words)
- Best for: Top-tier performance across all tasks
- Speed: Moderate
- Cost: Premium
- Input: $19.50/1M tokens
- Output: $97.50/1M tokens
- Anthropic's flagship model with top-tier performance across all tasks.
GPT-5
- Context: 400k tokens (≈300k words)
- Best for: Advanced reasoning, code quality, and accuracy
- Speed: Moderate
- Cost: High
- Input: $6.50/1M tokens
- Output: $26.00/1M tokens
- OpenAI's most advanced model with major improvements in reasoning, code quality, and accuracy.
GPT-5 Chat Latest
- Context: 400k tokens (≈300k words)
- Best for: Always up-to-date GPT-5 optimized for chat
- Speed: Moderate
- Cost: High
- Input: $6.50/1M tokens
- Output: $26.00/1M tokens
- Always up-to-date version of GPT-5 optimized for chat with the latest improvements.
Gemini 2.5 Pro
- Context: 1M tokens (≈750k words)
- Best for: Massive context window with multimodal capabilities
- Speed: Moderate
- Cost: Low (for flagship performance)
- Input: $1.63/1M tokens
- Output: $6.50/1M tokens
- Google's flagship model with massive 1M+ context window and multimodal capabilities.
Multimodal Features: Voice & Video
Chipp supports more than just text. You can add voice conversations and video generation to your agents. For voice and video, Chipp chooses the model for you, so there are no model choices to make. Just make sure you understand the pricing.
Voice (OpenAI Realtime API)
Every Chipp agent can speak using OpenAI's Realtime API. Real-time voice lets users have natural conversations with your agent over the phone or through voice interfaces.
Pricing:
| Feature | Cost |
|---|---|
| Audio Input | $0.06 per minute |
| Audio Output | $0.24 per minute |
| Text Input (transcription) | $5.00 per 1M tokens |
| Text Output (responses) | $20.00 per 1M tokens |
What this means:
- A 5-minute phone call with your agent costs roughly $0.30-0.50
- Costs include both the audio streaming and text processing
- Silence counts if you're streaming continuously—use voice activity detection to optimize
Practical example:
- Customer support hotline: 100 calls/month × 5 minutes average = 500 minutes of audio input, with roughly 1 minute of agent speech per call
- Cost: (500 min × $0.06 input) + (100 min × $0.24 output) = ~$54/month for voice
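The hotline estimate above can be checked with a small calculator. This covers only the audio streaming rates from the table (text token processing is billed separately), and the function name is illustrative:

```python
def voice_cost(input_minutes: float, output_minutes: float,
               in_rate: float = 0.06, out_rate: float = 0.24) -> float:
    """Audio streaming cost in USD: listening (input) plus agent speech (output)."""
    return input_minutes * in_rate + output_minutes * out_rate

# 100 calls x 5 min of listening, ~1 min of agent speech per call:
print(round(voice_cost(500, 100), 2))  # 54.0
```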
Video Generation (Veo 3.1)
Chipp integrates Veo 3.1, Google's video generation model. Create clips of up to 8 seconds each from text prompts—ideal for social media, marketing, and prototypes.
Pricing:
| Video Type | Cost per Second |
|---|---|
| Standard Quality | $0.40/second |
| Fast Generation | $0.15/second |
Both include audio by default. You're only charged if the video generates successfully.
What this means:
- 8-second Instagram reel (standard): $3.20
- 8-second quick prototype (fast): $1.20
- 1 minute of footage, standard (multiple clips): $24.00
- 1 minute of footage, fast (multiple clips): $9.00
Practical example:
- Social media agency creating 50 short videos/month (8 seconds each, fast mode)
- Cost: 50 videos × 8 seconds × $0.15 = $60/month
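The per-second pricing above is simple enough to compute directly. The rates come from the table; the function name and `clips` parameter are illustrative:

```python
RATES = {"standard": 0.40, "fast": 0.15}  # USD per second of generated video

def video_cost(seconds: int, mode: str = "standard", clips: int = 1) -> float:
    """Cost of generating `clips` videos of `seconds` each in the given mode."""
    return clips * seconds * RATES[mode]

print(round(video_cost(8, "standard"), 2))        # one 8-second reel: 3.2
print(round(video_cost(8, "fast", clips=50), 2))  # 50 quick prototypes: 60.0
```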
When to Use Multimodal
Use voice when:
- Your users prefer talking over typing (kids, accessibility, hands-free scenarios)
- You need real-time phone support (24/7 customer service, appointment booking)
- The use case benefits from tone and emotion (therapy, coaching, education)
Use video when:
- You need visual content at scale (social media, marketing campaigns)
- Prototyping concepts for clients (real estate walkthroughs, product demos)
- Creating educational content (exercise demonstrations, how-to guides)
Cost management tips:
- Voice: Use push-to-talk or voice activity detection to avoid charging for silence
- Video: Use fast mode for prototypes, standard for final deliverables
- Combine strategically: Not every interaction needs voice or video—use text for simple tasks
Rules of Thumb
Choosing by context needs:
- Short tasks (chat, Q&A): Any model works (use fast/cheap ones like GPT-4o Mini or Gemini Flash Lite)
- Medium docs (reports, articles): 200k models (Claude Haiku, Claude Sonnet)
- Long docs (books, research): 400k models (GPT-5, GPT-5 Nano/Mini)
- Very long context: 1M models (Gemini 2.5 Pro, GPT-4.1 Nano/Mini, Gemini Flash models)
Choosing by speed:
- Ultra-fast: Gemini Flash Lite models, GPT-4o Mini, GPT-4.1/5 Nano
- Very fast: Claude Haiku, Gemini Flash, GPT-5 Mini
- Balanced: Claude Sonnet models, GPT-5
- Deep thinking: o3-mini, o3, o1, o4 Mini (slower but better reasoning)
Choosing by cost:
- Ultra-budget: Gemini Flash Lite ($0.05/$0.20), GPT-4o Mini ($0.20/$0.78)
- Budget: Gemini 2.0 Flash ($0.10/$0.39), GPT-4.1/5 Nano ($0.65/$2.60)
- Balanced: Claude Haiku ($1.04/$5.20), GPT-5 Mini ($1.30/$5.20), o3-mini ($1.43/$5.72)
- Performance: Claude Sonnet models ($3.90/$19.50), GPT-5 ($6.50/$26.00)
- Premium: o1 ($19.50/$78.00), Claude Opus 4 ($19.50/$97.50)
General rules:
- 1,000 tokens ≈ 750 words
- $10 usage ≈ 10 million words (top models)
- Larger context = more expensive per request
- Output tokens cost 2-5x more than input tokens
Optimizing Token Usage
Reduce token consumption:
- Clear out old messages - Long conversations eat tokens. Start fresh when switching topics.
- Trim your instructions - Every word in your system prompt counts. Be concise.
- Use smaller knowledge sources - Upload only what you need. A 500-page manual uses tokens every request.
- Choose the right model - Don't use GPT-5 for simple tasks. Use GPT-4.1 Mini or Claude Haiku.
- Limit conversation history - Chipp automatically manages this, but know that every old message uses tokens.
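One way to limit conversation history is to keep only the most recent messages that fit in a token budget. This is a minimal sketch, assuming messages are stored oldest-first with pre-estimated token counts (Chipp manages this for you automatically; the structure here is hypothetical):

```python
def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the newest messages that fit within `budget` tokens.
    Each message is {"role": ..., "text": ..., "tokens": int}."""
    kept, used = [], 0
    for msg in reversed(messages):      # walk from newest to oldest
        if used + msg["tokens"] > budget:
            break                       # oldest messages get dropped
        kept.append(msg)
        used += msg["tokens"]
    return list(reversed(kept))         # restore chronological order

history = [
    {"role": "user", "text": "old question", "tokens": 4_000},
    {"role": "assistant", "text": "old answer", "tokens": 6_000},
    {"role": "user", "text": "new question", "tokens": 2_000},
]
trimmed = trim_history(history, budget=8_000)
print([m["text"] for m in trimmed])  # ['old answer', 'new question']
```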
Best practices:
- Start with fast, cheap models (GPT-4.1 Mini, Claude Haiku)
- Upgrade to powerful models only when you need deep reasoning
- Use long-context models (Gemini Pro) only when you're actually uploading large documents
- Monitor your usage in Chipp settings
Practical Examples
Example 1: Customer Support Chatbot
Use case: Answer common questions about your product
Best model: GPT-4o Mini or Gemini 2.5 Flash Lite
- Why: Ultra-fast, incredibly cheap, great for simple Q&A
- Context needed: 128k-1M (enough for conversation + knowledge base)
- Cost: $0.05-0.20 per 1M input tokens
Monthly cost estimate:
- 1,000 conversations/month
- Average 500 tokens per conversation
- = 500,000 tokens/month = $0.03-0.10/month
Example 2: Long Document Analysis
Use case: Upload a 200-page research paper and ask questions
Best model: Gemini 2.5 Pro or GPT-4.1 Mini
- Why: Long context window, strong reasoning, affordable
- Context needed: 1M tokens
- Cost: $1.30-1.63 per 1M input tokens
Per-document cost estimate:
- 200-page PDF ≈ 150,000 tokens
- Analysis response ≈ 5,000 tokens
- = ~$0.20-0.25 per document (input) + ~$0.03 per response (output)
Example 3: Code Generation Assistant
Use case: Generate and review code
Best model: Claude Sonnet 4.5 or GPT-5
- Why: Strong coding capabilities with balanced performance
- Context needed: 200k-400k tokens
- Cost: $3.90-6.50 per 1M input tokens
Monthly cost estimate:
- 100 code generations/month
- Average 2,000 tokens per request
- = 200,000 tokens/month = $0.78-1.30/month (input)
Example 4: Educational Tutor
Use case: Help students learn with step-by-step explanations
Best model: o3-mini or o4 Mini
- Why: Reasoning models that show their work, efficient pricing
- Context needed: 128k-200k tokens
- Cost: $1.43-3.90 per 1M input tokens
Per-session cost:
- 50-question tutoring session
- Average 300 tokens per exchange
- = 15,000 tokens = $0.02-0.06 per session
When to Care About Tokens (and When Not To)
Don't worry about tokens if:
- You're on a paid Chipp plan with usage included
- You're using ultra-cheap models (Gemini Flash Lite, GPT-4o Mini, Gemini 2.0 Flash)
- You have low volume (< 1,000 conversations/month)
- Your conversations are short (< 500 tokens each)
You're probably fine. Just build.
Do pay attention to tokens if:
- You're uploading massive documents (100+ pages) every request
- You're using expensive models (Claude Opus 4, o1, o3) at high volume
- You have long conversation histories (50+ back-and-forth messages)
- You're hitting your usage limits and getting charged overages
- You're doing a lot of voice calls or video generation
Optimize by switching to cheaper models or trimming context.
Quick Decision Tree
Not sure which model to pick? Follow this:
- Do you need to upload very long documents (100+ pages)?
- Yes → Gemini 2.5 Pro (1M context) or GPT-4.1 Mini (1M context)
- No → Keep going
- Do you need deep reasoning or step-by-step problem solving?
- Yes → o3-mini (best value), o4 Mini (faster), or o1/o3 (premium)
- No → Keep going
- Do you need the absolute best quality?
- Yes → GPT-5, Claude Opus 4, or Gemini 2.5 Pro
- No → Keep going
- Do you need it ultra-fast and ultra-cheap?
- Yes → Gemini 2.5 Flash Lite, Gemini 2.0 Flash Lite, or GPT-4.1 Nano
- No → Keep going
- Do you need balanced performance?
- Yes → Claude Sonnet 4.5, Claude Haiku, or GPT-5 Mini
- No → Start with Gemini 2.0 Flash (great all-around)
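The decision tree above can be sketched as a simple function. The flags and function name are illustrative, and each branch returns one representative model from the corresponding step:

```python
def pick_model(long_docs: bool = False, deep_reasoning: bool = False,
               best_quality: bool = False, ultra_cheap: bool = False,
               balanced: bool = False) -> str:
    """Walk the decision tree in order; return a suggested starting model."""
    if long_docs:        # very long documents (100+ pages)
        return "Gemini 2.5 Pro"
    if deep_reasoning:   # step-by-step problem solving
        return "o3-mini"
    if best_quality:     # absolute best output
        return "GPT-5"
    if ultra_cheap:      # maximum speed at minimum cost
        return "Gemini 2.5 Flash Lite"
    if balanced:         # balanced performance
        return "Claude Sonnet 4.5"
    return "Gemini 2.0 Flash"  # great all-around default

print(pick_model(deep_reasoning=True))  # o3-mini
print(pick_model())                     # Gemini 2.0 Flash
```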
Still unsure? Start with Gemini 2.0 Flash or GPT-4.1 Mini. They're ultra-fast, ultra-cheap, and handle 95% of tasks perfectly. Upgrade only if you need more power.
