Understanding tokens is key to getting the most out of AI models and your budget. This guide breaks down what tokens are, how they work on Chipp, and practical tips for choosing the right model.


What Are AI Tokens?

Tokens are how AI models read and process text. Think of them as word chunks.

Rule of thumb: 1,000 tokens ≈ 750 words

So a 128,000 token limit means roughly 96,000 words of conversation history.

What counts as tokens:

  • Your instructions (system prompt)
  • Knowledge sources you've uploaded
  • Conversation history (all previous messages)
  • The AI's responses
  • Any images or files you upload

Everything adds up.
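The rule of thumb above is easy to encode. A minimal sketch (the 1,000 tokens ≈ 750 words ratio is an approximation; real tokenizers vary by model and language):

```python
def estimate_tokens(word_count: int) -> int:
    """Rough token estimate using the rule of thumb: 1,000 tokens ~= 750 words."""
    return round(word_count * 1000 / 750)

def estimate_words(token_count: int) -> int:
    """Inverse: roughly how many words fit in a given token budget."""
    return round(token_count * 750 / 1000)

print(estimate_tokens(750))     # 1000
print(estimate_words(128_000))  # 96000
```

For exact counts you'd use the model provider's own tokenizer, but this approximation is close enough for budgeting.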


Why Tokens Matter

Tokens determine three things:

  1. How much context the model remembers - Can it see your entire conversation or just the last few messages?
  2. How much you can upload - Can you paste a 100-page document or just a few paragraphs?
  3. How much it costs - More tokens = higher cost

Context Windows Explained

The context window is the maximum tokens a model can process at once.

Think of it like the model's working memory. Everything you give it—your instructions, uploaded files, conversation history—has to fit in this window.

Example:

  • Model has 128,000 token context window (≈96,000 words)
  • Your instructions: 500 tokens
  • Uploaded PDF: 50,000 tokens
  • Conversation so far: 20,000 tokens
  • Remaining space: 57,500 tokens for the AI's response and future messages

When you hit the limit, older messages get dropped to make room.
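The budget arithmetic in the example above can be spelled out directly:

```python
CONTEXT_WINDOW = 128_000  # tokens (~96,000 words)

# Token counts from the example above
usage = {
    "instructions": 500,
    "uploaded_pdf": 50_000,
    "conversation": 20_000,
}

remaining = CONTEXT_WINDOW - sum(usage.values())
print(remaining)  # 57500 tokens left for responses and future messages
```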

Why it matters:

  • Larger context = model remembers more
  • Larger context = you can upload bigger documents
  • Larger context = usually more expensive

How Tokens Work on Chipp

All About Usage

Chipp pays for all AI costs on your account.

That means every time you ask a question, we pay whichever model provider you choose (OpenAI, Google, etc.).

To make things easy for you, initial usage is included with your plan.

If you exceed that usage in a given month, you're charged for the overage: the provider's direct cost for the model you choose, plus a 30% service fee.

Quick Reference: $10/mo provides roughly 10 million words on the top models, more on the smaller models.
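One plausible reading of the overage billing described above, as a sketch (exact metering details are Chipp's; this only illustrates a 30% service fee applied to usage beyond the included amount, and the function name is hypothetical):

```python
def monthly_bill(plan_price: float, included_usage: float, direct_cost: float) -> float:
    """Illustrative overage math: direct provider cost beyond the included
    usage is billed with a 30% service fee. Hypothetical helper, not Chipp's API."""
    overage = max(0.0, direct_cost - included_usage)
    return plan_price + overage * 1.30

# Pro plan: $29/mo with $10 usage included; $15 of direct model cost this month
print(monthly_bill(29.0, 10.0, 15.0))  # 35.5  ($29 + $5 overage * 1.3)
```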

What This Means

  • Pro plan: $29/month with $10 usage included
  • Team plan: $99/month with $30 usage included
  • Business plan: $299/month with $100 usage included

After you use your included amount, you only pay for what you use. Fast models cost less. Powerful models cost more. You're in control.

Plus, with our built-in top-up and usage notifications, you'll always know what's coming and only get charged for what you want.


Token Costs and Pricing

Models charge per million tokens. There are two costs:

  1. Input tokens (what you send)
  2. Output tokens (what the model generates)

Output tokens are typically more expensive.

Example costs per 1M tokens (prices change frequently, and usually trend downward):

Model                 | Input Cost | Output Cost
Gemini 2.5 Flash Lite | $0.05      | $0.20
GPT-4o Mini           | $0.20      | $0.78
Claude 3 Haiku        | $1.04      | $5.20
Claude Sonnet 4.5     | $3.90      | $19.50
GPT-5                 | $6.50      | $26.00

Remember: $10 of usage = roughly 10 million words on top models.
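Per-request cost follows directly from the two rates. A quick sketch (rates taken from the example table above; they change over time):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_million: float, output_per_million: float) -> float:
    """Dollar cost of one request given per-million-token rates."""
    return (input_tokens * input_per_million
            + output_tokens * output_per_million) / 1_000_000

# GPT-4o Mini: $0.20 input / $0.78 output per 1M tokens
cost = request_cost(10_000, 2_000, 0.20, 0.78)
print(f"${cost:.4f}")  # $0.0036
```

Notice how the pricier output rate dominates even though output tokens are usually the smaller count.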


Model-Specific Guides

Here's a breakdown of every model available on Chipp, organized by capability.

Ultra-Fast Models (Lite Tier)

Gemini 2.5 Flash Lite

  • Context: 1M tokens (≈750k words)
  • Best for: Low-latency applications requiring instant responses
  • Speed: Ultra-fast
  • Cost: Ultra-low
  • Input: $0.05/1M tokens
  • Output: $0.20/1M tokens
  • Ultra-fast Gemini model optimized for low-latency applications.

Gemini 2.0 Flash Lite

  • Context: 1M tokens (≈750k words)
  • Best for: Fast, efficient tasks with large context
  • Speed: Ultra-fast
  • Cost: Ultra-low
  • Input: $0.05/1M tokens
  • Output: $0.20/1M tokens
  • Lightweight Gemini model for fast, efficient tasks.

Fast & Efficient Models

Claude 3 Haiku

  • Context: 200k tokens (≈150k words)
  • Best for: Near-instant responses, high-volume interactive workloads
  • Speed: Very fast
  • Cost: Low
  • Input: $1.04/1M tokens
  • Output: $5.20/1M tokens
  • Known for its speed and affordability, Claude 3 Haiku is designed for near-instant responsiveness, ideal for interactive workloads.

GPT-4o Mini

  • Context: 128k tokens (≈96k words)
  • Best for: High-volume tasks, chat, quick responses
  • Speed: Very fast
  • Cost: Very low
  • Input: $0.20/1M tokens
  • Output: $0.78/1M tokens
  • A smaller, more energy-efficient version of GPT-4o that offers high-quality responses with reduced resource usage.

GPT-5 Nano

  • Context: 400k tokens (≈300k words)
  • Best for: Maximum speed with minimal cost
  • Speed: Ultra-fast
  • Cost: Very low
  • Input: $0.65/1M tokens
  • Output: $2.60/1M tokens
  • Ultra-lightweight GPT-5 model optimized for maximum speed and minimal cost.

GPT-4.1 Nano

  • Context: 1M tokens (≈750k words)
  • Best for: Fast, cost-efficient tasks with long context
  • Speed: Ultra-fast
  • Cost: Very low
  • Input: $0.65/1M tokens
  • Output: $2.60/1M tokens
  • Ultra-lightweight GPT-4.1 model for fast, cost-efficient tasks.

Balanced Performance Models

Gemini 2.0 Flash

  • Context: 1M tokens (≈750k words)
  • Best for: Next-gen speed with enhanced capabilities
  • Speed: Very fast
  • Cost: Very low
  • Input: $0.10/1M tokens
  • Output: $0.39/1M tokens
  • Next-generation Gemini Flash model with enhanced capabilities.

GPT-5 Mini

  • Context: 400k tokens (≈300k words)
  • Best for: Lighter reasoning tasks with reduced latency
  • Speed: Very fast
  • Cost: Low
  • Input: $1.30/1M tokens
  • Output: $5.20/1M tokens
  • Compact version of GPT-5 for lighter reasoning tasks with reduced latency and cost.

GPT-4.1 Mini

  • Context: 1M tokens (≈750k words)
  • Best for: Strong performance at lower cost with long context
  • Speed: Very fast
  • Cost: Low
  • Input: $1.30/1M tokens
  • Output: $5.20/1M tokens
  • Compact version of GPT-4.1 offering strong performance at lower cost.

Claude 3.5 Haiku Latest

  • Context: 200k tokens (≈150k words)
  • Best for: Always up-to-date fast responses
  • Speed: Very fast
  • Cost: Low
  • Input: $1.04/1M tokens
  • Output: $5.20/1M tokens
  • Always up-to-date version of Claude 3.5 Haiku optimized for speed.

Advanced Reasoning Models

OpenAI o1

  • Context: 200k tokens (≈150k words)
  • Best for: Complex problem-solving and deep analysis
  • Speed: Moderate (deep thinking)
  • Cost: Premium
  • Input: $19.50/1M tokens
  • Output: $78.00/1M tokens
  • Advanced reasoning model designed for complex problem-solving and deep analysis.

OpenAI o4 Mini

  • Context: 128k tokens (≈96k words)
  • Best for: Balanced reasoning performance with efficiency
  • Speed: Moderate
  • Cost: Moderate
  • Input: $3.90/1M tokens
  • Output: $15.60/1M tokens
  • Compact reasoning model balancing performance with efficiency.

OpenAI o3-mini

  • Context: 200k tokens (≈150k words)
  • Best for: Complex reasoning workloads efficiently
  • Speed: Moderate (deep thinking)
  • Cost: Low
  • Input: $1.43/1M tokens
  • Output: $5.72/1M tokens
  • Designed to handle complex reasoning workloads efficiently, offering faster performance and responsiveness.

Gemini 2.5 Flash

  • Context: 1M tokens (≈750k words)
  • Best for: Fast reasoning with massive context
  • Speed: Very fast
  • Cost: Very low
  • Input: $0.10/1M tokens
  • Output: $0.39/1M tokens
  • Fast and efficient Gemini model with 1M+ context, optimized for speed.

High Performance Models

Claude Sonnet 4.5

  • Context: 200k tokens (≈150k words)
  • Best for: Enhanced reasoning and coding capabilities
  • Speed: Fast
  • Cost: Moderate
  • Input: $3.90/1M tokens
  • Output: $19.50/1M tokens
  • Enhanced Claude Sonnet with improved reasoning and coding capabilities.

Claude Sonnet 4

  • Context: 200k tokens (≈150k words)
  • Best for: Balanced performance at lower cost than Opus
  • Speed: Fast
  • Cost: Moderate
  • Input: $3.90/1M tokens
  • Output: $19.50/1M tokens
  • Balanced Claude model offering strong performance at lower cost than Opus.

Claude 3.7 Sonnet Latest

  • Context: 200k tokens (≈150k words)
  • Best for: Always up-to-date Claude performance
  • Speed: Fast
  • Cost: Moderate
  • Input: $3.90/1M tokens
  • Output: $19.50/1M tokens
  • Always up-to-date version of Claude 3.7 Sonnet with the latest improvements.

OpenAI o3

  • Context: 200k tokens (≈150k words)
  • Best for: Next-gen complex problem-solving
  • Speed: Moderate (deep thinking)
  • Cost: High
  • Input: $13.00/1M tokens
  • Output: $52.00/1M tokens
  • Next-generation reasoning model with enhanced capabilities for complex problem-solving.

Flagship Models (Best Overall)

Claude Opus 4

  • Context: 200k tokens (≈150k words)
  • Best for: Top-tier performance across all tasks
  • Speed: Moderate
  • Cost: Premium
  • Input: $19.50/1M tokens
  • Output: $97.50/1M tokens
  • Anthropic's flagship model with top-tier performance across all tasks.

GPT-5

  • Context: 400k tokens (≈300k words)
  • Best for: Advanced reasoning, code quality, and accuracy
  • Speed: Moderate
  • Cost: High
  • Input: $6.50/1M tokens
  • Output: $26.00/1M tokens
  • OpenAI's most advanced model with major improvements in reasoning, code quality, and accuracy.

GPT-5 Chat Latest

  • Context: 400k tokens (≈300k words)
  • Best for: Always up-to-date GPT-5 optimized for chat
  • Speed: Moderate
  • Cost: High
  • Input: $6.50/1M tokens
  • Output: $26.00/1M tokens
  • Always up-to-date version of GPT-5 optimized for chat with the latest improvements.

Gemini 2.5 Pro

  • Context: 1M tokens (≈750k words)
  • Best for: Massive context window with multimodal capabilities
  • Speed: Moderate
  • Cost: Low (for flagship performance)
  • Input: $1.63/1M tokens
  • Output: $6.50/1M tokens
  • Google's flagship model with massive 1M+ context window and multimodal capabilities.

Multimodal Features: Voice & Video

Chipp supports more than just text. You can add voice conversations and video generation to your agents. For voice and video, Chipp chooses the model for you, so there is no model choice to make. Just make sure you understand the pricing.

Voice (OpenAI Realtime API)

Every Chipp agent can speak using OpenAI's Realtime API. Real-time voice lets users have natural conversations with your agent over the phone or through voice interfaces.

Pricing:

Feature                    | Cost
Audio Input                | $0.06 per minute
Audio Output               | $0.24 per minute
Text Input (transcription) | $5.00 per 1M tokens
Text Output (responses)    | $20.00 per 1M tokens

What this means:

  • A 5-minute phone call with your agent costs roughly $0.30-0.50
  • Costs include both the audio streaming and text processing
  • Silence counts if you're streaming continuously—use voice activity detection to optimize

Practical example:

  • Customer support hotline: 100 calls/month × 5 minutes average = 500 minutes
  • Cost: (500 min × $0.06 input) + (≈100 min of agent speech × $0.24 output) = ~$54/month for voice
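The hotline math above, spelled out (the 100 minutes of agent speech is the example's assumption, not a fixed ratio):

```python
AUDIO_INPUT_PER_MIN = 0.06   # $ per minute of caller audio
AUDIO_OUTPUT_PER_MIN = 0.24  # $ per minute of agent speech

calls_per_month = 100
avg_call_minutes = 5

input_minutes = calls_per_month * avg_call_minutes  # 500 caller minutes
output_minutes = 100                                # assumed agent speech time

voice_cost = (input_minutes * AUDIO_INPUT_PER_MIN
              + output_minutes * AUDIO_OUTPUT_PER_MIN)
print(voice_cost)  # 54.0
```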

Video Generation (Veo 3.1)

Chipp integrates Veo 3.1, Google's video generation model. Create up to 8-second videos from text prompts—ideal for social media, marketing, and prototypes.

Pricing:

Video Type       | Cost per Second
Standard Quality | $0.40/second
Fast Generation  | $0.15/second

Both include audio by default. You're only charged if the video generates successfully.

What this means:

  • 8-second Instagram reel (standard): $3.20
  • 8-second quick prototype (fast): $1.20
  • 1-minute video (standard): $24.00
  • 1-minute video (fast): $9.00

Practical example:

  • Social media agency creating 50 short videos/month (8 seconds each, fast mode)
  • Cost: 50 videos × 8 seconds × $0.15 = $60/month
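Video budgeting follows the same per-second pattern. A sketch using the rates above:

```python
VIDEO_RATES = {"standard": 0.40, "fast": 0.15}  # $ per second of generated video

def video_cost(seconds: float, mode: str = "standard") -> float:
    """Cost of one successful generation at the given quality mode."""
    return seconds * VIDEO_RATES[mode]

print(video_cost(8, "standard"))   # 3.2   one 8-second reel
print(50 * video_cost(8, "fast"))  # 60.0  the agency example above
```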

When to Use Multimodal

Use voice when:

  • Your users prefer talking over typing (kids, accessibility, hands-free scenarios)
  • You need real-time phone support (24/7 customer service, appointment booking)
  • The use case benefits from tone and emotion (therapy, coaching, education)

Use video when:

  • You need visual content at scale (social media, marketing campaigns)
  • Prototyping concepts for clients (real estate walkthroughs, product demos)
  • Creating educational content (exercise demonstrations, how-to guides)

Cost management tips:

  • Voice: Use push-to-talk or voice activity detection to avoid charging for silence
  • Video: Use fast mode for prototypes, standard for final deliverables
  • Combine strategically: Not every interaction needs voice or video—use text for simple tasks

Rules of Thumb

Choosing by context needs:

  • Short tasks (chat, Q&A): Any model works (use fast/cheap ones like GPT-4o Mini or Gemini Flash Lite)
  • Medium docs (reports, articles): 200k models (Claude Haiku, Claude Sonnet)
  • Long docs: 400k models (GPT-5, GPT-5 Nano/Mini)
  • Very long docs (books, research): 1M models (Gemini 2.5 Pro, GPT-4.1 Nano/Mini, Gemini Flash models)

Choosing by speed:

  • Ultra-fast: Gemini Flash Lite models, GPT-4o Mini, GPT-4.1/5 Nano
  • Very fast: Claude Haiku, Gemini Flash, GPT-5 Mini
  • Balanced: Claude Sonnet models, GPT-5
  • Deep thinking: o3-mini, o3, o1, o4 Mini (slower but better reasoning)

Choosing by cost:

  • Ultra-budget: Gemini Flash Lite ($0.05/$0.20), GPT-4o Mini ($0.20/$0.78)
  • Budget: Gemini 2.0 Flash ($0.10/$0.39), GPT-4.1/5 Nano ($0.65/$2.60)
  • Balanced: Claude Haiku ($1.04/$5.20), GPT-5 Mini ($1.30/$5.20), o3-mini ($1.43/$5.72)
  • Performance: Claude Sonnet models ($3.90/$19.50), GPT-5 ($6.50/$26.00)
  • Premium: o1 ($19.50/$78.00), Claude Opus 4 ($19.50/$97.50)

General rules:

  • 1,000 tokens ≈ 750 words
  • $10 usage ≈ 10 million words (top models)
  • Larger context = more expensive per request
  • Output tokens cost 2-5x more than input tokens

Optimizing Token Usage

Reduce token consumption:

  1. Clear out old messages - Long conversations eat tokens. Start fresh when switching topics.
  2. Trim your instructions - Every word in your system prompt counts. Be concise.
  3. Use smaller knowledge sources - Upload only what you need. A 500-page manual uses tokens every request.
  4. Choose the right model - Don't use GPT-5 for simple tasks. Use GPT-4.1 Mini or Claude Haiku.
  5. Limit conversation history - Chipp automatically manages this, but know that every old message uses tokens.
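Point 5 can be made concrete: a sliding-window trim keeps only the newest messages that fit a token budget. A sketch (the token counter is whatever you supply; real platforms use model-specific tokenizers):

```python
def trim_history(messages, max_tokens, count_tokens):
    """Keep the newest messages that fit within max_tokens, dropping the oldest."""
    kept, total = [], 0
    for message in reversed(messages):
        tokens = count_tokens(message)
        if total + tokens > max_tokens:
            break
        kept.append(message)
        total += tokens
    return list(reversed(kept))

# Toy demo: pretend one word = one token
history = ["first question", "first answer", "newest question"]
print(trim_history(history, 4, lambda m: len(m.split())))
# ['first answer', 'newest question']
```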

Best practices:

  • Start with fast, cheap models (GPT-4.1 Mini, Claude Haiku)
  • Upgrade to powerful models only when you need deep reasoning
  • Use long-context models (Gemini Pro) only when you're actually uploading large documents
  • Monitor your usage in Chipp settings

Practical Examples

Example 1: Customer Support Chatbot

Use case: Answer common questions about your product

Best model: GPT-4o Mini or Gemini 2.5 Flash Lite

  • Why: Ultra-fast, incredibly cheap, great for simple Q&A
  • Context needed: 128k-1M (enough for conversation + knowledge base)
  • Cost: $0.05-0.20 per 1M input tokens

Monthly cost estimate:

  • 1,000 conversations/month
  • Average 500 tokens per conversation
  • = 500,000 tokens/month = $0.03-0.10/month
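The arithmetic: 1,000 conversations at 500 tokens each is 500,000 input tokens a month, and at the quoted $0.05 and $0.20 per 1M input rates:

```python
monthly_tokens = 1_000 * 500  # conversations * avg tokens each

for rate in (0.05, 0.20):  # $ per 1M input tokens for the suggested models
    print(round(monthly_tokens / 1_000_000 * rate, 3))
# prints 0.025 then 0.1
```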

Example 2: Long Document Analysis

Use case: Upload a 200-page research paper and ask questions

Best model: Gemini 2.5 Pro or GPT-4.1 Mini

  • Why: Long context window, strong reasoning, affordable
  • Context needed: 1M tokens
  • Cost: $1.30-1.63 per 1M input tokens

Per-document cost estimate:

  • 200-page PDF ≈ 150,000 tokens
  • Analysis response ≈ 5,000 tokens
  • = ~$0.20-0.25 per document (input) + ~$0.03 per response (output)

Example 3: Code Generation Assistant

Use case: Generate and review code

Best model: Claude Sonnet 4.5 or GPT-5

  • Why: Strong coding capabilities with balanced performance
  • Context needed: 200k-400k tokens
  • Cost: $3.90-6.50 per 1M input tokens

Monthly cost estimate:

  • 100 code generations/month
  • Average 2,000 tokens per request
  • = 200,000 tokens/month = $0.78-1.30/month (input)

Example 4: Educational Tutor

Use case: Help students learn with step-by-step explanations

Best model: o3-mini or o4 Mini

  • Why: Reasoning models that show their work, efficient pricing
  • Context needed: 128k-200k tokens
  • Cost: $1.43-3.90 per 1M input tokens

Per-session cost:

  • 50-question tutoring session
  • Average 300 tokens per exchange
  • = 15,000 tokens = $0.02-0.06 per session

When to Care About Tokens (and When Not To)

Don't worry about tokens if:

  • You're on a paid Chipp plan with usage included
  • You're using ultra-cheap models (Gemini Flash Lite, GPT-4.1 Mini, Gemini 2.0 Flash)
  • You have low volume (< 1,000 conversations/month)
  • Your conversations are short (< 500 tokens each)

You're probably fine. Just build.

Do pay attention to tokens if:

  • You're uploading massive documents (100+ pages) every request
  • You're using expensive models (Claude Opus 4, o1, o3) at high volume
  • You have long conversation histories (50+ back-and-forth messages)
  • You're hitting your usage limits and getting charged overages
  • You're doing a lot of voice calls or video generation

Optimize by switching to cheaper models or trimming context.


Quick Decision Tree

Not sure which model to pick? Follow this:

  1. Do you need to upload very long documents (100+ pages)?
    • Yes → Gemini 2.5 Pro (1M context) or GPT-4.1 Mini (1M context)
    • No → Keep going
  2. Do you need deep reasoning or step-by-step problem solving?
    • Yes → o3-mini (best value), o4 Mini (faster), or o1/o3 (premium)
    • No → Keep going
  3. Do you need the absolute best quality?
    • Yes → GPT-5, Claude Opus 4, or Gemini 2.5 Pro
    • No → Keep going
  4. Do you need it ultra-fast and ultra-cheap?
    • Yes → Gemini 2.5 Flash Lite, Gemini 2.0 Flash Lite, or GPT-4.1 Nano
    • No → Keep going
  5. Do you need balanced performance?
    • Yes → Claude Sonnet 4.5, Claude Haiku, or GPT-5 Mini
    • No → Start with Gemini 2.0 Flash (great all-around)

Still unsure? Start with Gemini 2.0 Flash or GPT-4.1 Mini. They're fast, cheap, and handle 95% of tasks well. Upgrade only if you need more power.
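If you want the decision tree as code, here's a toy encoding of the steps above (the picks are one representative per branch, not the only options):

```python
def pick_model(long_docs=False, deep_reasoning=False, best_quality=False,
               ultra_cheap=False, balanced=False) -> str:
    """Toy version of the decision tree above; returns one representative pick."""
    if long_docs:
        return "Gemini 2.5 Pro"         # 1M context
    if deep_reasoning:
        return "o3-mini"                # best-value reasoning
    if best_quality:
        return "GPT-5"                  # flagship tier
    if ultra_cheap:
        return "Gemini 2.5 Flash Lite"  # ultra-fast, ultra-low cost
    if balanced:
        return "Claude Sonnet 4.5"      # strong all-rounder
    return "Gemini 2.0 Flash"           # great default

print(pick_model())                     # Gemini 2.0 Flash
print(pick_model(deep_reasoning=True))  # o3-mini
```

The questions are checked in order, so earlier needs (long documents, deep reasoning) win over later ones, just like walking the tree top to bottom.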