Understanding tokens is key to getting the most out of AI models and your budget. This guide breaks down what tokens are, how they work on Chipp, and practical tips for choosing the right model.
What Are AI Tokens?
Tokens are how AI models read and process text. Think of them as word chunks.
Rule of thumb: 1,000 tokens ≈ 750 words
So a 128,000-token limit means roughly 96,000 words of conversation history.
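The rule of thumb above can be sketched as a pair of helper functions. This is a rough estimate only (real token counts depend on the tokenizer and the language of the text), and the function names are just illustrative:

```python
def words_to_tokens(words: int) -> int:
    """Rough estimate using the rule of thumb: 1,000 tokens ~= 750 words."""
    return round(words * 1000 / 750)

def tokens_to_words(tokens: int) -> int:
    """Inverse estimate: how many words fit in a given token budget."""
    return round(tokens * 750 / 1000)

print(tokens_to_words(128_000))  # 96000 -- a 128k context holds ~96,000 words
print(words_to_tokens(750))      # 1000
```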
What counts as tokens:
- Your instructions (system prompt)
- Knowledge sources you've uploaded
- Conversation history (all previous messages)
- The AI's responses
- Any images or files you upload
Everything adds up.
Why Tokens Matter
Tokens determine three things:
- How much context the model remembers - Can it see your entire conversation or just the last few messages?
- How much you can upload - Can you paste a 100-page document or just a few paragraphs?
- How much it costs - More tokens = higher cost
Context Windows Explained
The context window is the maximum tokens a model can process at once.
Think of it like the model's working memory. Everything you give it—your instructions, uploaded files, conversation history—has to fit in this window.
Example:
- Model has 128,000 token context window (≈96,000 words)
- Your instructions: 500 tokens
- Uploaded PDF: 50,000 tokens
- Conversation so far: 20,000 tokens
- Remaining space: 57,500 tokens for the AI's response and future messages
When you hit the limit, older messages get dropped to make room.
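The budget arithmetic in the example above can be written out directly. The numbers mirror the example; everything that goes into the window is subtracted from the context limit, and what is left is available for responses and future messages:

```python
CONTEXT_WINDOW = 128_000  # tokens (~96,000 words)

# Everything that must fit in the window, from the example above:
used = {
    "instructions": 500,
    "uploaded_pdf": 50_000,
    "conversation": 20_000,
}

remaining = CONTEXT_WINDOW - sum(used.values())
print(remaining)  # 57500 tokens left for the AI's response and future messages
```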
Why it matters:
- Larger context = model remembers more
- Larger context = you can upload bigger documents
- Larger context = usually more expensive
How Tokens Work on Chipp
All About Usage
Chipp pays for all AI costs on your account.
That means every time you ask a question, we pay whichever model provider you choose (OpenAI, Google, etc.).
To make things easy for you, initial usage is included with your plan.
If you exceed that usage in a given month, you are charged for the overage: Chipp bills the direct cost of the model you choose, plus a 30% service fee.
Quick Reference: $10/mo provides roughly 10 million words on the top models, more on the smaller models.
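The overage math described above can be sketched as follows. This is an illustrative function (the name and exact rounding are assumptions, not Chipp's billing code); it assumes the 30% fee applies only to usage beyond the included amount:

```python
def overage_charge(model_cost_usd: float, included_usd: float = 10.00,
                   service_fee: float = 0.30) -> float:
    """Overage = direct model cost beyond the included amount, plus a 30% fee."""
    over = max(0.0, model_cost_usd - included_usd)
    return round(over * (1 + service_fee), 2)

print(overage_charge(14.00))  # $4 over x 1.30 = 5.2
print(overage_charge(8.00))   # within included usage: 0.0
```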
What This Means
- Pro plan: $29/month with $10 usage included
- Team plan: $99/month with $30 usage included
- Business plan: $299/month with $100 usage included
After you use your included amount, you only pay for what you use. Fast models cost less. Powerful models cost more. You're in control.
Plus, with our built-in top-up and usage notifications, you will always know what's coming and only be charged for what you want.
Token Costs and Pricing
Models charge per million tokens. There are two costs:
- Input tokens (what you send)
- Output tokens (what the model generates)
Output tokens are typically more expensive.
Example costs per 1M tokens (prices change frequently, and usually downward):
| Model | Input Cost | Output Cost |
|---|---|---|
| Gemini 2.5 Flash Lite | $0.05 | $0.20 |
| GPT-4o Mini | $0.20 | $0.78 |
| Claude 3 Haiku | $1.04 | $5.20 |
| Claude Sonnet 4.5 | $3.90 | $19.50 |
| GPT-5 | $6.50 | $26.00 |
Remember: $10 of usage = roughly 10 million words on top models.
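Putting the table above into code makes the input/output split concrete. The prices are the example figures from the table (they change over time), and the function name is illustrative:

```python
PRICES = {  # USD per 1M tokens: (input, output), from the table above
    "gemini-2.5-flash-lite": (0.05, 0.20),
    "gpt-4o-mini": (0.20, 0.78),
    "claude-sonnet-4.5": (3.90, 19.50),
    "gpt-5": (6.50, 26.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single request: input and output are billed at different rates."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Same request size, very different bills:
print(f"${request_cost('gpt-4o-mini', 10_000, 1_000):.5f}")  # $0.00278
print(f"${request_cost('gpt-5', 10_000, 1_000):.5f}")        # $0.09100
```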
Model-Specific Guides
Here's a breakdown of every model available on Chipp, organized by capability.
Ultra-Fast Models (Lite Tier)
Gemini 2.5 Flash Lite
- Context: 1M tokens (≈750k words)
- Best for: Low-latency applications requiring instant responses
- Speed: Ultra-fast
- Cost: Ultra-low
- Input: $0.05/1M tokens
- Output: $0.20/1M tokens
- Ultra-fast Gemini model optimized for low-latency applications.
Gemini 2.0 Flash Lite
- Context: 1M tokens (≈750k words)
- Best for: Fast, efficient tasks with large context
- Speed: Ultra-fast
- Cost: Ultra-low
- Input: $0.05/1M tokens
- Output: $0.20/1M tokens
- Lightweight Gemini model for fast, efficient tasks.
Fast & Efficient Models
Claude 3 Haiku
- Context: 200k tokens (≈150k words)
- Best for: Near-instant responses, high-volume interactive workloads
- Speed: Very fast
- Cost: Low
- Input: $1.04/1M tokens
- Output: $5.20/1M tokens
- Known for its speed and affordability, Claude 3 Haiku is designed for near-instant responsiveness, ideal for interactive workloads.
GPT-4o Mini
- Context: 128k tokens (≈96k words)
- Best for: High-volume tasks, chat, quick responses
- Speed: Very fast
- Cost: Very low
- Input: $0.20/1M tokens
- Output: $0.78/1M tokens
- A smaller, more energy-efficient version of GPT-4o that offers high-quality responses with reduced resource usage.
GPT-5 Nano
- Context: 400k tokens (≈300k words)
- Best for: Maximum speed with minimal cost
- Speed: Ultra-fast
- Cost: Very low
- Input: $0.65/1M tokens
- Output: $2.60/1M tokens
- Ultra-lightweight GPT-5 model optimized for maximum speed and minimal cost.
GPT-4.1 Nano
- Context: 1M tokens (≈750k words)
- Best for: Fast, cost-efficient tasks with long context
- Speed: Ultra-fast
- Cost: Very low
- Input: $0.65/1M tokens
- Output: $2.60/1M tokens
- Ultra-lightweight GPT-4.1 model for fast, cost-efficient tasks.
Balanced Performance Models
Gemini 2.0 Flash
- Context: 1M tokens (≈750k words)
- Best for: Next-gen speed with enhanced capabilities
- Speed: Very fast
- Cost: Very low
- Input: $0.10/1M tokens
- Output: $0.39/1M tokens
- Next-generation Gemini Flash model with enhanced capabilities.
GPT-5 Mini
- Context: 400k tokens (≈300k words)
- Best for: Lighter reasoning tasks with reduced latency
- Speed: Very fast
- Cost: Low
- Input: $1.30/1M tokens
- Output: $5.20/1M tokens
- Compact version of GPT-5 for lighter reasoning tasks with reduced latency and cost.
GPT-4.1 Mini
- Context: 1M tokens (≈750k words)
- Best for: Strong performance at lower cost with long context
- Speed: Very fast
- Cost: Low
- Input: $1.30/1M tokens
- Output: $5.20/1M tokens
- Compact version of GPT-4.1 offering strong performance at lower cost.
Claude 3.5 Haiku Latest
- Context: 200k tokens (≈150k words)
- Best for: Always up-to-date fast responses
- Speed: Very fast
- Cost: Low
- Input: $1.04/1M tokens
- Output: $5.20/1M tokens
- Always up-to-date version of Claude 3.5 Haiku optimized for speed.
Advanced Reasoning Models
OpenAI o1
- Context: 200k tokens (≈150k words)
- Best for: Complex problem-solving and deep analysis
- Speed: Moderate (deep thinking)
- Cost: Premium
- Input: $19.50/1M tokens
- Output: $78.00/1M tokens
- Advanced reasoning model designed for complex problem-solving and deep analysis.
OpenAI o4 Mini
- Context: 128k tokens (≈96k words)
- Best for: Balanced reasoning performance with efficiency
- Speed: Moderate
- Cost: Moderate
- Input: $3.90/1M tokens
- Output: $15.60/1M tokens
- Compact reasoning model balancing performance with efficiency.
OpenAI o3-mini
- Context: 200k tokens (≈150k words)
- Best for: Complex reasoning workloads efficiently
- Speed: Moderate (deep thinking)
- Cost: Low
- Input: $1.43/1M tokens
- Output: $5.72/1M tokens
- Designed to handle complex reasoning workloads efficiently, offering faster performance and responsiveness.
Gemini 2.5 Flash
- Context: 1M tokens (≈750k words)
- Best for: Fast reasoning with massive context
- Speed: Very fast
- Cost: Very low
- Input: $0.10/1M tokens
- Output: $0.39/1M tokens
- Fast and efficient Gemini model with 1M+ context, optimized for speed.
High Performance Models
Claude Sonnet 4.5
- Context: 200k tokens (≈150k words)
- Best for: Enhanced reasoning and coding capabilities
- Speed: Fast
- Cost: Moderate
- Input: $3.90/1M tokens
- Output: $19.50/1M tokens
- Enhanced Claude Sonnet with improved reasoning and coding capabilities.
Claude Sonnet 4
- Context: 200k tokens (≈150k words)
- Best for: Balanced performance at lower cost than Opus
- Speed: Fast
- Cost: Moderate
- Input: $3.90/1M tokens
- Output: $19.50/1M tokens
- Balanced Claude model offering strong performance at lower cost than Opus.
Claude 3.7 Sonnet Latest
- Context: 200k tokens (≈150k words)
- Best for: Always up-to-date Claude performance
- Speed: Fast
- Cost: Moderate
- Input: $3.90/1M tokens
- Output: $19.50/1M tokens
- Always up-to-date version of Claude 3.7 Sonnet with the latest improvements.
OpenAI o3
- Context: 200k tokens (≈150k words)
- Best for: Next-gen complex problem-solving
- Speed: Moderate (deep thinking)
- Cost: High
- Input: $13.00/1M tokens
- Output: $52.00/1M tokens
- Next-generation reasoning model with enhanced capabilities for complex problem-solving.
Flagship Models (Best Overall)
Claude Opus 4
- Context: 200k tokens (≈150k words)
- Best for: Top-tier performance across all tasks
- Speed: Moderate
- Cost: Premium
- Input: $19.50/1M tokens
- Output: $97.50/1M tokens
- Anthropic's flagship model with top-tier performance across all tasks.
GPT-5
- Context: 400k tokens (≈300k words)
- Best for: Advanced reasoning, code quality, and accuracy
- Speed: Moderate
- Cost: High
- Input: $6.50/1M tokens
- Output: $26.00/1M tokens
- OpenAI's most advanced model with major improvements in reasoning, code quality, and accuracy.
GPT-5 Chat Latest
- Context: 400k tokens (≈300k words)
- Best for: Always up-to-date GPT-5 optimized for chat
- Speed: Moderate
- Cost: High
- Input: $6.50/1M tokens
- Output: $26.00/1M tokens
- Always up-to-date version of GPT-5 optimized for chat with the latest improvements.
Gemini 2.5 Pro
- Context: 1M tokens (≈750k words)
- Best for: Massive context window with multimodal capabilities
- Speed: Moderate
- Cost: Low (for flagship performance)
- Input: $1.63/1M tokens
- Output: $6.50/1M tokens
- Google's flagship model with massive 1M+ context window and multimodal capabilities.
Multimodal Features: Voice & Video
Chipp supports more than just text. You can add voice conversations and video generation to your agents. For voice and video, Chipp chooses the model for you, so there are no model choices to make. Just make sure you understand the pricing.
Voice (OpenAI Realtime API)
Every Chipp agent can speak using OpenAI's Realtime API. Real-time voice lets users have natural conversations with your agent over the phone or through voice interfaces.
Pricing:
| Feature | Cost |
|---|---|
| Audio Input | $0.06 per minute |
| Audio Output | $0.24 per minute |
| Text Input (transcription) | $5.00 per 1M tokens |
| Text Output (responses) | $20.00 per 1M tokens |
What this means:
- A 5-minute phone call with your agent costs roughly $0.30-0.50
- Costs include both the audio streaming and text processing
- Silence counts if you're streaming continuously—use voice activity detection to optimize
Practical example:
- Customer support hotline: 100 calls/month × 5 minutes average = 500 minutes of audio input, with roughly 1 minute of agent speech per call
- Cost: (500 min × $0.06 input) + (100 min × $0.24 output) = ~$54/month for voice
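The hotline estimate above can be checked with a small calculator. This covers only the audio streaming rates from the table (text token processing is billed separately), and the function name is illustrative:

```python
def voice_cost(input_minutes: float, output_minutes: float,
               in_rate: float = 0.06, out_rate: float = 0.24) -> float:
    """Audio streaming cost in USD: listening (input) plus agent speech (output)."""
    return input_minutes * in_rate + output_minutes * out_rate

# 100 calls x 5 min of listening, ~1 min of agent speech per call:
print(round(voice_cost(500, 100), 2))  # 54.0
```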
Video Generation (Veo 3.1)
Chipp integrates Veo 3.1, Google's video generation model. Create clips of up to 8 seconds each from text prompts—ideal for social media, marketing, and prototypes.
Pricing:
| Video Type | Cost per Second |
|---|---|
| Standard Quality | $0.40/second |
| Fast Generation | $0.15/second |
Both include audio by default. You're only charged if the video generates successfully.
What this means:
- 8-second Instagram reel (standard): $3.20
- 8-second quick prototype (fast): $1.20
- 1 minute of footage, standard (multiple clips): $24.00
- 1 minute of footage, fast (multiple clips): $9.00
Practical example:
- Social media agency creating 50 short videos/month (8 seconds each, fast mode)
- Cost: 50 videos × 8 seconds × $0.15 = $60/month
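The per-second pricing above is simple enough to compute directly. The rates come from the table; the function name and `clips` parameter are illustrative:

```python
RATES = {"standard": 0.40, "fast": 0.15}  # USD per second of generated video

def video_cost(seconds: int, mode: str = "standard", clips: int = 1) -> float:
    """Cost of generating `clips` videos of `seconds` each in the given mode."""
    return clips * seconds * RATES[mode]

print(round(video_cost(8, "standard"), 2))        # one 8-second reel: 3.2
print(round(video_cost(8, "fast", clips=50), 2))  # 50 quick prototypes: 60.0
```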
When to Use Multimodal
Use voice when:
- Your users prefer talking over typing (kids, accessibility, hands-free scenarios)
- You need real-time phone support (24/7 customer service, appointment booking)
- The use case benefits from tone and emotion (therapy, coaching, education)
Use video when:
- You need visual content at scale (social media, marketing campaigns)
- Prototyping concepts for clients (real estate walkthroughs, product demos)
- Creating educational content (exercise demonstrations, how-to guides)
Cost management tips:
- Voice: Use push-to-talk or voice activity detection to avoid charging for silence
- Video: Use fast mode for prototypes, standard for final deliverables
- Combine strategically: Not every interaction needs voice or video—use text for simple tasks
Rules of Thumb
Choosing by context needs:
- Short tasks (chat, Q&A): Any model works (use fast/cheap ones like GPT-4o Mini or Gemini Flash Lite)
- Medium docs (reports, articles): 200k models (Claude Haiku, Claude Sonnet)
- Long docs (books, research): 400k models (GPT-5, GPT-5 Nano/Mini)
- Very long context: 1M models (Gemini 2.5 Pro, GPT-4.1 Nano/Mini, Gemini Flash models)
Choosing by speed:
- Ultra-fast: Gemini Flash Lite models, GPT-4o Mini, GPT-4.1/5 Nano
- Very fast: Claude Haiku, Gemini Flash, GPT-5 Mini
- Balanced: Claude Sonnet models, GPT-5
- Deep thinking: o3-mini, o3, o1, o4 Mini (slower but better reasoning)
Choosing by cost:
- Ultra-budget: Gemini Flash Lite ($0.05/$0.20), GPT-4o Mini ($0.20/$0.78)
- Budget: Gemini 2.0 Flash ($0.10/$0.39), GPT-4.1/5 Nano ($0.65/$2.60)
- Balanced: Claude Haiku ($1.04/$5.20), GPT-5 Mini ($1.30/$5.20), o3-mini ($1.43/$5.72)
- Performance: Claude Sonnet models ($3.90/$19.50), GPT-5 ($6.50/$26.00)
- Premium: o1 ($19.50/$78.00), Claude Opus 4 ($19.50/$97.50)
General rules:
- 1,000 tokens ≈ 750 words
- $10 usage ≈ 10 million words (top models)
- Larger context = more expensive per request
- Output tokens cost 2-5x more than input tokens
Optimizing Token Usage
Reduce token consumption:
- Clear out old messages - Long conversations eat tokens. Start fresh when switching topics.
- Trim your instructions - Every word in your system prompt counts. Be concise.
- Use smaller knowledge sources - Upload only what you need. A 500-page manual uses tokens every request.
- Choose the right model - Don't use GPT-5 for simple tasks. Use GPT-4.1 Mini or Claude Haiku.
- Limit conversation history - Chipp automatically manages this, but know that every old message uses tokens.
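One way to limit conversation history is to keep only the most recent messages that fit in a token budget. This is a minimal sketch, assuming messages are stored oldest-first with pre-estimated token counts (Chipp manages this for you automatically; the structure here is hypothetical):

```python
def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the newest messages that fit within `budget` tokens.
    Each message is {"role": ..., "text": ..., "tokens": int}."""
    kept, used = [], 0
    for msg in reversed(messages):      # walk from newest to oldest
        if used + msg["tokens"] > budget:
            break                       # oldest messages get dropped
        kept.append(msg)
        used += msg["tokens"]
    return list(reversed(kept))         # restore chronological order

history = [
    {"role": "user", "text": "old question", "tokens": 4_000},
    {"role": "assistant", "text": "old answer", "tokens": 6_000},
    {"role": "user", "text": "new question", "tokens": 2_000},
]
trimmed = trim_history(history, budget=8_000)
print([m["text"] for m in trimmed])  # ['old answer', 'new question']
```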
Best practices:
- Start with fast, cheap models (GPT-4.1 Mini, Claude Haiku)
- Upgrade to powerful models only when you need deep reasoning
- Use long-context models (Gemini Pro) only when you're actually uploading large documents
- Monitor your usage in Chipp settings
Practical Examples
Example 1: Customer Support Chatbot
Use case: Answer common questions about your product
Best model: GPT-4o Mini or Gemini 2.5 Flash Lite
- Why: Ultra-fast, incredibly cheap, great for simple Q&A
- Context needed: 128k-1M (enough for conversation + knowledge base)
- Cost: $0.05-0.20 per 1M input tokens
Monthly cost estimate:
- 1,000 conversations/month
- Average 500 tokens per conversation
- = 500,000 tokens/month = $0.03-0.10/month
Example 2: Long Document Analysis
Use case: Upload a 200-page research paper and ask questions
Best model: Gemini 2.5 Pro or GPT-4.1 Mini
- Why: Long context window, strong reasoning, affordable
- Context needed: 1M tokens
- Cost: $1.30-1.63 per 1M input tokens
Per-document cost estimate:
- 200-page PDF ≈ 150,000 tokens
- Analysis response ≈ 5,000 tokens
- = ~$0.20-0.25 per document (input) + ~$0.03 per response (output)
Example 3: Code Generation Assistant
Use case: Generate and review code
Best model: Claude Sonnet 4.5 or GPT-5
- Why: Strong coding capabilities with balanced performance
- Context needed: 200k-400k tokens
- Cost: $3.90-6.50 per 1M input tokens
Monthly cost estimate:
- 100 code generations/month
- Average 2,000 tokens per request
- = 200,000 tokens/month = $0.78-1.30/month (input)
Example 4: Educational Tutor
Use case: Help students learn with step-by-step explanations
Best model: o3-mini or o4 Mini
- Why: Reasoning models that show their work, efficient pricing
- Context needed: 128k-200k tokens
- Cost: $1.43-3.90 per 1M input tokens
Per-session cost:
- 50-question tutoring session
- Average 300 tokens per exchange
- = 15,000 tokens = $0.02-0.06 per session
When to Care About Tokens (and When Not To)
Don't worry about tokens if:
- You're on a paid Chipp plan with usage included
- You're using ultra-cheap models (Gemini Flash Lite, GPT-4o Mini, Gemini 2.0 Flash)
- You have low volume (< 1,000 conversations/month)
- Your conversations are short (< 500 tokens each)
You're probably fine. Just build.
Do pay attention to tokens if:
- You're uploading massive documents (100+ pages) every request
- You're using expensive models (Claude Opus 4, o1, o3) at high volume
- You have long conversation histories (50+ back-and-forth messages)
- You're hitting your usage limits and getting charged overages
- You're doing a lot of voice calls or video generation
Optimize by switching to cheaper models or trimming context.
Quick Decision Tree
Not sure which model to pick? Follow this:
- Do you need to upload very long documents (100+ pages)?
- Yes → Gemini 2.5 Pro (1M context) or GPT-4.1 Mini (1M context)
- No → Keep going
- Do you need deep reasoning or step-by-step problem solving?
- Yes → o3-mini (best value), o4 Mini (faster), or o1/o3 (premium)
- No → Keep going
- Do you need the absolute best quality?
- Yes → GPT-5, Claude Opus 4, or Gemini 2.5 Pro
- No → Keep going
- Do you need it ultra-fast and ultra-cheap?
- Yes → Gemini 2.5 Flash Lite, Gemini 2.0 Flash Lite, or GPT-4.1 Nano
- No → Keep going
- Do you need balanced performance?
- Yes → Claude Sonnet 4.5, Claude Haiku, or GPT-5 Mini
- No → Start with Gemini 2.0 Flash (great all-around)
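The decision tree above can be sketched as a simple function. The flags and function name are illustrative, and each branch returns one representative model from the corresponding step:

```python
def pick_model(long_docs: bool = False, deep_reasoning: bool = False,
               best_quality: bool = False, ultra_cheap: bool = False,
               balanced: bool = False) -> str:
    """Walk the decision tree in order; return a suggested starting model."""
    if long_docs:        # very long documents (100+ pages)
        return "Gemini 2.5 Pro"
    if deep_reasoning:   # step-by-step problem solving
        return "o3-mini"
    if best_quality:     # absolute best output
        return "GPT-5"
    if ultra_cheap:      # maximum speed at minimum cost
        return "Gemini 2.5 Flash Lite"
    if balanced:         # balanced performance
        return "Claude Sonnet 4.5"
    return "Gemini 2.0 Flash"  # great all-around default

print(pick_model(deep_reasoning=True))  # o3-mini
print(pick_model())                     # Gemini 2.0 Flash
```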
Still unsure? Start with Gemini 2.0 Flash or GPT-4.1 Mini. They're ultra-fast, ultra-cheap, and handle 95% of tasks perfectly. Upgrade only if you need more power.
