Image Recognition

How to get the best results when analyzing images with your Chipp app

Hunter Hodnett, CPTO at Chipp
4 min read

Your Chipp app can analyze images uploaded by users. This guide explains how image recognition works and how to get the best results.

How It Works

When a user uploads an image, Chipp processes it in one of two ways depending on your app's model:

Models with native vision capabilities see images directly, just like a human would. The image is embedded in the conversation and the model can reference it naturally.
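
Chipp handles this for you, but for intuition, here is a minimal sketch of what "embedded in the conversation" looks like at the API level, using the OpenAI Python SDK. The model name, file name, and prompt are illustrative examples, not Chipp's actual implementation:

```python
import base64
from openai import OpenAI  # assumes the official OpenAI Python SDK is installed

client = OpenAI()

# Encode the uploaded file and attach it directly to the user's message.
with open("sign.jpg", "rb") as f:  # example file name
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-5",  # illustrative; any vision-capable model works the same way
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What text appears on this sign?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```

Because the model receives the image itself, it can answer follow-up questions about details the user never described in text.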

Vision-capable models:

| Provider  | Models |
| --------- | ------ |
| OpenAI    | GPT-5, GPT-5 Mini, GPT-5 Nano, GPT-4.1, GPT-4.1 Mini, GPT-4.1 Nano |
| Anthropic | Claude Opus 4.1, Claude Opus 4, Claude Sonnet 4.5, Claude Sonnet 4, Claude 3.7 Sonnet, Claude 3.5 Haiku |
| Google    | Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.5 Flash Lite, Gemini 2.0 Flash, Gemini 2.0 Flash Lite |

Models WITHOUT vision:

  • OpenAI o-series (o1, o1 Pro, o3, o3 Pro, o3 Mini, o4 Mini)

Non-Vision Models (Fallback)

For models without native vision (like the o-series reasoning models), Chipp uses a separate image analysis tool powered by OpenAI's GPT-5 to describe the image, then passes that description to your app's model.

This two-step process can lose visual detail compared to native vision, because your app's model only sees the written description rather than the image itself.
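
The same pattern, sketched in code: a vision model first turns the image into text, and only that text reaches your app's model. This is a simplified illustration using the OpenAI Python SDK with example model names, not Chipp's internal code:

```python
from openai import OpenAI  # assumes the official OpenAI Python SDK is installed

client = OpenAI()

def describe_image(image_data_url: str) -> str:
    """Step 1: a vision-capable model turns the image into a text description."""
    resp = client.chat.completions.create(
        model="gpt-5",  # example vision model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail."},
                {"type": "image_url", "image_url": {"url": image_data_url}},
            ],
        }],
    )
    return resp.choices[0].message.content

def answer_about_image(question: str, image_data_url: str) -> str:
    """Step 2: the non-vision model answers using only the description."""
    description = describe_image(image_data_url)
    resp = client.chat.completions.create(
        model="o3-mini",  # example non-vision reasoning model
        messages=[
            {"role": "system",
             "content": f"The user uploaded an image. Description: {description}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```

Anything the description leaves out, such as small text, spatial layout, or subtle colors, is invisible to the answering model, which is why native vision usually gives better results.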

Getting the Best Results

1. Choose the Right Model

For apps that heavily rely on image analysis, select a vision-capable model:

  1. Go to your app in the Chipp dashboard
  2. Navigate to Build > Configure
  3. Under Model, select a vision-capable model like GPT-5 or Claude Sonnet 4

2. Enable Image Recognition

Make sure the capability is turned on:

  1. Go to Build > Actions
  2. Find Image Recognition in the Basic Actions list
  3. Toggle it ON

3. Image Quality Tips

For best recognition accuracy:

  • Resolution: Higher resolution images produce better results
  • Lighting: Well-lit images are easier to analyze
  • Focus: Ensure the subject is in focus and not blurry
  • Format: JPEG, PNG, and WebP are all supported

4. Prompting for Analysis

Guide users to be specific about what they want analyzed:

Good prompts:

  • "What math equations are shown in this image?"
  • "Identify all the plants in this photo"
  • "What text appears on this sign?"

Vague prompts (less effective):

  • "What is this?"
  • "Tell me about the image"

Comparing to ChatGPT

If you notice differences between your Chipp app and ChatGPT's native image analysis, consider:

  1. Model selection: ChatGPT uses GPT-5 by default. Ensure your Chipp app also uses GPT-5 or a comparable vision model like Claude Sonnet 4.

  2. System prompt: Your app's personality and instructions may influence how it interprets images. ChatGPT has different default behaviors.

  3. Context window: ChatGPT maintains conversation context differently. For complex multi-image analysis, results may vary.

Troubleshooting

Images Not Being Analyzed

  • Verify Image Recognition is enabled in your app's Actions
  • Check that the user is uploading supported formats (JPEG, PNG, WebP, GIF)
  • Ensure images come from a supported source (direct uploads work; not all external URLs do)

Poor Accuracy

  • Switch to a vision-capable model (GPT-5, Claude Sonnet 4, etc.)
  • Ask users to provide clearer, higher-resolution images
  • Add specific instructions in your system prompt about how to analyze images

Slow Response Times

Image analysis typically takes 5-15 seconds depending on image complexity. For faster responses:

  • Use smaller image files when possible (see the resizing sketch after this list)
  • Consider GPT-5 Nano or Claude 3.5 Haiku if you can trade some depth of analysis for speed
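
If you control the uploads, for example through an integration that sends images on a user's behalf, downscaling large photos before sending them is an easy win. Here is a minimal sketch using the Pillow imaging library; the 2048 px cap and JPEG quality setting are arbitrary examples, not Chipp requirements:

```python
from io import BytesIO
from PIL import Image  # assumes the Pillow library is installed

def shrink_for_upload(path: str, max_side: int = 2048) -> bytes:
    """Downscale an image so its longest side is at most max_side pixels,
    then re-encode it as JPEG to reduce upload size."""
    img = Image.open(path)
    img.thumbnail((max_side, max_side))  # preserves aspect ratio, never enlarges
    buf = BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=85)
    return buf.getvalue()
```

Smaller files upload and process faster, and a roughly 2000 px longest side typically keeps enough detail for text and object recognition.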

Model Comparison for Vision

| Model | Vision Quality | Speed | Best For |
| ----- | -------------- | ----- | -------- |
| GPT-5 | Excellent | Medium | Complex reasoning about images |
| Claude Sonnet 4.5 | Excellent | Fast | Detailed visual descriptions |
| Gemini 2.5 Pro | Excellent | Medium | Multi-image comparisons, long context |
| GPT-5 Mini | Very Good | Fast | Balanced performance |
| GPT-5 Nano | Good | Very Fast | Quick, simple analysis |
| Claude 3.5 Haiku | Good | Very Fast | Fast responses |
| Gemini 2.5 Flash | Very Good | Fast | Balanced speed and quality |