Video Generation
Generate short videos from text descriptions using Google Veo 3 directly inside your chatbot.
Chipp apps can generate short videos from text descriptions using Google Veo 3. Consumers describe what they want, and the AI produces a video with audio — delivered as a playable download card directly in the chat.
Video generation is available on all tiers. It uses your organization’s Stripe Token Billing balance.
How It Works
Video generation is an asynchronous process. Unlike text or image responses, which appear immediately, videos take 1-4 minutes to produce. The chat UI keeps the consumer informed throughout.
Consumer Asks for a Video
The consumer describes the video they want in natural language. The AI crafts a detailed prompt (100-150 words) covering subject, action, camera movement, lighting, mood, style, environment, and color palette.
Job Created
The AI calls the generateVideo tool, which creates a background job and returns a progress indicator inline in the chat.
Progress Updates
As the video moves through the pipeline (queued, generating, uploading), real-time WebSocket events update the progress indicator in the chat. Consumers see the current phase and an estimated time remaining.
Video Ready
When the video is complete, a download card replaces the progress indicator. The consumer can play the video directly or download it.
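The flow above can be sketched from the chat client's side. This is a minimal illustration rather than Chipp's implementation: the `ProgressEvent` shape, its field names, and `render_chat_element` are hypothetical, because the actual WebSocket payload is not described here.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical event payload; the real WebSocket message shape may differ.
@dataclass
class ProgressEvent:
    job_id: str
    phase: str  # "queued", "generating", "uploading", "completed", or "failed"
    eta_seconds: Optional[int] = None
    video_url: Optional[str] = None


def render_chat_element(event: ProgressEvent) -> str:
    """Return the text a chat UI might show for the most recent event."""
    if event.phase == "completed" and event.video_url:
        # The download card replaces the progress indicator.
        return f"[download card] {event.video_url}"
    if event.phase == "failed":
        return "[error] Video generation failed. You can retry."
    label = event.phase.capitalize()
    if event.eta_seconds is not None:
        minutes, seconds = divmod(event.eta_seconds, 60)
        return f"[progress] {label}, about {minutes}m {seconds:02d}s remaining"
    return f"[progress] {label}"
```

Each incoming event fully determines what is shown, so a client written this way can drop or reorder stale events without leaving the indicator in an inconsistent state.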
Video Parameters
The AI controls these parameters based on the consumer’s request:
| Parameter | Options | Default | Description |
|---|---|---|---|
| Duration | 4s, 6s, 8s | 8s | Length of the generated video |
| Aspect Ratio | 16:9, 9:16 | 16:9 | Landscape or portrait orientation |
| Negative Prompt | Free text | None | Things to avoid in the output |
| Reference Image | URL | None | An optional image to guide the visual style or content |
All videos are generated at 1080p resolution with audio included.
When a consumer provides a reference image, it can be used as either an “asset” (content to include in the video) or a “style” reference (visual style to match). The AI decides based on context.
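As an illustration of the parameter table, the sketch below models a request with the documented options and defaults. The `VideoRequest` class and its field names are assumptions made for this example; they are not the actual arguments of the generateVideo tool.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_DURATIONS = (4, 6, 8)              # seconds, per the table above
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16")   # landscape or portrait
ALLOWED_REFERENCE_TYPES = ("asset", "style")


@dataclass(frozen=True)
class VideoRequest:
    """Hypothetical request model; field names are illustrative only."""
    prompt: str
    duration_seconds: int = 8
    aspect_ratio: str = "16:9"
    negative_prompt: Optional[str] = None
    reference_image_url: Optional[str] = None
    reference_type: Optional[str] = None  # "asset" or "style"

    def __post_init__(self) -> None:
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.duration_seconds not in ALLOWED_DURATIONS:
            raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
        if self.aspect_ratio not in ALLOWED_ASPECT_RATIOS:
            raise ValueError(f"aspect ratio must be one of {ALLOWED_ASPECT_RATIOS}")
        if self.reference_type is not None:
            if self.reference_image_url is None:
                raise ValueError("reference_type requires a reference image")
            if self.reference_type not in ALLOWED_REFERENCE_TYPES:
                raise ValueError(f"reference type must be one of {ALLOWED_REFERENCE_TYPES}")
```

For example, `VideoRequest(prompt="A phone rotating on a white background", duration_seconds=6)` keeps the default 16:9 aspect ratio, while a 5-second duration raises `ValueError`.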
The Generation Pipeline
Behind the scenes, video generation follows a multi-phase pipeline:
- Pending — Job created in the database
- Queued — Submitted to the Google Veo 3 API via Vertex AI
- Generating — Veo is rendering the video (this is the longest phase and accounts for most of the 1-4 minute total)
- Uploading — Finished video is uploaded to secure cloud storage
- Completed — Video URL is saved and delivered to the consumer
If any phase fails, the job is marked as failed and the consumer receives an error message. Failed jobs can be retried.
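The phases above behave like a small state machine. The sketch below is one way to model it, under two assumptions that are not stated in this document: a failure can occur from any non-terminal phase, and a retry restarts the job from Pending. The names `Phase`, `advance`, and `retry` are illustrative.

```python
from enum import Enum


class Phase(str, Enum):
    PENDING = "pending"
    QUEUED = "queued"
    GENERATING = "generating"
    UPLOADING = "uploading"
    COMPLETED = "completed"
    FAILED = "failed"


# Happy-path order of the pipeline.
NEXT_PHASE = {
    Phase.PENDING: Phase.QUEUED,
    Phase.QUEUED: Phase.GENERATING,
    Phase.GENERATING: Phase.UPLOADING,
    Phase.UPLOADING: Phase.COMPLETED,
}


def advance(phase: Phase, ok: bool = True) -> Phase:
    """Move to the next phase, or to FAILED if the current phase did not succeed."""
    if phase in (Phase.COMPLETED, Phase.FAILED):
        raise ValueError(f"{phase.value} is a terminal phase")
    return NEXT_PHASE[phase] if ok else Phase.FAILED


def retry(phase: Phase) -> Phase:
    """Assumption: a retried job starts over from PENDING."""
    if phase is not Phase.FAILED:
        raise ValueError("only failed jobs can be retried")
    return Phase.PENDING
```

Keeping the transitions in a single table makes it easy to reject out-of-order updates, such as a job jumping from Queued straight to Completed.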
Example Use Cases
Video generation works well for:
- Product demos — “Create a 6-second video showing a phone rotating on a white background”
- Social media content — “Generate a portrait video of coffee being poured in slow motion with warm lighting”
- Concept visualization — “Make a video of a drone flying over a futuristic city at sunset”
- Storyboarding — “Show a character walking through a forest with dappled sunlight, camera following from behind”
- Marketing clips — “Create a video of ocean waves with a text overlay saying ‘Find Your Peace’”
Prompting Tips
The AI automatically crafts detailed prompts, but consumers get better results with specificity:
- Good: “Create a video of a golden retriever running through autumn leaves in a park, slow motion, warm afternoon sunlight, shallow depth of field”
- Less effective: “Make a video of a dog”
Including details about camera movement (tracking shot, dolly zoom, static), lighting (golden hour, neon, dramatic), and mood (cinematic, playful, serene) significantly improves output quality.
Storage and Delivery
Generated videos are stored in Google Cloud Storage under your organization’s workspace. Each video gets a unique URL and is served through the standard file download system using signed URLs. Videos persist as long as the associated chat session exists.
The video appears in the chat as a rich download card, similar to how file attachments are displayed. Consumers can play the video inline or download the MP4 file.
Billing
Video generation is billed per video through the video_generation usage meter on your Stripe Token Billing balance. Each video counts as one unit regardless of duration or aspect ratio.
Video generation is more expensive than image generation. The cost reflects the compute required to render 1080p video with audio via Google Veo 3. Monitor your usage in the billing dashboard.
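The per-video rule can be expressed as simple arithmetic. In the sketch below, the function names are hypothetical and the unit price is a placeholder, since this document does not state the actual price per video.

```python
from decimal import Decimal
from typing import Iterable, Tuple

# Each entry is (duration_seconds, aspect_ratio) for one generated video.
Video = Tuple[int, str]


def video_units(videos: Iterable[Video]) -> int:
    """Each generated video is one unit; duration and aspect ratio are ignored."""
    return sum(1 for _ in videos)


def estimated_cost(units: int, unit_price: Decimal) -> Decimal:
    """Multiply units by a caller-supplied price (placeholder, not a real rate)."""
    return unit_price * units
```

A 4-second landscape video and an 8-second portrait video therefore consume the same amount: one unit each.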
Limitations
- Maximum duration is 8 seconds. Veo 3 supports short-form video generation only.
- Generation takes 1-4 minutes. This is inherent to the Veo pipeline and cannot be accelerated.
- Generation may time out after 6 minutes if the Veo API is under heavy load. The consumer will see an error and can retry.
- No video editing. Unlike with image generation, consumers cannot upload a video and ask for modifications. Each generation starts from scratch (though a reference image can guide the output).
- YouTube URLs are not supported as reference images. Consumers must provide direct image URLs or upload files.
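The reference image restriction could be checked before a job is created. The sketch below is an assumption about how such a check might look, not Chipp's actual validation: the host list and the function name `is_supported_reference_url` are illustrative, and the check does not verify that the URL really points to an image.

```python
from urllib.parse import urlparse

# Illustrative list of YouTube hosts; not taken from Chipp's code.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_supported_reference_url(url: str) -> bool:
    """Accept http(s) URLs whose host is not a known YouTube domain."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    # urlparse already lowercases the hostname.
    return parsed.hostname not in YOUTUBE_HOSTS
```

Rejecting unsupported URLs up front lets the chatbot ask the consumer for a direct image URL or an uploaded file before any generation time is spent.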