Video Generation

Generate short videos from text descriptions using Google Veo 3 directly inside your chatbot.

Hunter Hodnett, CPTO at Chipp
# video-generation # veo # media # tutorials

Chipp apps can generate short videos from text descriptions using Google Veo 3. Consumers describe what they want, and the AI produces a video with audio — delivered as a playable download card directly in the chat.

Note: Video generation is available on all tiers. Usage is charged against your organization’s Stripe Token Billing balance.

How It Works

Video generation is an asynchronous process. Unlike text or image responses that appear immediately, videos take 1-4 minutes to produce. The chat UI keeps the consumer informed throughout.

Consumer Asks for a Video

The consumer describes the video they want in natural language. The AI crafts a detailed prompt (100-150 words) covering subject, action, camera movement, lighting, mood, style, environment, and color palette.

Job Created

The AI calls the generateVideo tool, which creates a background job and returns a progress indicator inline in the chat.

Progress Updates

As the video moves through the pipeline (queued, generating, uploading), real-time WebSocket events update the progress indicator in the chat. Consumers see the current phase and an estimated time remaining.
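As a rough illustration of how a client might turn one of these progress events into indicator text, here is a minimal sketch. The phase names come from this guide; the `ProgressEvent` shape, its field names, and `format_progress` are hypothetical and do not describe Chipp's actual event payload.

```python
from dataclasses import dataclass
from typing import Optional

# Phase names are taken from the guide; the labels are illustrative.
PHASE_LABELS = {
    "queued": "Queued",
    "generating": "Generating video",
    "uploading": "Uploading",
}


@dataclass
class ProgressEvent:
    """Hypothetical shape of one progress event received over the WebSocket."""
    job_id: str
    phase: str
    eta_seconds: Optional[int] = None  # assumed field: estimated time remaining


def format_progress(event: ProgressEvent) -> str:
    """Build the text shown in the progress indicator for one event."""
    label = PHASE_LABELS.get(event.phase, event.phase.capitalize())
    if event.eta_seconds is None:
        return label
    minutes, seconds = divmod(max(event.eta_seconds, 0), 60)
    return f"{label} (about {minutes}:{seconds:02d} remaining)"
```

For example, `format_progress(ProgressEvent("job-1", "generating", 95))` returns `"Generating video (about 1:35 remaining)"`.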

Video Ready

When the video is complete, a download card replaces the progress indicator. The consumer can play the video directly or download it.

Video Parameters

The AI controls these parameters based on the consumer’s request:

| Parameter | Options | Default | Description |
| --- | --- | --- | --- |
| Duration | 4s, 6s, 8s | 8s | Length of the generated video |
| Aspect Ratio | 16:9, 9:16 | 16:9 | Landscape or portrait orientation |
| Negative Prompt | Free text | None | Things to avoid in the output |
| Reference Image | URL | None | An optional image to guide the visual style or content |

All videos are generated at 1080p resolution with audio included.

Tip: When a consumer provides a reference image, it can be used as either an “asset” (content to include in the video) or a “style” reference (visual style to match). The AI decides based on context.
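The parameters and their allowed values can be captured in a small validated structure. The sketch below is only an illustration: the options and defaults come from the parameter table, while the class name `VideoRequest` and all field names are assumptions, not the real arguments of the generateVideo tool.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values and defaults are taken from the parameter table.
ALLOWED_DURATIONS = (4, 6, 8)  # seconds
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16")
ALLOWED_REFERENCE_KINDS = ("asset", "style")


@dataclass(frozen=True)
class VideoRequest:
    """Hypothetical container for one video request; field names are assumed."""
    prompt: str
    duration_seconds: int = 8
    aspect_ratio: str = "16:9"
    negative_prompt: Optional[str] = None
    reference_image_url: Optional[str] = None
    reference_kind: Optional[str] = None  # "asset" or "style"

    def __post_init__(self) -> None:
        if self.duration_seconds not in ALLOWED_DURATIONS:
            raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
        if self.aspect_ratio not in ALLOWED_ASPECT_RATIOS:
            raise ValueError(f"aspect ratio must be one of {ALLOWED_ASPECT_RATIOS}")
        if self.reference_kind is not None:
            if self.reference_image_url is None:
                raise ValueError("reference_kind requires a reference image")
            if self.reference_kind not in ALLOWED_REFERENCE_KINDS:
                raise ValueError(f"reference kind must be one of {ALLOWED_REFERENCE_KINDS}")
```

Calling `VideoRequest(prompt="a phone rotating on a white background")` yields an 8-second, 16:9 request, while `duration_seconds=5` raises a `ValueError`.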

The Generation Pipeline

Behind the scenes, video generation follows a multi-phase pipeline:

  1. Pending — Job created in the database
  2. Queued — Submitted to the Google Veo 3 API via Vertex AI
  3. Generating — Veo is rendering the video (this is the longest phase, typically 2-4 minutes)
  4. Uploading — Finished video is uploaded to secure cloud storage
  5. Completed — Video URL is saved and delivered to the consumer

If any phase fails, the job is marked as failed and the consumer receives an error message. Failed jobs can be retried.
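The phases above form a simple linear state machine with one failure exit. The following sketch models that under stated assumptions: the phase names follow this guide, but `next_phase` and `retry` are hypothetical helpers, and restarting a retried job from Pending is a guess, since the guide does not say where a retry resumes.

```python
from enum import Enum


class Phase(str, Enum):
    PENDING = "pending"
    QUEUED = "queued"
    GENERATING = "generating"
    UPLOADING = "uploading"
    COMPLETED = "completed"
    FAILED = "failed"


# Happy-path order, as listed in the guide.
_ORDER = [Phase.PENDING, Phase.QUEUED, Phase.GENERATING, Phase.UPLOADING, Phase.COMPLETED]


def next_phase(current: Phase, succeeded: bool = True) -> Phase:
    """Advance one step, or mark the job failed if the current phase failed."""
    if current in (Phase.COMPLETED, Phase.FAILED):
        raise ValueError(f"{current.value} is a terminal phase")
    if not succeeded:
        return Phase.FAILED
    return _ORDER[_ORDER.index(current) + 1]


def retry(current: Phase) -> Phase:
    """Assumption: a retried job starts again from Pending."""
    if current is not Phase.FAILED:
        raise ValueError("only failed jobs can be retried")
    return Phase.PENDING
```

Modeling Failed as a separate terminal state keeps the happy path a plain ordered list, so any phase can fail without extra transitions.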

Example Use Cases

Video generation works well for:

  • Product demos — “Create a 6-second video showing a phone rotating on a white background”
  • Social media content — “Generate a portrait video of coffee being poured in slow motion with warm lighting”
  • Concept visualization — “Make a video of a drone flying over a futuristic city at sunset”
  • Storyboarding — “Show a character walking through a forest with dappled sunlight, camera following from behind”
  • Marketing clips — “Create a video of ocean waves with text overlay saying ‘Find Your Peace’”

Prompting Tips

The AI automatically crafts detailed prompts, but consumers get better results with specificity:

  • Good: “Create a video of a golden retriever running through autumn leaves in a park, slow motion, warm afternoon sunlight, shallow depth of field”
  • Less effective: “Make a video of a dog”

Including details about camera movement (tracking shot, dolly zoom, static), lighting (golden hour, neon, dramatic), and mood (cinematic, playful, serene) significantly improves output quality.
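One way to apply these tips consistently is to assemble the request from named parts. This is a minimal sketch of that idea; `describe_video` and its parameters are hypothetical helpers for whoever writes the request, not part of Chipp or Veo.

```python
from typing import Optional


def describe_video(
    subject: str,
    camera: Optional[str] = None,
    lighting: Optional[str] = None,
    mood: Optional[str] = None,
) -> str:
    """Join the subject with any camera, lighting, and mood details provided."""
    details = [part for part in (camera, lighting, mood) if part]
    return "Create a video of " + ", ".join([subject, *details])
```

For example, `describe_video("a golden retriever running through autumn leaves in a park", camera="tracking shot", lighting="golden hour", mood="cinematic")` returns `"Create a video of a golden retriever running through autumn leaves in a park, tracking shot, golden hour, cinematic"`.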

Storage and Delivery

Generated videos are stored in Google Cloud Storage under your organization’s workspace. Each video gets a unique URL and is served through the standard file download system using signed URLs. Videos persist for as long as the associated chat session exists.

The video appears in the chat as a rich download card, similar to how file attachments are displayed. Consumers can play the video inline or download the MP4 file.

Billing

Video generation is billed per video through the video_generation usage meter on your Stripe Token Billing balance. Each video counts as one unit regardless of duration or aspect ratio.
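The per-video rule can be written as a short calculation. In this sketch the meter name comes from the guide, while the function names and the `unit_price` argument are assumptions; the guide does not state a price, so the value must be supplied by the caller.

```python
from decimal import Decimal
from typing import Mapping, Sequence

METER_NAME = "video_generation"  # usage meter named in the guide


def billable_units(videos: Sequence[Mapping[str, object]]) -> int:
    """One unit per video; duration and aspect ratio do not change the count."""
    return len(videos)


def estimated_cost(units: int, unit_price: Decimal) -> Decimal:
    """unit_price is a placeholder supplied by the caller, not a published rate."""
    return unit_price * units
```

A 4-second portrait video and an 8-second landscape video together count as two units.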

Warning: Video generation is more expensive than image generation. The cost reflects the compute required to render 1080p video with audio via Google Veo 3. Monitor your usage in the billing dashboard.

Limitations

  • Maximum duration is 8 seconds. Veo 3 supports short-form video generation only.
  • Generation takes 1-4 minutes. This is inherent to the Veo pipeline and cannot be accelerated.
  • Generation may time out after 6 minutes if the Veo API is under heavy load. The consumer will see an error and can retry.
  • No video editing. Unlike image generation, consumers cannot upload a video and ask for modifications. Each generation starts from scratch (though a reference image can guide the output).
  • YouTube URLs are not supported as reference images. Consumers must provide direct image URLs or upload files.
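The reference-image restriction could be checked before a request is submitted. The sketch below uses only the Python standard library; the host list and function names are assumptions for illustration and do not reflect how Chipp validates URLs.

```python
from urllib.parse import urlparse

# Assumed set of YouTube hosts; subdomains such as www. and m. also match.
YOUTUBE_HOSTS = ("youtube.com", "youtu.be")


def is_youtube_url(url: str) -> bool:
    """Return True when the URL's host is a YouTube domain or subdomain."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in YOUTUBE_HOSTS)


def check_reference_image_url(url: str) -> None:
    """Raise ValueError for URLs that cannot serve as a reference image."""
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError("reference image must be an http(s) URL")
    if is_youtube_url(url):
        raise ValueError("YouTube URLs are not supported as reference images")
```

This check only rejects YouTube links; it does not confirm that the URL actually points at an image file.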