Voice Cloning
Create custom AI voices trained on your own audio samples for a branded, personalized voice agent experience.
Create a custom AI voice that sounds like you (or anyone you have permission to clone). Voice cloning uses ElevenLabs Instant Voice Cloning to synthesize voices from short audio samples — no training phase, ready in seconds.
Voice cloning requires a Studio plan or higher.
How It Works
- Record or upload a short audio sample (1-2 minutes)
- ElevenLabs analyzes vocal characteristics (pitch, timbre, prosody, accent)
- A custom voice is generated instantly
- Select it as your voice agent’s voice
Cloned voices are shared across all apps in your organization.
Creating a Custom Voice
1. Go to your app’s Build page and open the Voice card. Voice mode must be enabled.
2. Find the Custom Voices section and click Add Custom Voice.
3. Choose one method:
Browser recording (recommended):
- Click Record and speak naturally for 1-2 minutes
- Re-record if needed
File upload:
- Upload a pre-recorded audio file
- Supported formats: WAV, MP3, M4A, OGG, WebM, FLAC
- Maximum file size: 50MB
4. Give your voice a memorable name (e.g., “CEO Rachel” or “Support Persona”) and click Create Voice. The voice is available immediately.
5. In the voice selection dropdown, your custom voices appear at the top. Select one to use it with your voice agent.
Recording Tips for Best Quality
The quality of your clone depends directly on the quality of your audio sample.
Duration
- Optimal: 1-2 minutes
- Too short (<30 seconds): May lack vocal variety
- Too long (>5 minutes): Can introduce instability
- The AI captures voice characteristics best from concise, focused samples
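If you want to check a recording's length before submitting it, the thresholds above can be expressed as a small helper. This is only a sketch: the function names are hypothetical, the "acceptable" label for lengths between the listed ranges is an assumption (the guidance does not name that band), and the duration reader handles WAV files only.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the length of a WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_sample_duration(seconds: float) -> str:
    """Classify a recording length against the duration guidance above."""
    if seconds < 0:
        raise ValueError("duration cannot be negative")
    if seconds < 30:
        return "too short"   # may lack vocal variety
    if seconds > 300:
        return "too long"    # can introduce instability
    if 60 <= seconds <= 120:
        return "optimal"     # the recommended 1-2 minutes
    # Assumption: lengths between the named ranges are usable but not ideal.
    return "acceptable"
```

For example, `classify_sample_duration(wav_duration_seconds("sample.wav"))` labels a local recording, where `sample.wav` stands in for your own file.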
Environment
- Record in a quiet room with soft furnishings (curtains and carpets reduce echo)
- Turn off fans, air conditioning, and notifications
- Close windows to block outside noise
- The AI replicates everything it hears — background noise becomes part of the voice
Microphone Technique
- Position the microphone about 20 cm away (roughly two fists’ distance)
- Speak slightly off-axis to reduce plosive sounds (hard P’s and B’s)
- Use a pop filter if available
- Avoid breathing directly into the mic
Audio Quality
- Peak levels: keep peaks between -6 dB and -3 dB so loud parts don’t clip
- Avoid clipping/distortion at all costs — the AI can’t recover from it
- Standard sample rate (44.1kHz or 48kHz) works well
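The audio quality targets above can be checked locally before you upload a file. The sketch below is one possible approach, not part of the product: it assumes a 16-bit PCM WAV file, it assumes the -6 dB to -3 dB range is measured relative to digital full scale (dBFS), and it treats any sample at the 16-bit limit as a rough sign of clipping. The function names are hypothetical.

```python
import array
import math
import sys
import wave

ALLOWED_SAMPLE_RATES = (44100, 48000)  # the rates recommended above
FULL_SCALE = 32768                     # largest 16-bit sample magnitude


def peak_dbfs(samples) -> float:
    """Return the peak level in dB relative to 16-bit full scale."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")  # silence has no measurable peak
    return 20 * math.log10(peak / FULL_SCALE)


def check_wav(path: str) -> dict:
    """Report sample rate, peak level, and a simple clipping heuristic."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    peak = peak_dbfs(samples)
    return {
        "sample_rate_ok": rate in ALLOWED_SAMPLE_RATES,
        "peak_dbfs": peak,
        "peak_in_target_range": -6.0 <= peak <= -3.0,
        # Heuristic only: samples pinned at the limit usually mean clipping.
        "possibly_clipped": any(s >= 32767 or s <= -32768 for s in samples),
    }
```

A recording whose peaks fall below the target range is not necessarily unusable; the check only flags files that drift from the guidance so you can re-record or adjust input gain.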
Delivery
- Maintain consistent tone and energy throughout
- Don’t switch between animated and subdued delivery
- Read with natural pacing — not robotic, not theatrical
- Use your own writing or scripts for natural rhythm
Managing Custom Voices
- Delete a voice: Remove it from your organization’s voice library at any time. This also removes it from ElevenLabs.
- Multiple voices: Create as many custom voices as your plan allows
- Cross-app usage: All apps in your organization can use any custom voice
- Audio privacy: Your recording is stored encrypted and never shared publicly
Troubleshooting
Voice doesn’t sound right?
- Re-record with better audio quality (less background noise, no clipping)
- Adjust the sample length (aim for 1-2 minutes of natural speech; samples under 30 seconds or over 5 minutes tend to reduce quality)
- Ensure consistent tone throughout the recording
Voice not appearing in selection?
- Check that the creation completed successfully
- Refresh the voice settings page
- Verify you’re on a Studio plan or higher
Upload failing?
- Check file size (max 50MB)
- Verify the file format (WAV, MP3, M4A, OGG, WebM, FLAC)
- Try a different format or re-export the audio
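The two upload checks above (size and format) are easy to run yourself before retrying. The following is a minimal sketch under stated assumptions: the function name is hypothetical, "50MB" is interpreted as 50 × 1024 × 1024 bytes, and the format is judged by file extension only, which may differ from how the service inspects the file.

```python
from pathlib import Path

# Formats listed in this guide, compared case-insensitively by extension.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".ogg", ".webm", ".flac"}
# Assumption: "50MB" is treated as 50 MiB.
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024


def upload_problems(filename: str, size_bytes: int) -> list[str]:
    """Return a list of reasons the upload is likely to be rejected."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'no extension'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("file is larger than 50MB")
    return problems
```

An empty list means the file passes both checks. For a file on disk, `Path(name).stat().st_size` gives the size in bytes to pass as `size_bytes`.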
For more about voice agent configuration, see the Voice Agents guide.