Prerequisites
- A voice agent already created — see Create a Voice Agent
- API keys for the providers you want to use (OpenAI, Anthropic, Google, Deepgram, ElevenLabs)
LLM (Language Model)
The LLM is the brain of the agent — it reads the conversation history, understands the caller’s intent, and decides what to say next (or which tool to call).
Supported providers
| Provider | Available models |
|---|---|
| OpenAI | GPT-4o, GPT-4o mini, GPT-4 Turbo |
| Anthropic | Claude 3.5 Sonnet, Claude 3 Haiku |
| Google | Gemini 1.5 Pro, Gemini 1.5 Flash |
Choosing a model
| Consideration | Recommendation |
|---|---|
| Best reasoning quality | GPT-4o or Claude 3.5 Sonnet |
| Fastest response / lowest latency | GPT-4o mini or Gemini 1.5 Flash |
| Cost-sensitive, high call volume | GPT-4o mini or Claude 3 Haiku |
| Complex multi-tool calls | GPT-4o or Claude 3.5 Sonnet |
Latency matters on phone calls — callers notice pauses longer than 1–2 seconds. Prefer faster models for high-volume or latency-sensitive use cases.
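The recommendation table above can be read as a simple lookup. The sketch below is illustrative only: the priority keys (`reasoning`, `latency`, `cost`, `multi_tool`) and the `recommend_models` helper are hypothetical names, not part of the product. It returns the models recommended for every priority you list, which makes trade-offs visible when no single model satisfies all of them.

```python
# Recommendations copied from the "Choosing a model" table.
# The dictionary keys are hypothetical labels chosen for this sketch.
RECOMMENDATIONS = {
    "reasoning": ["GPT-4o", "Claude 3.5 Sonnet"],
    "latency": ["GPT-4o mini", "Gemini 1.5 Flash"],
    "cost": ["GPT-4o mini", "Claude 3 Haiku"],
    "multi_tool": ["GPT-4o", "Claude 3.5 Sonnet"],
}


def recommend_models(priorities):
    """Return the models recommended for every listed priority.

    An empty result means the priorities conflict and a trade-off is needed.
    """
    if not priorities:
        raise ValueError("at least one priority is required")
    unknown = [p for p in priorities if p not in RECOMMENDATIONS]
    if unknown:
        raise ValueError(f"unknown priorities: {unknown}")

    # Start from the first priority's list and keep only shared models,
    # preserving the order used in the table.
    candidates = list(RECOMMENDATIONS[priorities[0]])
    for priority in priorities[1:]:
        candidates = [m for m in candidates if m in RECOMMENDATIONS[priority]]
    return candidates


if __name__ == "__main__":
    print(recommend_models(["latency", "cost"]))
    print(recommend_models(["reasoning", "latency"]))
```

In this table, asking for both low latency and low cost narrows to a single model, while asking for best reasoning and lowest latency returns nothing, so one of the two has to give.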
Configuration
- Open your voice agent and go to the Configuration tab.
- Under LLM, select your provider.
- Enter your API key for that provider.
- Select the model.
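If you track agent settings outside the dashboard (for example in a review checklist or an internal script), it can help to check that a provider, model, and key are consistent before entering them. The following is a minimal sketch under that assumption: `validate_llm_choice` and the `SUPPORTED_MODELS` mapping are hypothetical and simply mirror the supported providers table; they are not a product API.

```python
# Mirrors the "Supported providers" table for the LLM.
# Hypothetical helper data, not an official API.
SUPPORTED_MODELS = {
    "OpenAI": {"GPT-4o", "GPT-4o mini", "GPT-4 Turbo"},
    "Anthropic": {"Claude 3.5 Sonnet", "Claude 3 Haiku"},
    "Google": {"Gemini 1.5 Pro", "Gemini 1.5 Flash"},
}


def validate_llm_choice(provider, model, api_key):
    """Return a list of problems; an empty list means the choice is consistent."""
    problems = []
    if provider not in SUPPORTED_MODELS:
        problems.append(f"unsupported provider: {provider}")
    elif model not in SUPPORTED_MODELS[provider]:
        problems.append(f"{model} is not listed for {provider}")
    if not api_key.strip():
        problems.append("API key is empty")
    return problems


if __name__ == "__main__":
    # "sk-example" is a placeholder, not a real key.
    print(validate_llm_choice("OpenAI", "GPT-4o mini", "sk-example"))
    print(validate_llm_choice("Anthropic", "GPT-4o", ""))
```

The check only catches mismatches against the table; it does not verify that the key is valid with the provider.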
Transcriber (Speech-to-Text)
The transcriber converts the caller’s audio into text so the LLM can process it.
Supported providers
| Provider | Notes |
|---|---|
| Deepgram | Low latency, high accuracy, supports many languages |
| OpenAI Whisper | Strong multilingual support |
Configuration
- Under Transcriber, select your provider.
- Enter the API key.
- Select the language (or leave as auto-detect).
Choose a transcriber that supports the primary language your callers will use. Accuracy drops significantly if the transcriber language doesn’t match the caller’s speech.
Voice (Text-to-Speech)
The voice engine converts the agent’s text responses into speech that the caller hears.
Supported providers
| Provider | Notes |
|---|---|
| ElevenLabs | Highly realistic, many voice options, supports voice cloning |
| OpenAI TTS | Fast, natural-sounding, good default option |
Configuration
- Under Voice, select your provider.
- Enter the API key.
- Select a voice ID. ElevenLabs provides a library of pre-built voices — copy the Voice ID from your ElevenLabs dashboard.
- Optionally adjust speaking rate and stability (ElevenLabs only).
Choosing a voice
The right voice depends on your brand and use case. Consider:
- Tone — warm and friendly for support, professional and clear for enterprise
- Gender and accent — match your target audience’s expectations
- Stability — higher stability means more consistent delivery; lower allows more expressive variation
Saving changes
After configuring the LLM, transcriber, and voice, click Save. Changes apply starting with the next call; calls already in progress are not affected.
Estimated cost per minute
Costs vary by provider and model. As a rough guide for a typical voice agent call:
| Component | Approximate cost |
|---|---|
| LLM (GPT-4o mini) | ~$0.001–0.005/min |
| Transcriber (Deepgram) | ~$0.004/min |
| Voice (ElevenLabs) | ~$0.005–0.015/min |
| Plivo telephony | ~$0.005–0.02/min |
Check each provider’s current pricing page for up-to-date rates.
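To turn the per-component figures into a total, add the low ends and the high ends separately. The sketch below does that arithmetic using only the approximate rates from the table above; the `estimate_cost` helper and the `COST_PER_MINUTE` mapping are hypothetical names for illustration, and the rates are rough guides rather than quoted prices.

```python
# Approximate (low, high) USD cost per minute, copied from the table.
# These are rough guides, not current provider pricing.
COST_PER_MINUTE = {
    "LLM (GPT-4o mini)": (0.001, 0.005),
    "Transcriber (Deepgram)": (0.004, 0.004),
    "Voice (ElevenLabs)": (0.005, 0.015),
    "Plivo telephony": (0.005, 0.02),
}


def estimate_cost(minutes):
    """Return the (low, high) estimated total cost in USD for the given call minutes."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    low = sum(lo for lo, _ in COST_PER_MINUTE.values()) * minutes
    high = sum(hi for _, hi in COST_PER_MINUTE.values()) * minutes
    # Round to avoid floating-point noise in the displayed totals.
    return round(low, 4), round(high, 4)


if __name__ == "__main__":
    low, high = estimate_cost(1)
    print(f"Per minute: ${low} to ${high}")
    low, high = estimate_cost(1000)
    print(f"Per 1,000 minutes: ${low} to ${high}")
```

With the figures in the table, this works out to roughly $0.015–0.044 per minute, or about $15–44 per 1,000 call minutes, for this particular combination of components.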