
Prerequisites

  • A voice agent already created — see Create a Voice Agent
  • API keys for the providers you want to use (OpenAI, Anthropic, Google, Deepgram, ElevenLabs)

LLM (Language Model)

The LLM is the brain of the agent — it reads the conversation history, understands the caller’s intent, and decides what to say next (or which tool to call).

Supported providers

| Provider | Available models |
|---|---|
| OpenAI | GPT-4o, GPT-4o mini, GPT-4 Turbo |
| Anthropic | Claude 3.5 Sonnet, Claude 3 Haiku |
| Google | Gemini 1.5 Pro, Gemini 1.5 Flash |

Choosing a model

| Consideration | Recommendation |
|---|---|
| Best reasoning quality | GPT-4o or Claude 3.5 Sonnet |
| Fastest response / lowest latency | GPT-4o mini or Gemini 1.5 Flash |
| Cost-sensitive, high call volume | GPT-4o mini or Claude 3 Haiku |
| Complex multi-tool calls | GPT-4o or Claude 3.5 Sonnet |

Latency matters on phone calls: callers notice pauses longer than 1–2 seconds. Prefer faster models for high-volume or latency-sensitive use cases.
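
The recommendations above can be read as a small lookup, which helps when more than one consideration applies at once. The sketch below is illustrative only: the priority labels (`best_reasoning`, `lowest_latency`, and so on) and the `recommend` helper are hypothetical names, not product settings, and the model lists are copied from the table.

```python
# Hypothetical lookup built from the recommendation table above.
# The keys are illustrative labels, not product settings.
RECOMMENDATIONS = {
    "best_reasoning": ["GPT-4o", "Claude 3.5 Sonnet"],
    "lowest_latency": ["GPT-4o mini", "Gemini 1.5 Flash"],
    "cost_sensitive": ["GPT-4o mini", "Claude 3 Haiku"],
    "multi_tool": ["GPT-4o", "Claude 3.5 Sonnet"],
}


def recommend(priorities):
    """Return the models recommended for every listed priority."""
    if not priorities:
        return []
    candidates = list(RECOMMENDATIONS[priorities[0]])
    for priority in priorities[1:]:
        allowed = RECOMMENDATIONS[priority]
        # Keep only models that also appear under this priority.
        candidates = [model for model in candidates if model in allowed]
    return candidates


print(recommend(["lowest_latency", "cost_sensitive"]))  # ['GPT-4o mini']
print(recommend(["best_reasoning", "lowest_latency"]))  # []
```

An empty result signals a trade-off: no single model in the table is recommended for both best reasoning and lowest latency, so one of the two has to take priority.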

Configuration

  1. Open your voice agent and go to the Configuration tab.
  2. Under LLM, select your provider.
  3. Enter your API key for that provider.
  4. Select the model.
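
The steps above collect three values: a provider, an API key, and a model. As a rough sketch of how those values relate, the snippet below checks that a chosen model belongs to the chosen provider. `LLMConfig` and `validate_llm_config` are hypothetical names used for illustration, not part of any product API; the provider and model names come from the supported providers table.

```python
from dataclasses import dataclass

# Providers and models copied from the "Supported providers" table.
SUPPORTED_MODELS = {
    "OpenAI": {"GPT-4o", "GPT-4o mini", "GPT-4 Turbo"},
    "Anthropic": {"Claude 3.5 Sonnet", "Claude 3 Haiku"},
    "Google": {"Gemini 1.5 Pro", "Gemini 1.5 Flash"},
}


@dataclass(frozen=True)
class LLMConfig:
    """Hypothetical record of the three values the steps ask for."""
    provider: str
    api_key: str
    model: str


def validate_llm_config(config):
    """Return a list of problems; an empty list means the selection is consistent."""
    problems = []
    models = SUPPORTED_MODELS.get(config.provider)
    if models is None:
        problems.append(f"unsupported provider: {config.provider}")
    elif config.model not in models:
        problems.append(f"{config.model} is not offered by {config.provider}")
    if not config.api_key.strip():
        problems.append("API key is empty")
    return problems


# "sk-placeholder" is a dummy value, not a real key.
print(validate_llm_config(LLMConfig("OpenAI", "sk-placeholder", "Claude 3 Haiku")))
# ['Claude 3 Haiku is not offered by OpenAI']
```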

Transcriber (Speech-to-Text)

The transcriber converts the caller’s audio into text so the LLM can process it.

Supported providers

| Provider | Notes |
|---|---|
| Deepgram | Low latency, high accuracy, supports many languages |
| OpenAI Whisper | Strong multilingual support |

Configuration

  1. Under Transcriber, select your provider.
  2. Enter the API key.
  3. Select the language (or leave as auto-detect).

Choose a transcriber that supports the primary language your callers will use. Accuracy drops significantly if the transcriber language doesn’t match the caller’s speech.

Voice (Text-to-Speech)

The voice engine converts the agent’s text responses into speech that the caller hears.

Supported providers

| Provider | Notes |
|---|---|
| ElevenLabs | Highly realistic, many voice options, supports voice cloning |
| OpenAI TTS | Fast, natural-sounding, good default option |

Configuration

  1. Under Voice, select your provider.
  2. Enter the API key.
  3. Select a voice ID. ElevenLabs provides a library of pre-built voices — copy the Voice ID from your ElevenLabs dashboard.
  4. Optionally adjust speaking rate and stability (ElevenLabs only).
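
Step 4 applies only to ElevenLabs, which is easy to overlook when switching providers. The sketch below shows one way to model that constraint. The `voice_settings` helper and its field names are hypothetical, and the voice ID and numeric value are placeholders; no value ranges are assumed here, so check the provider's documentation for valid settings.

```python
def voice_settings(provider, voice_id, speaking_rate=None, stability=None):
    """Build a settings dict, rejecting ElevenLabs-only options for other providers."""
    settings = {"provider": provider, "voice_id": voice_id}
    extras = {"speaking_rate": speaking_rate, "stability": stability}
    # Keep only the optional values the caller actually supplied.
    supplied = {name: value for name, value in extras.items() if value is not None}
    if supplied and provider != "ElevenLabs":
        names = ", ".join(sorted(supplied))
        raise ValueError(f"{names} can only be set for ElevenLabs, not {provider}")
    settings.update(supplied)
    return settings


# "voice-id-from-dashboard" is a placeholder for the ID copied from ElevenLabs.
print(voice_settings("ElevenLabs", "voice-id-from-dashboard", stability=0.5))
# {'provider': 'ElevenLabs', 'voice_id': 'voice-id-from-dashboard', 'stability': 0.5}
```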

Choosing a voice

The right voice depends on your brand and use case. Consider:
  • Tone — warm and friendly for support, professional and clear for enterprise
  • Gender and accent — match your target audience’s expectations
  • Stability — higher stability means more consistent delivery; lower allows more expressive variation

Saving changes

After configuring the LLM, transcriber, and voice, click Save. Changes apply starting with the next call; calls already in progress keep the settings they started with.
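
The save behavior described above (new calls pick up saved changes, ongoing calls do not) matches a snapshot-at-call-start pattern. The sketch below illustrates that pattern only; `AgentConfigStore` is a hypothetical class, and nothing here describes how the platform is actually implemented.

```python
import copy


class AgentConfigStore:
    """Hypothetical store: each call copies the configuration saved at its start."""

    def __init__(self, config):
        self._saved = copy.deepcopy(config)

    def save(self, config):
        # Replaces the stored configuration; existing snapshots are untouched.
        self._saved = copy.deepcopy(config)

    def start_call(self):
        # Each call receives its own independent copy.
        return copy.deepcopy(self._saved)


store = AgentConfigStore({"llm": {"model": "GPT-4o"}})
ongoing_call = store.start_call()

store.save({"llm": {"model": "GPT-4o mini"}})
next_call = store.start_call()

print(ongoing_call["llm"]["model"])  # GPT-4o
print(next_call["llm"]["model"])     # GPT-4o mini
```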

Estimated cost per minute

Costs vary by provider and model. As a rough guide for a typical voice agent call:

| Component | Approximate cost |
|---|---|
| LLM (GPT-4o mini) | ~$0.001–0.005/min |
| Transcriber (Deepgram) | ~$0.004/min |
| Voice (ElevenLabs) | ~$0.005–0.015/min |
| Plivo telephony | ~$0.005–0.02/min |

Check each provider’s current pricing page for up-to-date rates.
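
To turn the per-component figures above into a budget, sum the low ends and the high ends separately and multiply by expected call minutes. The sketch below uses only the approximate rates from the table; the `estimate` helper and the 10,000-minute volume are illustrative assumptions, and real rates may differ.

```python
# Approximate (low, high) cost in USD per minute, copied from the table above.
COST_PER_MINUTE = {
    "LLM (GPT-4o mini)": (0.001, 0.005),
    "Transcriber (Deepgram)": (0.004, 0.004),
    "Voice (ElevenLabs)": (0.005, 0.015),
    "Plivo telephony": (0.005, 0.02),
}


def estimate(minutes):
    """Return the (low, high) total cost in USD for the given call minutes."""
    low = sum(lo for lo, _ in COST_PER_MINUTE.values()) * minutes
    high = sum(hi for _, hi in COST_PER_MINUTE.values()) * minutes
    return round(low, 2), round(high, 2)


# 10,000 minutes is an illustrative monthly volume, not a figure from this guide.
low, high = estimate(10_000)
print(f"${low:,.2f} to ${high:,.2f}")  # $150.00 to $440.00
```

With these figures the combined rate is roughly $0.015 to $0.044 per minute, so telephony and voice together account for most of the spread.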