Configure LLM & Voice - Swiftsell AI

Prerequisites

A voice agent that has already been created (see Create a Voice Agent)
API keys for the providers you want to use (OpenAI, Anthropic, Google, Deepgram, ElevenLabs)

LLM (Language Model)

The LLM is the brain of the agent — it reads the conversation history, understands the caller’s intent, and decides what to say next (or which tool to call).

Supported providers

Provider	Available models
OpenAI	GPT-4o, GPT-4o mini, GPT-4 Turbo
Anthropic	Claude 3.5 Sonnet, Claude 3 Haiku
Google	Gemini 1.5 Pro, Gemini 1.5 Flash

Choosing a model

Consideration	Recommendation
Best reasoning quality	GPT-4o or Claude 3.5 Sonnet
Fastest response / lowest latency	GPT-4o mini or Gemini 1.5 Flash
Cost-sensitive, high call volume	GPT-4o mini or Claude 3 Haiku
Complex multi-tool calls	GPT-4o or Claude 3.5 Sonnet
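To make the trade-offs in this table concrete, here is a minimal Python sketch that encodes it as data. The priority keys (`reasoning`, `latency`, `cost`, `multi_tool`) and both helper functions are hypothetical and are not part of Swiftsell AI; you still pick the model in the dashboard.

```python
# Hypothetical helper: the "Choosing a model" table expressed as data.
# Not a Swiftsell AI API; the priority keys are illustrative names.
RECOMMENDATIONS: dict[str, list[str]] = {
    "reasoning": ["GPT-4o", "Claude 3.5 Sonnet"],
    "latency": ["GPT-4o mini", "Gemini 1.5 Flash"],
    "cost": ["GPT-4o mini", "Claude 3 Haiku"],
    "multi_tool": ["GPT-4o", "Claude 3.5 Sonnet"],
}


def recommend_models(priority: str) -> list[str]:
    """Return the models the table recommends for one priority."""
    try:
        return list(RECOMMENDATIONS[priority])  # copy so callers cannot mutate the table
    except KeyError:
        raise ValueError(
            f"unknown priority {priority!r}; expected one of {sorted(RECOMMENDATIONS)}"
        ) from None


def models_for_all(priorities: list[str]) -> list[str]:
    """Return the models recommended for every listed priority, sorted by name."""
    candidate_sets = [set(recommend_models(p)) for p in priorities]
    common = set.intersection(*candidate_sets) if candidate_sets else set()
    return sorted(common)


print(recommend_models("latency"))          # ['GPT-4o mini', 'Gemini 1.5 Flash']
print(models_for_all(["latency", "cost"]))  # ['GPT-4o mini']
```

Asking for both low latency and low cost narrows the table to GPT-4o mini, the only model that appears in both rows.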

Latency matters on phone calls — callers notice pauses longer than 1–2 seconds. Prefer faster models for high-volume or latency-sensitive use cases.
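One way to reason about that pause is as a per-turn budget. The sketch below assumes the transcriber, LLM and voice stages run one after another, so the silence the caller hears is roughly their sum; real pipelines often stream and overlap stages, which makes this a conservative estimate. The class, field names and the 1-second default (the low end of the 1–2 second guidance) are illustrative assumptions, and the delays are inputs you would measure yourself, not published figures.

```python
from dataclasses import dataclass


@dataclass
class TurnLatency:
    """Hypothetical per-turn delays in seconds; supply your own measurements."""
    transcriber_final: float  # time until the caller's utterance is transcribed
    llm_first_token: float    # time until the LLM starts producing its reply
    tts_first_audio: float    # time until the voice engine returns the first audio

    def total(self) -> float:
        # Assumes the stages run sequentially, so the caller hears their sum as silence.
        return self.transcriber_final + self.llm_first_token + self.tts_first_audio


def within_budget(turn: TurnLatency, budget_s: float = 1.0) -> bool:
    """True if the estimated pause is at or below the budget."""
    return turn.total() <= budget_s


fast = TurnLatency(transcriber_final=0.25, llm_first_token=0.5, tts_first_audio=0.125)
slow = TurnLatency(transcriber_final=0.25, llm_first_token=1.5, tts_first_audio=0.25)
print(within_budget(fast))  # True  (0.875 s)
print(within_budget(slow))  # False (2.0 s)
```

Because the LLM stage is usually the one you control through model choice, swapping to a faster model is the most direct way to bring `total()` under budget.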

Configuration

Open your voice agent and go to the Configuration tab.
Under LLM, select your provider.
Enter your API key for that provider.
Select the model.
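The four steps above amount to filling in three fields that must be consistent with each other. As a sketch only, assuming you keep a record of your settings outside the dashboard, this hypothetical `LLMSettings` class checks that the chosen model is one the "Supported providers" table lists for that provider and that an API key was entered. It does not call Swiftsell AI or any provider.

```python
from dataclasses import dataclass

# Mirrors the LLM "Supported providers" table; update it if the table changes.
SUPPORTED_LLMS: dict[str, set[str]] = {
    "OpenAI": {"GPT-4o", "GPT-4o mini", "GPT-4 Turbo"},
    "Anthropic": {"Claude 3.5 Sonnet", "Claude 3 Haiku"},
    "Google": {"Gemini 1.5 Pro", "Gemini 1.5 Flash"},
}


@dataclass
class LLMSettings:
    """Hypothetical record of the LLM fields on the Configuration tab."""
    provider: str
    api_key: str
    model: str

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the settings look consistent."""
        problems = []
        models = SUPPORTED_LLMS.get(self.provider)
        if models is None:
            problems.append(f"unsupported provider: {self.provider!r}")
        elif self.model not in models:
            problems.append(f"{self.model!r} is not offered by {self.provider}")
        if not self.api_key.strip():
            problems.append("API key is empty")
        return problems


print(LLMSettings("OpenAI", "example-key", "GPT-4o mini").validate())  # []
print(LLMSettings("Anthropic", "example-key", "GPT-4o").validate())
# ["'GPT-4o' is not offered by Anthropic"]
```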

Transcriber (Speech-to-Text)

The transcriber converts the caller’s audio into text so the LLM can process it.

Supported providers

Provider	Notes
Deepgram	Low latency, high accuracy, supports many languages
OpenAI Whisper	Strong multilingual support

Configuration

Under Transcriber, select your provider.
Enter the API key.
Select the language (or leave as auto-detect).

Choose a transcriber that supports the primary language your callers will use. Accuracy drops significantly if the transcriber language doesn’t match the caller’s speech.
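A quick sanity check for this advice is sketched below. The function name, the `"auto"` marker for auto-detect and the language codes are assumptions for illustration; the `supported` set is something you would copy from your transcriber provider's own documentation.

```python
AUTO_DETECT = "auto"  # hypothetical marker for "leave as auto-detect"


def check_transcriber_language(
    configured: str, caller_language: str, supported: set[str]
) -> list[str]:
    """Warn when the transcriber setup is unlikely to match how callers speak.

    configured:      language selected under Transcriber, or AUTO_DETECT
    caller_language: primary language your callers use, e.g. "es"
    supported:       languages the chosen provider supports (from its docs)
    """
    warnings = []
    if caller_language not in supported:
        warnings.append(f"provider does not list support for {caller_language!r}")
    # A fixed language setting that differs from the callers' language hurts accuracy.
    if configured != AUTO_DETECT and configured != caller_language:
        warnings.append(
            f"transcriber is set to {configured!r} but callers speak {caller_language!r}"
        )
    return warnings


print(check_transcriber_language("auto", "es", {"en", "es"}))  # []
print(check_transcriber_language("en", "es", {"en", "es"}))
# ["transcriber is set to 'en' but callers speak 'es'"]
```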

Voice (Text-to-Speech)

The voice engine converts the agent’s text responses into speech that the caller hears.

Supported providers

Provider	Notes
ElevenLabs	Highly realistic, many voice options, supports voice cloning
OpenAI TTS	Fast, natural-sounding, good default option

Configuration

Under Voice, select your provider.
Enter the API key.
Select a voice ID. ElevenLabs provides a library of pre-built voices; copy the Voice ID of the one you want from your ElevenLabs dashboard.
Optionally adjust speaking rate and stability (ElevenLabs only).
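The rule that speaking rate and stability apply only to ElevenLabs is easy to get wrong when switching providers. This is a minimal sketch of a hypothetical `VoiceSettings` record that flags it; the class and field names are not a Swiftsell AI type, and the 0.0–1.0 stability range is an assumption based on how ElevenLabs exposes that setting, so confirm it against the values your dashboard accepts.

```python
from dataclasses import dataclass
from typing import Optional

VOICE_PROVIDERS = {"ElevenLabs", "OpenAI TTS"}


@dataclass
class VoiceSettings:
    """Hypothetical record of the fields under Voice; not a Swiftsell AI type."""
    provider: str
    api_key: str
    voice_id: str
    stability: Optional[float] = None      # ElevenLabs only; 0.0-1.0 range is an assumption
    speaking_rate: Optional[float] = None  # ElevenLabs only; valid range depends on the provider

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the settings look consistent."""
        problems = []
        if self.provider not in VOICE_PROVIDERS:
            problems.append(f"unsupported provider: {self.provider!r}")
        if not self.api_key.strip():
            problems.append("API key is empty")
        if not self.voice_id.strip():
            problems.append("voice ID is empty")
        # Collect the ElevenLabs-only options that were actually set.
        tuned = [
            name
            for name, value in (("stability", self.stability), ("speaking_rate", self.speaking_rate))
            if value is not None
        ]
        if tuned and self.provider != "ElevenLabs":
            problems.append(f"{', '.join(tuned)} can only be adjusted for ElevenLabs")
        if self.stability is not None and not 0.0 <= self.stability <= 1.0:
            problems.append("stability must be between 0.0 and 1.0")
        return problems


print(VoiceSettings("ElevenLabs", "example-key", "example-voice-id", stability=0.8).validate())  # []
print(VoiceSettings("OpenAI TTS", "example-key", "example-voice", stability=0.5).validate())
# ['stability can only be adjusted for ElevenLabs']
```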

Choosing a voice

The right voice depends on your brand and use case. Consider:

Tone — warm and friendly for support, professional and clear for enterprise
Gender and accent — match your target audience’s expectations
Stability — higher stability means more consistent delivery; lower allows more expressive variation

Saving changes

After configuring the LLM, transcriber, and voice, click Save. Changes apply to every call that starts after you save; calls already in progress are not affected.
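The save behavior described above matches a simple "snapshot at call start" model. The sketch below illustrates that model with hypothetical `Agent`, `AgentConfig` and `Call` classes; it is an assumption about the observable behavior, not Swiftsell AI's actual implementation.

```python
import copy
from dataclasses import dataclass


@dataclass
class AgentConfig:
    llm_model: str
    voice_id: str


@dataclass
class Call:
    config: AgentConfig  # the settings this call uses for its whole duration


class Agent:
    """Hypothetical model: each call copies the saved config when it starts."""

    def __init__(self, config: AgentConfig) -> None:
        self._saved = config

    def save(self, new_config: AgentConfig) -> None:
        # Clicking Save replaces the stored config; running calls keep their own copies.
        self._saved = new_config

    def start_call(self) -> Call:
        # Deep copy so later edits to the stored config can never leak into this call.
        return Call(copy.deepcopy(self._saved))


agent = Agent(AgentConfig("GPT-4o mini", "voice-a"))
ongoing = agent.start_call()
agent.save(AgentConfig("GPT-4o", "voice-b"))
next_call = agent.start_call()
print(ongoing.config.llm_model)    # GPT-4o mini
print(next_call.config.llm_model)  # GPT-4o
```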

Estimated cost per minute

Costs vary by provider and model. As a rough guide for a typical voice agent call:

Component	Approximate cost
LLM (GPT-4o mini)	~$0.001–0.005/min
Transcriber (Deepgram)	~$0.004/min
Voice (ElevenLabs)	~$0.005–0.015/min
Plivo telephony	~$0.005–0.02/min

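Adding the rows of this table gives a rough total of about $0.015 to $0.044 per minute for this particular combination. The sketch below does that arithmetic and scales it by call length; the dictionary simply copies the table's figures (Deepgram's single value is used as both ends of its range), and the function name is illustrative. LLM cost in particular depends on how many tokens each call uses, so treat the result as an estimate only.

```python
# Per-minute ranges in USD, copied from the table above.
COST_PER_MIN: dict[str, tuple[float, float]] = {
    "LLM (GPT-4o mini)": (0.001, 0.005),
    "Transcriber (Deepgram)": (0.004, 0.004),
    "Voice (ElevenLabs)": (0.005, 0.015),
    "Plivo telephony": (0.005, 0.02),
}


def estimate_call_cost(
    minutes: float, costs: dict[str, tuple[float, float]] = COST_PER_MIN
) -> tuple[float, float]:
    """Return (low, high) estimated cost in USD for a call of the given length."""
    low = sum(lo for lo, _ in costs.values()) * minutes
    high = sum(hi for _, hi in costs.values()) * minutes
    return low, high


low, high = estimate_call_cost(5)
print(f"5-minute call: ${low:.3f} to ${high:.3f}")
```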
Check each provider’s current pricing page for up-to-date rates.
