AI Voice · 5 min read · 4 April 2026

VAPI Pricing Explained: How Every Component of Your Bill Is Actually Calculated

A transparent breakdown of VAPI's per-minute billing model — platform fee, LLM tokens, TTS audio, STT processing, and telephony — so you can predict costs before you scale.

Haroon Mohamed

AI Automation & Lead Generation

VAPI bills per minute, but that's not the whole story

When you first deploy a VAPI agent, you look at the pricing page and see "$0.05/min platform fee." You think: at 1,000 minutes/month, that's $50. Easy.

Then your first invoice arrives and it's $600.

The reason: VAPI's platform fee is one of five cost components that make up every minute of call time. The other four are pass-through costs that VAPI doesn't set but that still show up on your bill.

Here's each component, what it actually costs at current public rates, and where you have control over the number.

Component 1: VAPI platform fee

Current rate (per VAPI's public pricing, 2026): $0.05/minute

This is VAPI's own margin. It covers the infrastructure that orchestrates your call — routing audio between the LLM, STT, TTS, and telephony provider.

Control: None. This is baked in.

Optimization: Long-term, this is the one cost you can't drive down. Everything else is tunable.

Component 2: LLM inference

Current rates (OpenAI, April 2026):

GPT-4o: $2.50 / 1M input tokens, $10.00 / 1M output tokens
GPT-4o Mini: $0.15 / 1M input tokens, $0.60 / 1M output tokens

Current rates (Anthropic):

Claude Sonnet 4.5: $3 / 1M input, $15 / 1M output
Claude Haiku 4.5: $1 / 1M input, $5 / 1M output

How it adds up: Every turn of conversation sends the full conversation history plus system prompt to the LLM. A 4-minute call typically consists of 15–25 turns. If your system prompt is 1,500 tokens and each turn adds ~200 tokens of history, input tokens grow with every turn, and the cumulative total for the call lands at roughly 40,000–100,000 across that range of turn counts.
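A minimal sketch of that growth, assuming the simplest possible model: every turn resends the system prompt plus all history so far, each turn adds a fixed 200 tokens, and nothing is cached or truncated. The function name and defaults are illustrative, not part of any VAPI or OpenAI API.

```python
def cumulative_input_tokens(turns: int,
                            system_prompt_tokens: int = 1_500,
                            tokens_added_per_turn: int = 200) -> int:
    """Total input tokens billed across a call when each turn resends
    the system prompt plus all history accumulated so far."""
    total = 0
    history = 0
    for _ in range(turns):
        total += system_prompt_tokens + history  # what this turn sends
        history += tokens_added_per_turn         # history grows for the next turn
    return total


for turns in (10, 15, 20, 25):
    print(turns, cumulative_input_tokens(turns))
# 10 24000
# 15 43500
# 20 68000
# 25 97500
```

Because the history term grows with the square of the turn count, the estimate is very sensitive to how many turns a call really has. Measure a few real calls before trusting any single number.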

Example calculation:

4-minute call with GPT-4o
50,000 input tokens + 3,000 output tokens across the conversation
Cost: (50,000 × $2.50/1M) + (3,000 × $10/1M) = $0.125 + $0.030 = $0.155/call
Per minute: $0.04/min

Same call with GPT-4o Mini: under $0.003/min, a reduction of more than 90%.
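The same arithmetic as a small sketch, using the OpenAI rates listed above. The dictionary keys are just labels for this example, and the rates are hard-coded assumptions that will go stale, so check the pricing page before relying on them.

```python
# USD per 1M tokens as (input, output), copied from the rates listed above.
RATES_PER_MILLION = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}


def llm_cost_per_call(model: str, input_tokens: int, output_tokens: int) -> float:
    """LLM cost in USD for one call, given cumulative token counts."""
    input_rate, output_rate = RATES_PER_MILLION[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


call_minutes = 4
for model in RATES_PER_MILLION:
    per_call = llm_cost_per_call(model, input_tokens=50_000, output_tokens=3_000)
    print(f"{model}: ${per_call:.4f}/call, ${per_call / call_minutes:.4f}/min")
# gpt-4o: $0.1550/call, $0.0387/min
# gpt-4o-mini: $0.0093/call, $0.0023/min
```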

Control: Full. You choose the model in your VAPI agent config.

Component 3: Text-to-Speech (TTS)

Current rates (April 2026, approximate):

ElevenLabs: $0.18 per 1,000 characters (flagship tier)
Cartesia Sonic: $0.025 per 1,000 characters
Rime AI: $0.04 per 1,000 characters
Azure Neural: $0.016 per 1,000 characters
PlayHT: $0.05 per 1,000 characters

How it adds up: TTS is billed per character of text sent for synthesis, not per minute of call. A 4-minute call where the AI speaks for 90 seconds typically outputs 1,500–2,500 characters of text.

Example calculation (2,000 characters):

ElevenLabs: 2,000 × $0.00018 = $0.36/call = $0.09/min
Cartesia: 2,000 × $0.000025 = $0.05/call = $0.0125/min
Azure: 2,000 × $0.000016 = $0.032/call = $0.008/min

That's a more than 11x spread between the cheapest and most expensive provider listed ($0.032 vs. $0.36 per call) for the exact same call length.
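A short sketch of the per-character math, using the approximate rates listed above. The provider keys are illustrative labels, and the 2,000-character figure is the example value from this section, not a measurement.

```python
# Approximate USD per 1,000 characters, copied from the list above.
TTS_RATE_PER_1K_CHARS = {
    "elevenlabs": 0.18,
    "cartesia": 0.025,
    "rime": 0.04,
    "azure": 0.016,
    "playht": 0.05,
}


def tts_cost_per_call(provider: str, characters: int) -> float:
    """TTS cost in USD for the text synthesized during one call."""
    return characters / 1_000 * TTS_RATE_PER_1K_CHARS[provider]


call_minutes = 4
# Print providers from cheapest to most expensive.
for provider in sorted(TTS_RATE_PER_1K_CHARS, key=TTS_RATE_PER_1K_CHARS.get):
    per_call = tts_cost_per_call(provider, characters=2_000)
    print(f"{provider}: ${per_call:.3f}/call, ${per_call / call_minutes:.4f}/min")
```

Since the bill scales with characters rather than minutes, shortening the agent's responses reduces TTS cost on any provider.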

Control: Full. You select the TTS provider in VAPI config.

Component 4: Speech-to-Text (STT)

Current rates (April 2026):

Deepgram Nova-3: $0.0043/minute (streaming)
AssemblyAI Universal: $0.0037/minute
OpenAI Whisper (via API): $0.006/minute
Google Cloud STT: $0.016/minute (enhanced model)

Control: Full. VAPI supports all major providers.

Note: STT is one of the smallest cost components for most calls. The difference between the cheapest and most expensive is usually only $0.01–$0.02 per minute. Not worth obsessing over.

Component 5: Telephony

Current rates (April 2026):

Twilio US phone number: $1.15/month rental
Twilio US outbound call: $0.014/minute (SMS is billed separately, per segment)
Twilio Toll-free: $2.00/month rental + $0.019/min outbound
VAPI native telephony (if used): $0.03/min (higher, but simpler)

Additional costs most people miss:

A2P 10DLC brand registration (Twilio): $4 one-time + $10/month per campaign
STIR/SHAKEN registration: included in most Twilio plans, but required once you call at volume
CNAM display (your business name on caller ID): $5–$15/month per number

Control: Partial. You choose the provider, but the per-minute rate is set by carriers.
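Fixed monthly fees matter most at low volume. The sketch below folds them into an effective per-minute rate, assuming the Twilio US figures above. The `other_fixed_fees` parameter is a hypothetical catch-all for items such as a campaign or CNAM fee, and the $10 used in the example is a placeholder.

```python
def monthly_telephony_cost(minutes: float,
                           per_minute_rate: float = 0.014,
                           number_rental: float = 1.15,
                           numbers: int = 1,
                           other_fixed_fees: float = 0.0) -> float:
    """Monthly telephony bill in USD: fixed fees plus usage."""
    fixed = number_rental * numbers + other_fixed_fees
    return fixed + minutes * per_minute_rate


def effective_rate(minutes: float, **kwargs) -> float:
    """Telephony cost per minute once fixed fees are spread over usage."""
    return monthly_telephony_cost(minutes, **kwargs) / minutes


for minutes in (200, 1_000, 10_000):
    rate = effective_rate(minutes, other_fixed_fees=10.0)
    print(f"{minutes} min/month: ${rate:.4f}/min effective")
# 200 min/month: $0.0698/min effective
# 1000 min/month: $0.0252/min effective
# 10000 min/month: $0.0151/min effective
```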

Putting it all together

A 4-minute call with an optimized stack in 2026:

| Component | Cost |
|-----------|------|
| VAPI platform | $0.20 |
| LLM (GPT-4o Mini) | $0.012 |
| TTS (Cartesia) | $0.05 |
| STT (Deepgram) | $0.017 |
| Telephony (Twilio) | $0.056 |
| Total | $0.335/call |
| Per minute | $0.084/min |

Same call with default/premium settings:

| Component | Cost |
|-----------|------|
| VAPI platform | $0.20 |
| LLM (GPT-4o) | $0.16 |
| TTS (ElevenLabs) | $0.36 |
| STT (Deepgram) | $0.017 |
| Telephony (Twilio) | $0.056 |
| Total | $0.793/call |
| Per minute | $0.198/min |

At 10,000 minutes/month, that's $840 vs $1,980 — a $1,140/month difference for the same functional output.
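A rough sketch that reproduces both stacks from their per-minute and per-call inputs. The `Stack` class and its field names are hypothetical. The per-call LLM and TTS figures are the table values for a 4-minute call, so small differences from the rounded totals above are rounding only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stack:
    platform_per_min: float   # VAPI platform fee
    stt_per_min: float        # speech-to-text
    telephony_per_min: float  # carrier minutes
    llm_per_call: float       # token cost for one 4-minute call
    tts_per_call: float       # character cost for one 4-minute call

    def cost_per_call(self, minutes: float = 4.0) -> float:
        per_minute = (self.platform_per_min + self.stt_per_min
                      + self.telephony_per_min)
        return per_minute * minutes + self.llm_per_call + self.tts_per_call

    def monthly_cost(self, total_minutes: float, call_minutes: float = 4.0) -> float:
        # Assumes every call is call_minutes long; the per-call LLM and TTS
        # figures are only valid for the 4-minute default.
        calls = total_minutes / call_minutes
        return calls * self.cost_per_call(call_minutes)


optimized = Stack(0.05, 0.0043, 0.014, llm_per_call=0.012, tts_per_call=0.05)
premium = Stack(0.05, 0.0043, 0.014, llm_per_call=0.16, tts_per_call=0.36)

for name, stack in (("optimized", optimized), ("premium", premium)):
    print(f"{name}: ${stack.cost_per_call():.3f}/call, "
          f"${stack.monthly_cost(10_000):,.0f}/month at 10,000 minutes")
# optimized: $0.335/call, $838/month at 10,000 minutes
# premium: $0.793/call, $1,983/month at 10,000 minutes
```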

The pricing-page trap

When evaluating VAPI vs. competitors (Retell, Bland, Vocode), don't compare platform fees. Compare the full stack cost at your usage volume with your chosen providers. A platform with a $0.07/min fee but better default pricing on LLM/TTS can easily be cheaper overall than one with a $0.04/min fee and expensive defaults.

Always build a spreadsheet with your actual config before choosing a platform.
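If a spreadsheet feels heavy, the comparison fits in a few lines. Every number below is a made-up placeholder to illustrate the point, not a quote from any real platform. Substitute the pass-through rates you actually get with your chosen providers.

```python
# Hypothetical platforms: (platform fee per minute, pass-through cost per minute).
# Pass-through = LLM + TTS + STT + telephony with that platform's defaults.
PLATFORMS = {
    "platform_a": (0.07, 0.034),  # higher fee, cheap defaults
    "platform_b": (0.04, 0.148),  # lower fee, expensive defaults
}


def full_stack_rate(platform_fee: float, pass_through: float) -> float:
    """All-in cost per minute in USD."""
    return platform_fee + pass_through


monthly_minutes = 10_000
for name, (fee, pass_through) in PLATFORMS.items():
    rate = full_stack_rate(fee, pass_through)
    print(f"{name}: ${rate:.3f}/min, ${rate * monthly_minutes:,.0f}/month")

cheapest = min(PLATFORMS, key=lambda name: full_stack_rate(*PLATFORMS[name]))
print("cheapest at this config:", cheapest)
```

With these placeholder inputs, the platform with the higher fee comes out cheaper overall, which is the trap in a fee-only comparison.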

Where to verify these numbers

All rates above are pulled from each provider's public pricing page as of April 2026. Verify them yourself:

VAPI: vapi.ai/pricing
OpenAI: openai.com/api/pricing
Anthropic: anthropic.com/pricing
Cartesia: cartesia.ai/pricing
ElevenLabs: elevenlabs.io/pricing
Deepgram: deepgram.com/pricing
Twilio: twilio.com/pricing

Prices change frequently. Check these pages before building a business case.

Want help modeling the full cost stack for your specific use case? Get in touch — I've done this spreadsheet exercise for 13+ client deployments.


Haroon Mohamed

Full-stack automation, AI, and lead generation specialist. 2+ years running 13+ concurrent client campaigns using GoHighLevel, multiple AI voice providers, Zapier, APIs, and custom data pipelines. Founder of HMX Zone.
