VAPI Pricing Explained: How Every Component of Your Bill Is Actually Calculated
A transparent breakdown of VAPI's per-minute billing model — platform fee, LLM tokens, TTS audio, STT processing, and telephony — so you can predict costs before you scale.
Haroon Mohamed
AI Automation & Lead Generation
VAPI bills per-minute — but that's not the whole story
When you first deploy a VAPI agent, you look at the pricing page and see "$0.05/min platform fee." You think: at 1,000 minutes/month, that's $50. Easy.
Then your first invoice arrives and it's closer to $200.
The reason: VAPI's platform fee is only one of five cost components that make up every minute of call time. The other four are pass-through costs that VAPI doesn't set but that still show up on your bill.
Here's each component, what it actually costs at current public rates, and where you have control over the number.
Component 1: VAPI platform fee
Current rate (per VAPI's public pricing, 2026): $0.05/minute
This is VAPI's own margin. It covers the infrastructure that orchestrates your call — routing audio between the LLM, STT, TTS, and telephony provider.
Control: None. This is baked in.
Optimization: Long-term, this is the one cost you can't drive down. Everything else is tunable.
Component 2: LLM inference
Current rates (OpenAI, April 2026):
- GPT-4o: $2.50 / 1M input tokens, $10.00 / 1M output tokens
- GPT-4o Mini: $0.15 / 1M input tokens, $0.60 / 1M output tokens
Current rates (Anthropic):
- Claude Sonnet 4.5: $3 / 1M input, $15 / 1M output
- Claude Haiku 4.5: $1 / 1M input, $5 / 1M output
How it adds up: Every turn of the conversation sends the system prompt plus the full conversation history to the LLM, so input tokens grow much faster than the turn count. A 4-minute call typically consists of 15–25 turns. If your system prompt is 1,500 tokens and each turn adds ~200 tokens of history, the cumulative input for the call comes to roughly 45,000 tokens at 15 turns and close to 100,000 at 25.
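To see where that growth comes from, here is a minimal sketch of the accumulation. It is a simplified model, not VAPI's or any provider's actual accounting: it assumes no prompt caching, a fixed number of tokens added per turn, and a first turn that carries no history. The function name and parameters are made up for illustration.

```python
def cumulative_input_tokens(turns: int,
                            system_prompt_tokens: int = 1_500,
                            tokens_added_per_turn: int = 200) -> int:
    """Total input tokens billed across a call when every turn resends
    the system prompt plus all history accumulated so far."""
    total = 0
    history = 0  # tokens of prior conversation carried into the current turn
    for _ in range(turns):
        total += system_prompt_tokens + history
        history += tokens_added_per_turn  # this turn becomes history for the next
    return total


for turns in (15, 20, 25):
    print(f"{turns} turns: {cumulative_input_tokens(turns):,} input tokens")
```

The history term is what makes long calls expensive: doubling the number of turns more than doubles the input tokens you pay for.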
Example calculation:
- 4-minute call with GPT-4o
- 50,000 input tokens + 3,000 output tokens across the conversation
- Cost: (50,000 × $2.50/1M) + (3,000 × $10/1M) = $0.125 + $0.030 = $0.155/call
- Per minute: $0.04/min
Same call with GPT-4o Mini: about $0.009/call, or under $0.003/min, a reduction of roughly 94%.
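If you want to rerun that arithmetic for other models or token counts, a small helper along these lines works. The rates are the ones listed above; the dictionary keys are just labels for this sketch, not official API model identifiers, and the per-minute figure assumes the same 4-minute call.

```python
# USD per 1M tokens as (input, output), copied from the rates listed above.
LLM_RATES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-sonnet-4.5": (3.00, 15.00),
    "claude-haiku-4.5": (1.00, 5.00),
}


def llm_cost_per_call(model: str, input_tokens: int, output_tokens: int) -> float:
    """LLM cost in USD for one call, given total tokens across all turns."""
    input_rate, output_rate = LLM_RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


CALL_MINUTES = 4
for model in LLM_RATES:
    cost = llm_cost_per_call(model, input_tokens=50_000, output_tokens=3_000)
    print(f"{model:<18} ${cost:.4f}/call  ${cost / CALL_MINUTES:.4f}/min")
```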
Control: Full. You choose the model in your VAPI agent config.
Component 3: Text-to-Speech (TTS)
Current rates (April 2026, approximate):
- ElevenLabs: $0.18 per 1,000 characters (flagship tier)
- Cartesia Sonic: $0.025 per 1,000 characters
- Rime AI: $0.04 per 1,000 characters
- Azure Neural: $0.016 per 1,000 characters
- PlayHT: $0.05 per 1,000 characters
How it adds up: TTS is billed per character of text sent for synthesis, not per minute of call. A 4-minute call where the AI speaks for 90 seconds typically outputs 1,500–2,500 characters of text.
Example calculation (2,000 characters):
- ElevenLabs: 2,000 × $0.00018 = $0.36/call = $0.09/min
- Cartesia: 2,000 × $0.000025 = $0.05/call = $0.0125/min
- Azure: 2,000 × $0.000016 = $0.032/call = $0.008/min
That's roughly an 11x gap between the cheapest and most expensive provider listed, for the exact same call.
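The same comparison can be scripted across all five providers. This is a rough sketch using the approximate rates listed above; the character count and 4-minute call length are the example's assumptions, so swap in your own.

```python
# USD per 1,000 characters, copied from the approximate rates listed above.
TTS_RATE_PER_1K_CHARS = {
    "ElevenLabs": 0.18,
    "Cartesia Sonic": 0.025,
    "Rime AI": 0.04,
    "Azure Neural": 0.016,
    "PlayHT": 0.05,
}


def tts_cost_per_call(provider: str, characters: int) -> float:
    """Cost in USD to synthesize `characters` characters of agent speech."""
    return characters / 1_000 * TTS_RATE_PER_1K_CHARS[provider]


CALL_MINUTES = 4
CHARACTERS = 2_000  # the example call above

costs = {name: tts_cost_per_call(name, CHARACTERS) for name in TTS_RATE_PER_1K_CHARS}
cheapest = min(costs.values())

# Print providers from cheapest to most expensive, with the multiple of the cheapest.
for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name:<15} ${cost:.3f}/call  ${cost / CALL_MINUTES:.4f}/min  "
          f"{cost / cheapest:.1f}x cheapest")
```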
Control: Full. You select the TTS provider in VAPI config.
Component 4: Speech-to-Text (STT)
Current rates (April 2026):
- Deepgram Nova-3: $0.0043/minute (streaming)
- AssemblyAI Universal: $0.0037/minute
- OpenAI Whisper (via API): $0.006/minute
- Google Cloud STT: $0.016/minute (enhanced model)
Control: Full. VAPI supports all major providers.
Note: STT is the smallest cost component for most calls. The difference between the cheapest and most expensive is usually only $0.01–$0.02 per minute. Not worth obsessing over.
Component 5: Telephony
Current rates (April 2026):
- Twilio US phone number: $1.15/month rental
- Twilio US outbound call: $0.014/minute (SMS is billed separately, per segment)
- Twilio Toll-free: $2.00/month rental + $0.019/min outbound
- VAPI native telephony (if used): $0.03/min (higher, but simpler)
Additional costs most people miss:
- A2P 10DLC brand registration (Twilio): $4 one-time + $10/month per campaign
- STIR/SHAKEN registration: included in most Twilio plans, but required if you place calls at volume
- CNAM display (your business name on caller ID): $5–$15/month per number
Control: Partial. You choose the provider, but the per-minute rate is set by carriers.
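To put the fixed fees in perspective next to the per-minute charges, here is a rough monthly estimate. The function and its parameters are illustrative only; the rates are the Twilio figures listed above, and the 10,000-minute volume is an assumption for the example.

```python
def monthly_telephony_cost(minutes: float,
                           per_minute: float,
                           number_rental: float,
                           numbers: int = 1,
                           other_monthly_fees: float = 0.0) -> float:
    """Monthly telephony spend in USD: usage plus fixed recurring fees."""
    return minutes * per_minute + numbers * number_rental + other_monthly_fees


# 10,000 outbound minutes on a single number, using the Twilio rates listed above.
local = monthly_telephony_cost(10_000, per_minute=0.014, number_rental=1.15)
toll_free = monthly_telephony_cost(10_000, per_minute=0.019, number_rental=2.00)

print(f"Local number:     ${local:,.2f}/month")
print(f"Toll-free number: ${toll_free:,.2f}/month")
```

At this volume the number rental is a rounding error. The per-minute rate is what drives the bill, so recurring extras such as CNAM or campaign fees can be passed in through `other_monthly_fees` without changing the picture much.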
Putting it all together
A 4-minute call with an optimized stack in 2026:
| Component | Cost |
|-----------|------|
| VAPI platform | $0.20 |
| LLM (GPT-4o Mini) | $0.012 |
| TTS (Cartesia) | $0.05 |
| STT (Deepgram) | $0.017 |
| Telephony (Twilio) | $0.056 |
| Total | $0.335/call |
| Per minute | $0.084/min |
Same call with default/premium settings:
| Component | Cost |
|-----------|------|
| VAPI platform | $0.20 |
| LLM (GPT-4o) | $0.16 |
| TTS (ElevenLabs) | $0.36 |
| STT (Deepgram) | $0.017 |
| Telephony (Twilio) | $0.056 |
| Total | $0.793/call |
| Per minute | $0.198/min |
At 10,000 minutes/month, that's $840 vs $1,980 — a $1,140/month difference for the same functional output.
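Here is a minimal sketch that reproduces both stacks and scales them to a monthly volume. The per-call figures are copied from the two tables above; the stack names and helper functions are just for this example.

```python
CALL_MINUTES = 4

# Per-call costs in USD for a 4-minute call, copied from the two tables above.
STACKS = {
    "optimized": {"platform": 0.20, "llm": 0.012, "tts": 0.05,
                  "stt": 0.017, "telephony": 0.056},
    "premium": {"platform": 0.20, "llm": 0.16, "tts": 0.36,
                "stt": 0.017, "telephony": 0.056},
}


def cost_per_minute(components: dict, call_minutes: float = CALL_MINUTES) -> float:
    """Blended cost in USD per minute of call time."""
    return sum(components.values()) / call_minutes


def monthly_cost(components: dict, monthly_minutes: float,
                 call_minutes: float = CALL_MINUTES) -> float:
    """Projected monthly spend in USD at a given call volume."""
    return cost_per_minute(components, call_minutes) * monthly_minutes


for name, components in STACKS.items():
    print(f"{name:<10} ${cost_per_minute(components):.4f}/min  "
          f"${monthly_cost(components, 10_000):,.2f}/month at 10,000 min")
```

The monthly totals it prints land a few dollars off the $840 and $1,980 quoted above, because those figures round the per-minute rate to three decimals before multiplying.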
The pricing-page trap
When evaluating VAPI vs. competitors (Retell, Bland, Vocode), don't compare platform fees. Compare the full stack cost at your usage volume with your chosen providers. A platform with a $0.07/min fee but better default pricing on LLM/TTS can easily be cheaper overall than one with a $0.04/min fee and expensive defaults.
Always build a spreadsheet with your actual config before choosing a platform.
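If a spreadsheet feels like overkill, the same comparison fits in a few lines. Both platforms below are hypothetical and do not describe any real vendor. The fees are the $0.07 and $0.04 from the example above, and the LLM and TTS per-minute figures reuse this article's GPT-4o Mini + Cartesia and GPT-4o + ElevenLabs numbers as stand-ins for "cheap" and "expensive" defaults.

```python
# Pass-through costs assumed identical on both platforms, in USD per minute
# (Deepgram STT + Twilio outbound, from the rates above).
SHARED_PER_MIN = 0.0043 + 0.014

# HYPOTHETICAL platforms for illustration only; not real vendor pricing.
PLATFORMS = {
    "A": {"fee": 0.07, "llm": 0.003, "tts": 0.0125},  # higher fee, cheap defaults
    "B": {"fee": 0.04, "llm": 0.04, "tts": 0.09},     # lower fee, premium defaults
}


def full_stack_per_min(platform: dict) -> float:
    """All-in cost in USD per minute: platform fee plus every pass-through."""
    return sum(platform.values()) + SHARED_PER_MIN


MONTHLY_MINUTES = 10_000
for name, platform in PLATFORMS.items():
    rate = full_stack_per_min(platform)
    print(f"Platform {name}: fee ${platform['fee']:.2f}/min, "
          f"full stack ${rate:.4f}/min, ${rate * MONTHLY_MINUTES:,.2f}/month")
```

With these assumed numbers, the platform with the higher headline fee comes out cheaper once the pass-through costs are included.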
Where to verify these numbers
All rates above are pulled from each provider's public pricing page as of April 2026. Verify them yourself:
- VAPI: vapi.ai/pricing
- OpenAI: openai.com/api/pricing
- Anthropic: anthropic.com/pricing
- Cartesia: cartesia.ai/pricing
- ElevenLabs: elevenlabs.io/pricing
- Deepgram: deepgram.com/pricing
- Twilio: twilio.com/pricing
Prices change frequently. Check these pages before building a business case.
Want help modeling the full cost stack for your specific use case? Get in touch — I've done this spreadsheet exercise for 13+ client deployments.
Haroon Mohamed
Full-stack automation, AI, and lead generation specialist. 2+ years running 13+ concurrent client campaigns using GoHighLevel, multiple AI voice providers, Zapier, APIs, and custom data pipelines. Founder of HMX Zone.