Overview
VibeCody supports 23 AI providers out of the box, from cloud APIs like Claude and OpenAI to fully local models via Ollama. This demo shows you how to switch between providers, configure BYOK (Bring Your Own Key), set up failover chains, compare costs, and leverage provider-specific features like vision and tool use.
Time to complete: ~10 minutes
Prerequisites
- VibeCLI installed and configured (see Demo 1: First Run)
- API keys for at least two providers (to demonstrate switching)
- Ollama installed locally for offline demos (optional)
Supported Providers
| # | Provider | Key Env Var | Local/Cloud | Notable Features |
|---|---|---|---|---|
| 1 | Ollama | (none) | Local | Fully offline, 1000+ models |
| 2 | Claude | ANTHROPIC_API_KEY | Cloud | Tool use, 1M context (Opus), vision |
| 3 | OpenAI | OPENAI_API_KEY | Cloud | GPT-4o, vision, function calling |
| 4 | Gemini | GEMINI_API_KEY | Cloud | 2M context, multimodal |
| 5 | Grok | GROK_API_KEY | Cloud | Real-time knowledge |
| 6 | Groq | GROQ_API_KEY | Cloud | Ultra-fast LPU inference |
| 7 | OpenRouter | OPENROUTER_API_KEY | Cloud | 300+ models, unified API |
| 8 | Azure OpenAI | AZURE_OPENAI_API_KEY | Cloud | Enterprise compliance |
| 9 | AWS Bedrock | AWS_ACCESS_KEY_ID | Cloud | AWS-native, IAM auth |
| 10 | GitHub Copilot | GITHUB_TOKEN | Cloud | Uses existing Copilot subscription |
| 11 | Mistral | MISTRAL_API_KEY | Cloud | Codestral, code-specialized |
| 12 | Cerebras | CEREBRAS_API_KEY | Cloud | Wafer-scale inference |
| 13 | DeepSeek | DEEPSEEK_API_KEY | Cloud | Budget-friendly coding (V3/R1) |
| 14 | Zhipu | ZHIPU_API_KEY | Cloud | GLM-4 series |
| 15 | Vercel AI | VERCEL_AI_API_KEY | Cloud | Gateway proxy |
| 16 | MiniMax | MINIMAX_API_KEY | Cloud | MiniMax-Text-01 |
| 17 | Perplexity | PERPLEXITY_API_KEY | Cloud | Search-augmented Sonar models |
| 18 | Together AI | TOGETHER_API_KEY | Inference | Open model hosting (Llama, Qwen) |
| 19 | Fireworks AI | FIREWORKS_API_KEY | Inference | Fast open model inference |
| 20 | SambaNova | SAMBANOVA_API_KEY | Inference | Hardware-accelerated inference |
| 21 | LocalEdit | (none) | Local | Local FIM code completion |
| 22 | Failover | (configured) | Mixed | Auto-failover chain |
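Since this demo needs keys for at least two providers, it can help to see which ones your shell already has. The snippet below is a minimal sketch, not part of VibeCLI: `list_configured_keys` is a hypothetical helper name, and it checks only a handful of the env vars from the table, so extend the list for the providers you use.

```shell
# Hypothetical helper: report which provider keys from the table are set.
# Only a subset of the env vars is checked; add more names as needed.
list_configured_keys() {
  for var in ANTHROPIC_API_KEY OPENAI_API_KEY GEMINI_API_KEY GROQ_API_KEY OPENROUTER_API_KEY; do
    # Indirect lookup via eval keeps this POSIX-sh compatible.
    eval "value=\${$var:-}"
    if [ -n "$value" ]; then
      echo "$var is set"
    fi
  done
}

list_configured_keys
```

If fewer than two lines are printed, export another key before continuing with the walkthrough.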
Step-by-Step Walkthrough
Step 1: Check your current provider
vibecli --provider claude "What provider are you?"
Or in the REPL (just run vibecli with no arguments):
vibecli
> What provider are you?
Step 2: Switch providers on the fly
From the command line:
# Use OpenAI
vibecli --provider openai --model gpt-4o "Explain monads"
# Use Ollama locally
vibecli --provider ollama --model llama3 "Explain monads"
# Use Gemini
vibecli --provider gemini --model gemini-2.5-flash "Explain monads"
# Use Groq for ultra-fast responses
vibecli --provider groq --model llama-3.3-70b-versatile "Explain monads"
From the REPL:
vibecli
> /model gpt-4o
Switched to model: gpt-4o
> /model codellama
Switched to model: codellama
> /model claude-sonnet-4-6
Switched to model: claude-sonnet-4-6
Step 3: Streaming responses
All providers support streaming by default. Tokens appear as they are generated.
vibecli --provider claude "Write a haiku about Rust programming"
Step 4: Provider-specific features
Vision (Claude, OpenAI, Gemini):
# Analyze an image
vibecli --provider claude "What's in this image?" --image ./screenshot.png
# In the REPL
> [./diagram.png] What does this architecture diagram show?
Tool use (Claude, OpenAI):
Tool use is automatic in agent mode. VibeCody uses the provider’s native function-calling protocol when one is available:
vibecli --agent "Read the file src/main.rs and add error handling" --provider claude
Large context (Gemini):
# Gemini supports up to 2M tokens of context
vibecli --provider gemini --model gemini-2.5-pro \
"Summarize this codebase" --add-dir ./src/
Step 5: OpenRouter for 300+ models
OpenRouter provides access to hundreds of models through a single API key.
export OPENROUTER_API_KEY="sk-or-..."
# Use any model available on OpenRouter
vibecli --provider openrouter --model "anthropic/claude-sonnet-4-6" "Hello"
vibecli --provider openrouter --model "google/gemini-2.5-flash" "Hello"
vibecli --provider openrouter --model "meta-llama/llama-3.3-70b" "Hello"
vibecli --provider openrouter --model "deepseek/deepseek-chat" "Hello"
Step 6: BYOK (Bring Your Own Key) setup
Configure multiple API keys in your config file for team or multi-account setups:
# ~/.vibecli/config.toml
[claude]
enabled = true
api_key = "sk-ant-YOUR-KEY"
model = "claude-sonnet-4-6"
[openai]
enabled = true
api_key = "sk-YOUR-KEY"
model = "gpt-4o"
api_url = "https://api.openai.com/v1" # Customizable endpoint
[azure_openai]
enabled = true
api_key = "YOUR-AZURE-KEY"
api_url = "https://YOUR-RESOURCE.openai.azure.com"
deployment = "gpt-4o"
api_version = "2024-02-01"
[ollama]
enabled = true
api_url = "http://localhost:11434" # Default Ollama endpoint
model = "llama3"
[openrouter]
enabled = true
api_key = "sk-or-YOUR-KEY"
model = "anthropic/claude-sonnet-4-6"
Each provider supports a custom api_url for proxied or self-hosted endpoints.
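Because the config file above holds API keys in plain text, you may want to restrict it to your own user account. This is a sketch using standard `chmod`; `secure_config` is a hypothetical helper name, and the default path is simply the one used in this demo.

```shell
# Hypothetical helper: make the config file readable and writable by its owner only.
# Defaults to the path used in this demo; pass another path as the first argument.
secure_config() {
  config="${1:-$HOME/.vibecli/config.toml}"
  if [ ! -f "$config" ]; then
    echo "no config at $config" >&2
    return 1
  fi
  chmod 600 "$config"
}
```

Run `secure_config` once after editing the file; `ls -l ~/.vibecli/config.toml` should then show `-rw-------`.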
Step 7: Failover provider chain
The Failover provider automatically tries the next provider in the chain when one fails (rate limit, outage, timeout):
# ~/.vibecli/config.toml
[failover]
chain = ["claude", "openai", "gemini", "ollama"]
max_retries = 2
retry_delay_ms = 1000
Then select the failover provider on the command line:
vibecli --provider failover "This message will be sent to the first available provider"
If Claude returns a rate-limit error, VibeCody retries with OpenAI, then Gemini, and finally falls back to local Ollama.
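To make the chain behavior concrete, here is a rough shell equivalent of what a failover chain does. It is an illustration only, not how VibeCody implements the Failover provider. It assumes that `vibecli` exits non-zero on errors and that `max_retries` applies per provider; neither is confirmed by this demo. `ask_with_failover` and the three variables are hypothetical names.

```shell
# Illustration only: try each provider in order until one succeeds.
# Assumption: vibecli exits non-zero on rate limits, outages, and timeouts.
CHAIN="claude openai gemini ollama"   # mirrors chain in [failover]
MAX_RETRIES="${MAX_RETRIES:-2}"       # mirrors max_retries (assumed per provider)
RETRY_DELAY_S="${RETRY_DELAY_S:-1}"   # mirrors retry_delay_ms = 1000

ask_with_failover() {
  prompt="$1"
  for provider in $CHAIN; do
    attempt=1
    while [ "$attempt" -le "$MAX_RETRIES" ]; do
      if vibecli --provider "$provider" "$prompt"; then
        return 0
      fi
      echo "$provider failed (attempt $attempt)" >&2
      attempt=$((attempt + 1))
      sleep "$RETRY_DELAY_S"
    done
  done
  echo "all providers failed" >&2
  return 1
}
```

In practice `--provider failover` replaces all of this; the sketch is only meant to show the order in which providers are tried.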
Step 8: Cost tracking per provider
VibeCody tracks token usage and estimated costs for every interaction.
vibecli
> /cost
Session cost summary:
claude: $0.0342 (12,400 tokens)
openai: $0.0128 (5,200 tokens)
ollama: $0.0000 (8,100 tokens) [local]
Total: $0.0470
See Demo 6: Cost Observatory for the full cost dashboard.
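If you save a `/cost` summary to a file, you can cross-check the total yourself. The sketch below assumes the output has exactly the shape of the sample in this step (one `name: $amount (...)` line per provider); `sum_session_cost` is a hypothetical helper, not a VibeCLI command.

```shell
# Hypothetical helper: add up the per-provider lines of a saved /cost summary.
# Assumes the line format shown in the sample output of this step.
sum_session_cost() {
  awk '$1 != "Total:" && $2 ~ /^\$/ {
         sub(/^\$/, "", $2)   # drop the leading dollar sign
         total += $2
       }
       END { printf "%.4f\n", total }'
}
```

Usage: `sum_session_cost < cost.txt`, where `cost.txt` is wherever you pasted the summary.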
Step 9: Provider comparison
Send the same prompt to multiple providers and compare:
# Quick comparison from CLI
vibecli --provider claude "Write FizzBuzz in Rust" > claude_response.txt
vibecli --provider openai "Write FizzBuzz in Rust" > openai_response.txt
vibecli --provider gemini "Write FizzBuzz in Rust" > gemini_response.txt
# Or use the Arena for side-by-side comparison (see Demo 5)
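The three commands above can be folded into a loop when you want to compare more providers. This is a minimal sketch that uses only the `--provider` flag shown in this demo; `compare_providers` is a hypothetical helper, and the check for a non-zero exit status assumes `vibecli` signals failure that way.

```shell
# Hypothetical helper: send one prompt to several providers and
# save each reply to <provider>_response.txt in the current directory.
compare_providers() {
  prompt="$1"
  shift
  for provider in "$@"; do
    if ! vibecli --provider "$provider" "$prompt" > "${provider}_response.txt"; then
      echo "$provider failed; see ${provider}_response.txt" >&2
    fi
  done
}
```

Usage: `compare_providers "Write FizzBuzz in Rust" claude openai gemini`.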
Real-World Provider Workflows
Workflow 1: Privacy-First Development (Ollama)
Everything stays on your machine — no API keys, no network calls, no data sharing:
# Pull a coding model once
ollama pull qwen3-coder
# Daily coding workflow — zero cost, full privacy
vibecli --provider ollama --model qwen3-coder \
--agent "Add input validation to the /api/register endpoint"
# Code review without sending code to the cloud
git diff HEAD~1 | vibecli --provider ollama "Review this diff for bugs"
Workflow 2: Multi-Provider Cost Optimization
Use cheap/fast providers for simple tasks, premium providers for complex ones:
# Quick question → Groq (fast, free tier)
vibecli --provider groq "What does #[derive(Clone)] do in Rust?"
# Code generation → DeepSeek (budget-friendly, strong at coding)
vibecli --provider deepseek "Write a rate limiter middleware for Axum"
# Complex debugging → Claude Opus (highest reasoning quality)
vibecli --provider claude --model claude-opus-4-6 \
"There's a race condition in src/worker.rs. The worker sometimes processes \
the same job twice when under load. Find and fix it."
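If you follow this routing often, a small wrapper can encode it. The sketch below maps a task type to the provider flags used in the three examples above. `provider_flags`, `ask`, and the task names `quick`, `codegen`, and `debug` are all hypothetical, chosen for illustration.

```shell
# Hypothetical helper: map a task type to the provider flags used in this workflow.
provider_flags() {
  case "$1" in
    quick)   echo "--provider groq" ;;
    codegen) echo "--provider deepseek" ;;
    debug)   echo "--provider claude --model claude-opus-4-6" ;;
    *)       echo "unknown task type: $1" >&2; return 1 ;;
  esac
}

# Usage: ask <task-type> "<prompt>"
ask() {
  flags="$(provider_flags "$1")" || return 1
  # $flags is intentionally unquoted so it splits into separate arguments.
  vibecli $flags "$2"
}
```

For example, `ask quick "What does #[derive(Clone)] do in Rust?"` would run the prompt against Groq.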
Workflow 3: Research with Web Grounding (Perplexity)
Get answers backed by real-time web search:
# Library decisions grounded in current benchmarks
vibecli --provider perplexity "Compare serde vs simd-json performance for large payloads in 2026"
# Debugging with current docs
vibecli --provider perplexity "How to fix 'lifetime may not live long enough' in async Rust with tokio 1.40?"
# Security advisories
vibecli --provider perplexity "Are there any recent CVEs affecting jsonwebtoken crate?"
Workflow 4: Enterprise Compliance (Azure / Bedrock)
Route all AI traffic through your corporate cloud account:
# Azure — data stays in your Azure tenant
vibecli --provider azure --agent "Migrate the auth module from JWT to OIDC"
# Bedrock — uses IAM roles, no API keys in code
vibecli --provider bedrock --agent "Add CloudWatch metrics to all Lambda handlers"
# GitHub Copilot — uses existing subscription
vibecli --provider copilot "Complete the integration test for the payment service"
Workflow 5: Exploring New Models (OpenRouter)
Try any of 300+ models without separate accounts:
# Compare a coding task across model families
vibecli --provider openrouter --model "anthropic/claude-sonnet-4-6" "Write binary search" > /tmp/claude.txt
vibecli --provider openrouter --model "google/gemini-2.5-flash" "Write binary search" > /tmp/gemini.txt
vibecli --provider openrouter --model "meta-llama/llama-3.3-70b" "Write binary search" > /tmp/llama.txt
vibecli --provider openrouter --model "deepseek/deepseek-chat" "Write binary search" > /tmp/deepseek.txt
# Side-by-side review
diff /tmp/claude.txt /tmp/gemini.txt
Workflow 6: Resilient CI Pipeline (Failover)
# ~/.vibecli/config.toml
[failover]
chain = ["claude", "openai", "gemini", "ollama"]
# CI job that keeps working through a single provider outage
vibecli --provider failover --full-auto \
--exec "Review the diff in this PR for bugs and security issues" < pr.diff
VibeUI Provider Switching
In VibeUI, open the AI panel (Cmd+J) and use the provider dropdown in the top toolbar to switch providers. The Keys panel (Cmd+J then “Keys” tab) lets you manage API keys with a graphical interface.
Demo Recording
{
"meta": {
"title": "Multi-Provider AI Chat",
"description": "Switch between 23 AI providers, set up BYOK, configure failover chains, and compare provider costs.",
"duration_seconds": 240,
"version": "1.0.0"
},
"steps": [
{
"id": 1,
"action": "shell",
"command": "vibecli --provider claude \"What is 2 + 2?\"",
"description": "Chat with Claude",
"delay_ms": 4000,
"typing_speed_ms": 40
},
{
"id": 2,
"action": "shell",
"command": "vibecli --provider openai --model gpt-4o \"What is 2 + 2?\"",
"description": "Chat with OpenAI GPT-4o",
"delay_ms": 4000,
"typing_speed_ms": 40
},
{
"id": 3,
"action": "shell",
"command": "vibecli --provider ollama --model llama3 \"What is 2 + 2?\"",
"description": "Chat with local Ollama",
"delay_ms": 4000,
"typing_speed_ms": 40
},
{
"id": 4,
"action": "shell",
"command": "vibecli --provider groq --model llama-3.3-70b-versatile \"What is 2 + 2?\"",
"description": "Chat with Groq (ultra-fast)",
"delay_ms": 3000,
"typing_speed_ms": 40
},
{
"id": 5,
"action": "repl",
"commands": [
{ "input": "/model claude-sonnet-4-6", "delay_ms": 1500 },
{ "input": "Write a one-liner Python function to reverse a string", "delay_ms": 5000 },
{ "input": "/model gpt-4o", "delay_ms": 1500 },
{ "input": "Write a one-liner Python function to reverse a string", "delay_ms": 5000 },
{ "input": "/model gemini-2.5-flash", "delay_ms": 1500 },
{ "input": "Write a one-liner Python function to reverse a string", "delay_ms": 5000 },
{ "input": "/cost", "delay_ms": 2000 },
{ "input": "/quit", "delay_ms": 500 }
],
"description": "Switch providers in REPL and compare responses, then check costs"
},
{
"id": 6,
"action": "shell",
"command": "vibecli --provider failover \"What's the weather in Tokyo?\"",
"description": "Demonstrate failover provider chain",
"delay_ms": 5000
},
{
"id": 7,
"action": "shell",
"command": "vibecli --provider openrouter --model \"meta-llama/llama-3.3-70b\" \"Hello from OpenRouter\"",
"description": "Access 300+ models via OpenRouter",
"delay_ms": 5000,
"typing_speed_ms": 40
},
{
"id": 8,
"action": "write_file",
"path": "~/.vibecli/config.toml",
"content": "[failover]\nchain = [\"claude\", \"openai\", \"ollama\"]\nmax_retries = 2\n\n[claude]\nenabled = true\napi_key = \"sk-ant-demo\"\n\n[openai]\nenabled = true\napi_key = \"sk-demo\"\n\n[ollama]\nenabled = true\nmodel = \"llama3\"\n",
"description": "Write failover provider configuration",
"delay_ms": 1000
}
]
}
What’s Next
- Demo 4: Agent Loop – Autonomous code editing with tool execution
- Demo 5: Model Arena – Compare models in a structured evaluation
- Demo 6: Cost Observatory – Deep dive into token costs and budgets