GitHub Docs

Overview

Not every task needs the most expensive model. A simple variable rename does not require Claude Opus when Haiku can handle it in milliseconds for a fraction of the cost. VibeCody’s cost-optimized routing engine classifies each task by complexity, selects the cheapest model that can handle it reliably, and tracks spending against configurable daily and monthly budgets. This demo shows how to inspect routing decisions, set budgets, and compare model costs across providers.

Time to complete: ~8 minutes

Prerequisites

  • VibeCLI 0.5.1 installed and on your PATH
  • Two or more AI providers configured (to see routing in action)
  • For VibeUI: the desktop app running with the CostRouterPanel visible

How Cost Routing Works

The router classifies tasks into four complexity tiers and maps each to the cheapest viable model:

Tier Examples Default Model Cost/1K tokens
Trivial Rename variable, fix typo Claude Haiku / GPT-4o-mini $0.00025
Simple Write a unit test, explain code Claude Sonnet / GPT-4o $0.003
Medium Refactor module, debug race condition Claude Sonnet / GPT-4o $0.003
Complex Architect system, multi-file refactor Claude Opus / GPT-4.5 $0.015

Step-by-Step Walkthrough

Step 1: View current routing configuration

Start VibeCLI and inspect the active routing rules:

vibecli
> /route cost
Cost Router Configuration
═════════════════════════

Active strategy: cheapest-viable-model
Budget period:   daily
Daily budget:    $5.00 (default)
Spent today:     $0.00
Remaining:       $5.00

Tier Routing Table:
  Tier     │ Provider  │ Model                │ Input $/1K │ Output $/1K
  ─────────┼───────────┼──────────────────────┼────────────┼────────────
  Trivial  │ claude    │ claude-haiku-3        │ $0.00025   │ $0.00125
  Simple   │ claude    │ claude-sonnet-4       │ $0.00300   │ $0.01500
  Medium   │ claude    │ claude-sonnet-4       │ $0.00300   │ $0.01500
  Complex  │ claude    │ claude-opus-4         │ $0.01500   │ $0.07500

Fallback chain: claude → openai → ollama (free)
Budget enforcement: soft (warn at 80%, block at 100%)

Step 2: Set a daily budget

Configure a daily spending limit:

> /route budget set 10.00
Daily budget updated: $5.00 → $10.00
Monthly projection at current usage: ~$0.00
Budget enforcement: soft (warn at 80% = $8.00, block at 100% = $10.00)

Set a strict budget that blocks requests when exceeded:

> /route budget enforce strict
Budget enforcement updated: soft → strict
When daily budget ($10.00) is exhausted, all requests will be blocked
until the next day (resets at midnight UTC).
Tip: Add [cost_router] fallback_to_ollama = true in config.toml
     to fall back to Ollama (free) instead of blocking.

Step 3: Compare model costs across providers

View a side-by-side cost comparison for all configured providers:

> /route model compare
Model Cost Comparison (all configured providers)
═════════════════════════════════════════════════

Provider   │ Model                  │ Input $/1K │ Output $/1K │ Avg task cost │ Quality
───────────┼────────────────────────┼────────────┼─────────────┼───────────────┼─────────
claude     │ claude-haiku-3         │ $0.00025   │ $0.00125    │ $0.002        │ ★★★☆☆
claude     │ claude-sonnet-4        │ $0.00300   │ $0.01500    │ $0.024        │ ★★★★☆
claude     │ claude-opus-4          │ $0.01500   │ $0.07500    │ $0.112        │ ★★★★★
openai     │ gpt-4o-mini            │ $0.00015   │ $0.00060    │ $0.001        │ ★★★☆☆
openai     │ gpt-4o                 │ $0.00250   │ $0.01000    │ $0.016        │ ★★★★☆
ollama     │ llama3                 │ free       │ free        │ free          │ ★★★☆☆
ollama     │ codellama              │ free       │ free        │ free          │ ★★★☆☆
groq       │ llama-3.3-70b          │ $0.00059   │ $0.00079    │ $0.002        │ ★★★☆☆

Cheapest cloud model:  gpt-4o-mini ($0.001/task avg)
Cheapest overall:      ollama/llama3 (free, local)
Best quality/cost:     claude-sonnet-4 ($0.024/task, ★★★★☆)

Step 4: Watch routing decisions in real time

Send a few tasks and observe which model the router selects:

> Rename the variable `x` to `count` in main.rs
[CostRouter] Task classified: Trivial
[CostRouter] Selected: claude/claude-haiku-3 ($0.00025/1K input)
[CostRouter] Reason: Simple rename, no semantic analysis needed

✓ Renamed `x` → `count` in src/main.rs (1 occurrence)

Cost: $0.001 | Budget remaining: $9.999
> Refactor the auth module to use async/await throughout
[CostRouter] Task classified: Medium
[CostRouter] Selected: claude/claude-sonnet-4 ($0.003/1K input)
[CostRouter] Reason: Multi-function refactor requiring semantic understanding

✓ Refactored 4 functions in src/auth.rs to async/await
  - validate_user() → async
  - create_session() → async
  - refresh_token() → async
  - revoke_session() → async

Cost: $0.031 | Budget remaining: $9.968
> Design a new microservice architecture for the payment system
[CostRouter] Task classified: Complex
[CostRouter] Selected: claude/claude-opus-4 ($0.015/1K input)
[CostRouter] Reason: Architecture design requiring broad reasoning

✓ Generated architecture document: docs/payment-service-architecture.md
  - 4 services: Gateway, Processor, Ledger, Notifications
  - Event-driven with Kafka
  - PostgreSQL per-service, Redis for caching

Cost: $0.142 | Budget remaining: $9.826

Step 5: View spending summary

> /route spend
Spending Summary — 2026-03-29
═════════════════════════════

Total spent today:  $0.174 / $10.00 (1.7%)
Requests today:     3

By tier:
  Trivial:  1 request  │ $0.001  │ 0.6%
  Simple:   0 requests │ $0.000  │ 0.0%
  Medium:   1 request  │ $0.031  │ 17.8%
  Complex:  1 request  │ $0.142  │ 81.6%

By provider:
  claude (haiku):   $0.001
  claude (sonnet):  $0.031
  claude (opus):    $0.142

Projected daily spend at current rate: $0.174
Projected monthly spend: ~$5.22

Step 6: Override routing for a specific task

Force a specific model when the router’s choice is not what you want:

> /route override opus
> Explain what this regex does: ^(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+)@[a-z0-9-]+\.{1}[a-z]{2,}$
[CostRouter] Override active: forcing claude/claude-opus-4
[CostRouter] Original classification: Simple (would have used Sonnet)

This regex validates email addresses...
[detailed explanation]

Cost: $0.089 | Budget remaining: $9.737
Note: Override saved $0.065 less than default routing would have.

Clear the override:

> /route override clear
Override cleared. Resuming automatic cheapest-viable-model routing.

Step 7: Use cost routing in VibeUI

In the VibeUI desktop app, open the CostRouterPanel from the AI sidebar. The panel provides:

  • Dashboard – Real-time spending chart with budget threshold lines
  • Routing Log – Every request with its tier classification, model selected, and cost
  • Budget Settings – Configure daily/monthly limits and enforcement mode
  • Model Comparison – Interactive table with sorting by cost, quality, and speed

Demo Recording

{
  "meta": {
    "title": "Cost-Optimized Agent Routing",
    "description": "Route tasks to the cheapest viable model and manage spending budgets.",
    "duration_seconds": 150,
    "version": "0.5.1"
  },
  "steps": [
    {
      "id": 1,
      "action": "repl",
      "commands": [
        { "input": "/route cost", "delay_ms": 2000 },
        { "input": "/route budget set 10.00", "delay_ms": 1500 },
        { "input": "/route model compare", "delay_ms": 3000 },
        { "input": "Rename the variable `x` to `count` in main.rs", "delay_ms": 5000 },
        { "input": "Refactor the auth module to use async/await", "delay_ms": 8000 },
        { "input": "/route spend", "delay_ms": 2000 },
        { "input": "/quit", "delay_ms": 500 }
      ],
      "description": "Configure routing, send tasks at different tiers, review spending"
    }
  ]
}

What’s Next