Build, fine-tune, quantize, and deploy a custom AI model in 7 steps using VibeCody’s Model Wizard.
Prerequisites: VibeCody installed, a training dataset or codebase, and either a free Colab account or a local GPU.
Quick Start (5 minutes)
The fastest path from zero to a running custom model:
# 1. Extract training data from your codebase
vibecli
> /train dataset from-codebase --format chatml --output data.jsonl
# 2. Fine-tune with Unsloth on free Colab GPU
# (Copy the generated script from /wizard into a Colab notebook)
> /wizard
# 3. Quantize the result
> /inference quantize --method gguf-q4km --model ./output
# 4. Deploy locally
> /inference deploy --backend ollama --model ./model-q4.gguf
# 5. Use your model in VibeCody
> /model ollama my-model
> /chat Hello from my custom model!
The 7 Steps in Detail
Step 1: Choose a Base Model
Pick a foundation model to fine-tune. Smaller models train faster and cheaper:
| Model | Size | Free Colab? | Best For |
|---|---|---|---|
| Llama 3.2 3B | 3B | Yes (T4) | IoT, edge, fast inference |
| Llama 3.1 8B | 8B | Yes (T4 with QLoRA) | General purpose, coding |
| Mistral 7B | 7B | Yes (T4) | Instruction following |
| Phi 3.5 Mini | 3.8B | Yes (T4) | Small, fast, surprisingly capable |
| Qwen 2.5 Coder 7B | 7B | Yes (T4) | Code generation |
| DeepSeek R1 Distill 7B | 7B | Yes (T4) | Reasoning |
| Mixtral 8x7B (MoE) | 47B | No (needs A100) | Mixture of Experts, high quality |
Recommendation: Start with Llama 3.1 8B — best quality-to-cost ratio, runs on free Colab.
Step 2: Prepare Your Dataset
Four ways to create training data:
From your codebase (extracts function docs + implementations):
> /train dataset from-codebase --format chatml --output data.jsonl
From git history (commit messages + diffs):
> /train dataset from-git --max-commits 5000 --format chatml --output data.jsonl
From documents (PDF, Markdown, HTML — uses MinerU for complex PDFs):
# For simple docs:
> /ingest ./docs --output data.jsonl
# For scientific PDFs with formulas/tables:
pip install magic-pdf[full]
magic-pdf -p ./papers/ -o ./parsed/ -m auto
> /ingest ./parsed/ --output data.jsonl
Existing dataset (use a pre-prepared JSONL file):
> /train dataset validate --file my-data.jsonl
Step 3: Configure Fine-Tuning
Six open-source libraries, each with different strengths:
| Library | Best For | GPU Needed |
|---|---|---|
| Unsloth | Single GPU, Colab, fastest setup | 1x T4 (free) |
| Axolotl | Reproducible configs, team workflows | 1x A10G+ |
| LLaMA Factory | RLHF/DPO alignment, 100+ models | 1+ GPUs |
| HF TRL | DPO/PPO preference optimization | 1+ GPUs |
| PEFT | LoRA adapters for any HF model | 1x T4 (free) |
| DeepSpeed | Multi-GPU distributed training | 2-8 GPUs |
QLoRA with Unsloth is the fastest path — trains an 8B model on a free T4 in about 30 minutes.
Alignment methods:
- SFT (Supervised Fine-Tuning) — learn from examples
- DPO (Direct Preference Optimization) — learn from preference pairs
- PPO (Proximal Policy Optimization) — learn from reward signals
- KTO (Kahneman-Tversky Optimization) — binary good/bad feedback
Step 4: Select Training Environment
| Platform | GPU | Cost | Setup |
|---|---|---|---|
| Google Colab | T4 16GB | Free | Paste script, run cells |
| Kaggle | P100 16GB | Free (30hr/week) | Upload notebook |
| Local | Your GPU(s) | Free | Run script directly |
| RunPod | A100 80GB | $1-4/hr | SSH, run script |
| SageMaker | A10G-A100 | $1-30/hr | Studio notebook |
Step 5: Quantize
Compress your model for efficient deployment:
| Method | Size (8B model) | Quality Loss | CPU? | GPU? |
|---|---|---|---|---|
| GGUF Q4_K_M | ~4.5 GB | Minimal | Yes | Yes |
| GGUF Q5_K_M | ~5.5 GB | Very small | Yes | Yes |
| GPTQ 4-bit | ~4 GB | Small | No | Yes |
| AWQ 4-bit | ~4 GB | Small | No | Yes |
| FP16 | ~16 GB | None | No | Yes |
Recommendation: GGUF Q4_K_M for maximum compatibility (runs on CPU and GPU, works with Ollama and llama.cpp).
Step 6: Deploy Inference
| Backend | Best For | API Compatible |
|---|---|---|
| Ollama | Easiest setup, local dev | OpenAI-compatible |
| vLLM | Fastest GPU serving, production | OpenAI-compatible |
| llama.cpp | CPU+GPU, edge, IoT | OpenAI-compatible |
| TGI | Docker-ready, HuggingFace models | Custom API |
Deploy targets: local process, Docker container, Kubernetes, or edge device (Jetson, Raspberry Pi).
Step 7: Review and Launch
The wizard generates a complete bash script. Copy it and run in your terminal or notebook. The script covers:
- Environment setup (
pip install) - Data preparation commands
- Fine-tuning code (library-specific)
- Quantization commands
- Inference server launch
- Docker packaging (if selected)
- Kubernetes YAML (if selected)
- VibeCody connection command
Example: Fine-Tune a Code Assistant on Colab (Free)
Complete walkthrough using the free Colab T4 GPU:
# In VibeCody CLI — prepare data
vibecli
> /train dataset from-codebase --format chatml --output train.jsonl
> /train dataset validate --file train.jsonl
> /wizard # Copy the generated script
Paste this into a Google Colab notebook:
# Cell 1: Setup
!pip install unsloth transformers datasets accelerate bitsandbytes
import torch
print(f'GPU: {torch.cuda.get_device_name(0)}')
# Cell 2: Load and prepare model
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
"meta-llama/Llama-3.1-8B-Instruct",
max_seq_length=2048,
load_in_4bit=True
)
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=16)
# Cell 3: Train
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset
# Upload train.jsonl to Colab first
dataset = load_dataset('json', data_files='train.jsonl')
trainer = SFTTrainer(
model=model, tokenizer=tokenizer,
train_dataset=dataset['train'],
args=TrainingArguments(
output_dir='./output',
per_device_train_batch_size=4,
num_train_epochs=3,
learning_rate=2e-4,
)
)
trainer.train()
# Cell 4: Save
model.save_pretrained('./output')
tokenizer.save_pretrained('./output')
# Cell 5: Quantize to GGUF (download llama.cpp converter)
!pip install llama-cpp-python
# Export and download the GGUF file
Then deploy locally:
# Create Ollama model from GGUF
ollama create my-code-assistant -f Modelfile
# Use in VibeCody
vibecli --provider ollama --model my-code-assistant
> /chat Explain this codebase to me
VibeUI Model Wizard
For the full interactive experience, open the Model Wizard tab in VibeUI’s AI panel. It provides:
- Visual step-by-step form with option cards
- Auto-calculated VRAM requirements
- Library-specific code generation
- One-click script copy
- Config summary at each step
The AI/ML Workflow tab provides the big-picture view of the full pipeline with 5 end-to-end example workflows.