Tutorial: Model Wizard — Fine-Tune to Deployment

Build, fine-tune, quantize, and deploy a custom AI model in 7 steps using VibeCody’s Model Wizard.

Prerequisites: VibeCody installed, a training dataset or codebase, and either a free Colab account or a local GPU.

Quick Start (5 minutes)

The fastest path from zero to a running custom model:

# 1. Extract training data from your codebase
vibecli
> /train dataset from-codebase --format chatml --output data.jsonl

# 2. Fine-tune with Unsloth on free Colab GPU
# (Copy the generated script from /wizard into a Colab notebook)
> /wizard

# 3. Quantize the result
> /inference quantize --method gguf-q4km --model ./output

# 4. Deploy locally
> /inference deploy --backend ollama --model ./model-q4.gguf

# 5. Use your model in VibeCody
> /model ollama my-model
> /chat Hello from my custom model!

The 7 Steps in Detail

Step 1: Choose a Base Model

Pick a foundation model to fine-tune. Smaller models train faster and cheaper:

Model	Size	Free Colab?	Best For
Llama 3.2 3B	3B	Yes (T4)	IoT, edge, fast inference
Llama 3.1 8B	8B	Yes (T4 with QLoRA)	General purpose, coding
Mistral 7B	7B	Yes (T4)	Instruction following
Phi 3.5 Mini	3.8B	Yes (T4)	Small, fast, surprisingly capable
Qwen 2.5 Coder 7B	7B	Yes (T4)	Code generation
DeepSeek R1 Distill 7B	7B	Yes (T4)	Reasoning
Mixtral 8x7B (MoE)	47B	No (needs A100)	Mixture of Experts, high quality

Recommendation: Start with Llama 3.1 8B — best quality-to-cost ratio, runs on free Colab.

Step 2: Prepare Your Dataset

Four ways to create training data:

From your codebase (extracts function docs + implementations):

> /train dataset from-codebase --format chatml --output data.jsonl

From git history (commit messages + diffs):

> /train dataset from-git --max-commits 5000 --format chatml --output data.jsonl

From documents (PDF, Markdown, HTML — uses MinerU for complex PDFs):

# For simple docs:
> /ingest ./docs --output data.jsonl

# For scientific PDFs with formulas/tables:
pip install magic-pdf[full]
magic-pdf -p ./papers/ -o ./parsed/ -m auto
> /ingest ./parsed/ --output data.jsonl

Existing dataset (use a pre-prepared JSONL file):

> /train dataset validate --file my-data.jsonl

Step 3: Configure Fine-Tuning

Six open-source libraries, each with different strengths:

Library	Best For	GPU Needed
Unsloth	Single GPU, Colab, fastest setup	1x T4 (free)
Axolotl	Reproducible configs, team workflows	1x A10G+
LLaMA Factory	RLHF/DPO alignment, 100+ models	1+ GPUs
HF TRL	DPO/PPO preference optimization	1+ GPUs
PEFT	LoRA adapters for any HF model	1x T4 (free)
DeepSpeed	Multi-GPU distributed training	2-8 GPUs

QLoRA with Unsloth is the fastest path — trains an 8B model on a free T4 in about 30 minutes.

Alignment methods:

SFT (Supervised Fine-Tuning) — learn from examples
DPO (Direct Preference Optimization) — learn from preference pairs
PPO (Proximal Policy Optimization) — learn from reward signals
KTO (Kahneman-Tversky Optimization) — binary good/bad feedback

Step 4: Select Training Environment

Platform	GPU	Cost	Setup
Google Colab	T4 16GB	Free	Paste script, run cells
Kaggle	P100 16GB	Free (30hr/week)	Upload notebook
Local	Your GPU(s)	Free	Run script directly
RunPod	A100 80GB	$1-4/hr	SSH, run script
SageMaker	A10G-A100	$1-30/hr	Studio notebook

Step 5: Quantize

Compress your model for efficient deployment:

Method	Size (8B model)	Quality Loss	CPU?	GPU?
GGUF Q4_K_M	~4.5 GB	Minimal	Yes	Yes
GGUF Q5_K_M	~5.5 GB	Very small	Yes	Yes
GPTQ 4-bit	~4 GB	Small	No	Yes
AWQ 4-bit	~4 GB	Small	No	Yes
FP16	~16 GB	None	No	Yes

Recommendation: GGUF Q4_K_M for maximum compatibility (runs on CPU and GPU, works with Ollama and llama.cpp).

Step 6: Deploy Inference

Backend	Best For	API Compatible
Ollama	Easiest setup, local dev	OpenAI-compatible
vLLM	Fastest GPU serving, production	OpenAI-compatible
llama.cpp	CPU+GPU, edge, IoT	OpenAI-compatible
TGI	Docker-ready, HuggingFace models	Custom API

Deploy targets: local process, Docker container, Kubernetes, or edge device (Jetson, Raspberry Pi).

Step 7: Review and Launch

The wizard generates a complete bash script. Copy it and run in your terminal or notebook. The script covers:

Environment setup (pip install)
Data preparation commands
Fine-tuning code (library-specific)
Quantization commands
Inference server launch
Docker packaging (if selected)
Kubernetes YAML (if selected)
VibeCody connection command

Example: Fine-Tune a Code Assistant on Colab (Free)

Complete walkthrough using the free Colab T4 GPU:

# In VibeCody CLI — prepare data
vibecli
> /train dataset from-codebase --format chatml --output train.jsonl
> /train dataset validate --file train.jsonl
> /wizard   # Copy the generated script

Paste this into a Google Colab notebook:

# Cell 1: Setup
!pip install unsloth transformers datasets accelerate bitsandbytes
import torch
print(f'GPU: {torch.cuda.get_device_name(0)}')

# Cell 2: Load and prepare model
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True
)
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=16)

# Cell 3: Train
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Upload train.jsonl to Colab first
dataset = load_dataset('json', data_files='train.jsonl')
trainer = SFTTrainer(
    model=model, tokenizer=tokenizer,
    train_dataset=dataset['train'],
    args=TrainingArguments(
        output_dir='./output',
        per_device_train_batch_size=4,
        num_train_epochs=3,
        learning_rate=2e-4,
    )
)
trainer.train()

# Cell 4: Save
model.save_pretrained('./output')
tokenizer.save_pretrained('./output')

# Cell 5: Quantize to GGUF (download llama.cpp converter)
!pip install llama-cpp-python
# Export and download the GGUF file

Then deploy locally:

# Create Ollama model from GGUF
ollama create my-code-assistant -f Modelfile

# Use in VibeCody
vibecli --provider ollama --model my-code-assistant
> /chat Explain this codebase to me

VibeUI Model Wizard

For the full interactive experience, open the Model Wizard tab in VibeUI’s AI panel. It provides:

Visual step-by-step form with option cards
Auto-calculated VRAM requirements
Library-specific code generation
One-click script copy
Config summary at each step

The AI/ML Workflow tab provides the big-picture view of the full pipeline with 5 end-to-end example workflows.