Model Catalog & Routing Guide

Full reference for model selection. See the Agent Roster for the per-agent quick summary.

Coding Complexity Tiers

Tier	Complexity	Examples	Route To
T1 — Routine	Single-file, clear spec	Bug fix, config edit, simple script, CSS tweak, CRUD endpoint, test additions	Codex CLI → Gemini CLI
T2 — Standard	Multi-file, well-defined scope	New feature (clear spec), module refactor, API additions, pipeline updates, scraper fixes	Codex CLI → Gemini CLI → Sonnet (if they fail)
T3 — Complex	Architectural, ambiguous, multi-service	Design system work, cross-service coordination, large refactors, ambiguous debugging	Opus 4.6
T4 — Critical	Security, QA, production review	Security audit, production code review, novel problem-solving	Opus 4.6

Escalation rule: if a T1 task fails, retry with the other free tool, then escalate to T2. If the free tools fail at T2, escalate to Sonnet. If Sonnet fails or the task is ambiguous, escalate to Opus. Never skip tiers unless the task is clearly T3/T4 from the start.
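
As a rough illustration, the tier table and escalation rule could be encoded as a small POSIX shell lookup. The function names `route_for_tier` and `next_after` and the short tool labels (`codex`, `gemini`, `sonnet`, `opus`) are hypothetical, chosen for this sketch only. It also assumes that a T1 task which has exhausted both free tools continues at Sonnet, since T2 would otherwise retry the same free tools.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers that mirror the tier table.

# Print the ordered route for a tier, cheapest tool first.
route_for_tier() {
  case "$1" in
    T1) echo "codex gemini" ;;
    T2) echo "codex gemini sonnet" ;;
    T3|T4) echo "opus" ;;
    *) echo "unknown tier: $1" >&2; return 1 ;;
  esac
}

# Print the tool to try after the one that failed.
# $1 = tier, $2 = label of the tool that just failed.
# Runs in a subshell so its variables do not leak into the caller.
next_after() (
  tier=$1 failed=$2 found=0
  for tool in $(route_for_tier "$tier"); do
    if [ "$found" -eq 1 ]; then echo "$tool"; exit 0; fi
    if [ "$tool" = "$failed" ]; then found=1; fi
  done
  # Route exhausted: escalate according to the rule.
  case "$tier" in
    T1) echo "sonnet" ;;  # assumption: free tools already tried, so T2 means Sonnet
    T2) echo "opus" ;;    # Sonnet failed: escalate to Opus
    *) exit 1 ;;          # nothing above Opus
  esac
)
```

For example, `next_after T2 gemini` prints `sonnet`, and `next_after T3 opus` returns a non-zero status because there is no further step.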

Non-Coding Task Routing

Task Type	Primary	Fallback	Notes
Research & analysis	Gemini 3.1 Pro (free)	Opus 4.6 (deep scientific)	GPQA 94.3%, ARC-AGI 77.1%, 1M ctx
Writing & docs	Sonnet 4.6	Opus 4.6 (creative/strategic)	GDPval 1633 (Sonnet leads)
Orchestration & planning	Opus 4.6	Gemini 3.1 Pro (tool coordination)	τ²-bench 91.9%, MCP Atlas 69.2%
Data processing	Gemini 3.1 Pro	Sonnet 4.6 (pipeline code)	LiveCodeBench 2887, 1M ctx
Config & infrastructure	GPT-5.3 Codex (free)	Sonnet 4.6 (IaC generation)	Terminal-Bench 77.3% — leads by 9pp
Code review / QA	Opus 4.6	Gemini 3.1 Pro (repo-wide audit)	GPQA 91.3%, best security audit
Heartbeats / light ops	MiniMax M2.5 (free)	—	ONLY for well-defined, low-stakes tasks
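
The routing table above could be mirrored in a lookup like the following sketch. The task-type keys (`research`, `writing`, and so on) and the function names `models_for_task` and `pick_model` are hypothetical labels, not identifiers from any tool. The table qualifies each fallback with a use case; this sketch leaves that decision to the caller, who passes `fallback` as the second argument.

```shell
#!/bin/sh
# Sketch only: hypothetical lookup mirroring the non-coding routing table.
# Prints "primary|fallback"; an empty fallback means the table lists none.
models_for_task() {
  case "$1" in
    research)      echo "Gemini 3.1 Pro|Opus 4.6" ;;
    writing)       echo "Sonnet 4.6|Opus 4.6" ;;
    orchestration) echo "Opus 4.6|Gemini 3.1 Pro" ;;
    data)          echo "Gemini 3.1 Pro|Sonnet 4.6" ;;
    config)        echo "GPT-5.3 Codex|Sonnet 4.6" ;;
    review)        echo "Opus 4.6|Gemini 3.1 Pro" ;;
    heartbeat)     echo "MiniMax M2.5|" ;;
    *) echo "unknown task type: $1" >&2; return 1 ;;
  esac
}

# Print the primary model, or the fallback when $2 is "fallback".
# Fails if the task type is unknown or no fallback is listed.
pick_model() (
  pair=$(models_for_task "$1") || exit 1
  primary=${pair%%|*}   # text before the first "|"
  fallback=${pair#*|}   # text after the first "|"
  if [ "${2:-}" = "fallback" ]; then
    [ -n "$fallback" ] || exit 1
    echo "$fallback"
  else
    echo "$primary"
  fi
)
```

Here `pick_model config fallback` prints `Sonnet 4.6`, while `pick_model heartbeat fallback` fails because heartbeats have no fallback.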

Model Strings for Spawning

Updated 2026-03-01

Model	Provider String	Cost	Best For
Opus 4.6	`anthropic/claude-opus-4-6`	Max plan (weekly limit)	T3/T4 coding, orchestration, code review, security
Sonnet 4.6	`anthropic/claude-sonnet-4-6`	Max plan (weekly limit)	T2 fallback, writing, frontend, debugging
GPT-5.3 Codex	`openai-codex/gpt-5.3-codex`	Free (Plus plan)	T1/T2 coding, config/infra, terminal work
Gemini 3.1 Pro	`google/gemini-3.1-pro-preview`	Free tier (rate limits)	Research, data processing, large codebase
Gemini 3.1 Pro (OAuth)	`google-antigravity/gemini-3.1-pro-preview`	Free (OAuth)	Same as above, higher rate limits
Gemini 3 Flash	`google-antigravity/gemini-3-flash`	Free	High-volume triage, budget tasks
MiniMax M2.5	`minimax-portal/MiniMax-M2.5`	Free (OAuth)	Heartbeats only — 88% hallucination rate

MiniMax M2.5 — Use With Caution

The 80.2% SWE-Bench score is real but narrow. Artificial Analysis reports an 88% hallucination rate and an Intelligence Index of 42.

✅ Heartbeats, narrow auto-tested coding, high-volume tool calling (BFCL 76.8%)
❌ NEVER for: research, code review, debugging, unsupervised work, anything requiring accuracy

Rules

Default coding to free tools first. Only escalate to Max plan when free tools fail.
Do NOT use OpenRouter for sub-agents — credits are limited.
Do NOT use MiniMax for anything requiring accuracy.
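
One way to enforce these rules before spawning is a small guard, sketched below. The function name `check_spawn`, the task-kind labels, and the `openrouter/` prefix are assumptions made for illustration; this document does not list an OpenRouter provider string. The sketch takes the strict reading of the MiniMax rule and allows heartbeats only.

```shell
#!/bin/sh
# Sketch only: hypothetical pre-spawn check for the rules in this section.
# $1 = provider string, $2 = task kind (a label chosen by the caller).
check_spawn() {
  case "$1" in
    openrouter/*)  # assumed prefix; not taken from this document
      echo "refused: OpenRouter is not allowed for sub-agents" >&2
      return 1 ;;
    minimax-portal/*)
      if [ "$2" != "heartbeat" ]; then
        echo "refused: MiniMax is limited to heartbeats" >&2
        return 1
      fi ;;
  esac
  return 0
}
```

With this guard, `check_spawn "minimax-portal/MiniMax-M2.5" research` is refused, while the same provider string with `heartbeat` passes.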

Escalation Chain (cost order)

Codex CLI (free) → Gemini CLI (free) → Sonnet sub-agent (Max plan) → Opus (Max plan, premium)
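
The chain above amounts to trying each step in cost order and stopping at the first success. A minimal sketch, assuming each step is wrapped in a shell function that returns 0 on success: `run_chain` and the `try_*` wrappers are hypothetical names, and the wrapper bodies are stubs standing in for the real CLI calls.

```shell
#!/bin/sh
# Sketch only: run each step in order and stop at the first success.
# $1 = task text; remaining arguments = names of functions or commands
# that accept the task as their only argument.
run_chain() (
  task=$1; shift
  for step in "$@"; do
    if "$step" "$task"; then
      echo "done via $step"
      exit 0
    fi
    echo "$step failed, escalating" >&2
  done
  exit 1  # every step failed
)

# Hypothetical wrappers; replace the bodies with the real CLI invocations.
try_codex()  { false; }  # stub: pretend the first free tool fails
try_gemini() { true; }   # stub: pretend the second free tool succeeds
try_sonnet() { true; }
try_opus()   { true; }

run_chain "fix the failing test" try_codex try_gemini try_sonnet try_opus
```

With these stubs, the call reports the Codex failure on stderr and prints `done via try_gemini` on stdout. Sonnet and Opus are never invoked, which is the point of ordering the chain by cost.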

CLI Syntax

# Codex CLI — free via ChatGPT Plus. Terminal-Bench 77.3%. Best for T1/T2.
npx @openai/codex exec "task. Fix errors. Commit. Do NOT ask questions."

# Gemini CLI — free. 1M+ context. Good for T1/T2 + research.
gemini -p "task" -y

# Claude Code CLI (Max plan). Use --model sonnet for T2 after free tools fail; switch to --model opus for T3/T4.
claude -p --dangerously-skip-permissions --model sonnet --max-budget-usd 3 "task. Build. Fix errors. Commit."

See Also

Agent Roster — Who uses which model
Spawn SOP — How to spawn with correct model
Providers — Provider auth status and quotas