Model Catalog
Model Catalog & Routing Guide
Full reference for model selection. See the Agent Roster for the per-agent quick summary.
Coding Complexity Tiers
| Tier | Complexity | Examples | Route To |
|---|---|---|---|
| T1 — Routine | Single-file, clear spec | Bug fix, config edit, simple script, CSS tweak, CRUD endpoint, test additions | Codex CLI → Gemini CLI |
| T2 — Standard | Multi-file, well-defined scope | New feature (clear spec), module refactor, API additions, pipeline updates, scraper fixes | Codex CLI → Gemini CLI → Sonnet (if they fail) |
| T3 — Complex | Architectural, ambiguous, multi-service | Design system work, cross-service coordination, large refactors, ambiguous debugging | Opus 4.6 |
| T4 — Critical | Security, QA, production review | Security audit, production code review, novel problem-solving | Opus 4.6 |
Escalation rule: if a T1 attempt fails, retry with the other free tool, then treat the task as T2. If the free tools fail at T2, escalate to Sonnet. If Sonnet fails or the task is ambiguous, escalate to Opus. Never skip tiers unless the task is clearly T3/T4 from the start.
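
The escalation rule can be sketched as a small lookup. This is a minimal sketch, not the actual router: the names `CHAIN` and `next_route` are hypothetical, and it assumes that a T1 task which exhausts both free tools continues to Sonnet without re-running the free tools under T2.

```python
from typing import Optional, Sequence

# Cost-ordered chain from the tier table (names as written there).
CHAIN = ["Codex CLI", "Gemini CLI", "Sonnet", "Opus 4.6"]
TIERS = {"T1", "T2", "T3", "T4"}


def next_route(tier: str, failed: Sequence[str] = (), ambiguous: bool = False) -> Optional[str]:
    """Return the next tool to try, or None when every option has failed.

    tier      -- "T1".."T4" as defined in the tier table
    failed    -- tools already tried unsuccessfully on this task
    ambiguous -- True when the task spec is unclear (goes straight to Opus)
    """
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    # T3/T4 and ambiguous tasks skip the free tools and Sonnet entirely.
    if tier in ("T3", "T4") or ambiguous:
        return None if "Opus 4.6" in failed else "Opus 4.6"
    # T1/T2 walk the chain in cost order, skipping anything that already failed.
    for tool in CHAIN:
        if tool not in failed:
            return tool
    return None
```

For example, `next_route("T2", failed=["Codex CLI", "Gemini CLI"])` yields `"Sonnet"`, while `next_route("T1", ambiguous=True)` yields `"Opus 4.6"`.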
Non-Coding Task Routing
| Task Type | Primary | Fallback | Notes |
|---|---|---|---|
| Research & analysis | Gemini 3.1 Pro (free) | Opus 4.6 (deep scientific) | GPQA 94.3%, ARC-AGI 77.1%, 1M ctx |
| Writing & docs | Sonnet 4.6 | Opus 4.6 (creative/strategic) | GDPval 1633 (Sonnet leads) |
| Orchestration & planning | Opus 4.6 | Gemini 3.1 Pro (tool coordination) | τ²-bench 91.9%, MCP Atlas 69.2% |
| Data processing | Gemini 3.1 Pro | Sonnet 4.6 (pipeline code) | LiveCodeBench 2887, 1M ctx |
| Config & infrastructure | GPT-5.3 Codex (free) | Sonnet 4.6 (IaC generation) | Terminal-Bench 77.3% — leads by 9pp |
| Code review / QA | Opus 4.6 | Gemini 3.1 Pro (repo-wide audit) | GPQA 91.3%, best security audit |
| Heartbeats / light ops | MiniMax M2.5 (free) | — | ONLY for well-defined, low-stakes tasks |
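
The routing table above can be expressed as a primary/fallback lookup. This is a rough sketch: the task-type keys and the `pick_model` helper are hypothetical, and it treats the Fallback column purely as "use when the primary is unavailable". The parenthetical notes in the table (for example, Opus for deep scientific research) suggest the fallback is also preferred for some sub-cases, which this sketch ignores.

```python
from typing import Optional

# (primary, fallback) per task type, copied from the table above.
# The dictionary keys are illustrative labels, not identifiers from the source.
TASK_ROUTES = {
    "research": ("Gemini 3.1 Pro", "Opus 4.6"),
    "writing": ("Sonnet 4.6", "Opus 4.6"),
    "orchestration": ("Opus 4.6", "Gemini 3.1 Pro"),
    "data_processing": ("Gemini 3.1 Pro", "Sonnet 4.6"),
    "config_infra": ("GPT-5.3 Codex", "Sonnet 4.6"),
    "code_review": ("Opus 4.6", "Gemini 3.1 Pro"),
    "heartbeat": ("MiniMax M2.5", None),  # no fallback listed
}


def pick_model(task_type: str, unavailable: frozenset = frozenset()) -> Optional[str]:
    """Return the primary model, the fallback if the primary is unavailable, else None."""
    primary, fallback = TASK_ROUTES[task_type]
    if primary not in unavailable:
        return primary
    if fallback is not None and fallback not in unavailable:
        return fallback
    return None
```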
Model Strings for Spawning
Updated 2026-03-01
| Model | Provider String | Cost | Best For |
|---|---|---|---|
| Opus 4.6 | anthropic/claude-opus-4-6 | Max plan (weekly limit) | T3/T4 coding, orchestration, code review, security |
| Sonnet 4.6 | anthropic/claude-sonnet-4-6 | Max plan (weekly limit) | T2 fallback, writing, frontend, debugging |
| GPT-5.3 Codex | openai-codex/gpt-5.3-codex | Free (Plus plan) | T1/T2 coding, config/infra, terminal work |
| Gemini 3.1 Pro | google/gemini-3.1-pro-preview | Free tier (rate limits) | Research, data processing, large codebase |
| Gemini 3.1 Pro (OAuth) | google-antigravity/gemini-3.1-pro-preview | Free (OAuth) | Same as above, higher rate limits |
| Gemini 3 Flash | google-antigravity/gemini-3-flash | Free | High-volume triage, budget tasks |
| MiniMax M2.5 | minimax-portal/MiniMax-M2.5 | Free (OAuth) | Heartbeats only — 88% hallucination rate |
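
One way to keep these strings in a single place is a small registry, sketched below. The provider strings are copied from the table; the `MODEL_STRINGS`, `model_string`, and `split_provider` names are hypothetical helpers, not part of any spawning API.

```python
# Display name -> provider string, copied from the table above.
MODEL_STRINGS = {
    "Opus 4.6": "anthropic/claude-opus-4-6",
    "Sonnet 4.6": "anthropic/claude-sonnet-4-6",
    "GPT-5.3 Codex": "openai-codex/gpt-5.3-codex",
    "Gemini 3.1 Pro": "google/gemini-3.1-pro-preview",
    "Gemini 3.1 Pro (OAuth)": "google-antigravity/gemini-3.1-pro-preview",
    "Gemini 3 Flash": "google-antigravity/gemini-3-flash",
    "MiniMax M2.5": "minimax-portal/MiniMax-M2.5",
}


def model_string(name: str) -> str:
    """Look up the provider string for a model name, failing loudly on typos."""
    try:
        return MODEL_STRINGS[name]
    except KeyError:
        known = ", ".join(sorted(MODEL_STRINGS))
        raise KeyError(f"unknown model {name!r}; known models: {known}") from None


def split_provider(provider_string: str) -> tuple:
    """Split 'provider/model-id' into its two parts."""
    provider, model_id = provider_string.split("/", 1)
    return provider, model_id
```

Failing on an unknown name, rather than passing the string through, avoids silently spawning with a mistyped model.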
MiniMax M2.5 — Use With Caution
SWE-Bench 80.2% is real but narrow. 88% hallucination rate (Artificial Analysis). AA Intelligence Index: 42.
- ✅ Heartbeats, narrow auto-tested coding, high-volume tool calling (BFCL 76.8%)
- ❌ NEVER for: research, code review, debugging, unsupervised work, anything requiring accuracy
- Default coding to free tools first. Only escalate to Max plan when free tools fail.
- Do NOT use OpenRouter for sub-agents — credits are limited.
- Do NOT use MiniMax for anything requiring accuracy.
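
The restrictions above could be enforced with a pre-spawn check along these lines. This is a sketch under stated assumptions: the `check_spawn` helper and the task-type labels are hypothetical, and the `openrouter` provider prefix is an assumption, since no OpenRouter model string appears in this guide.

```python
# Task types where MiniMax M2.5 is acceptable, per the checklist above.
# The labels themselves are illustrative.
MINIMAX_ALLOWED = {"heartbeat", "auto_tested_coding", "tool_calling"}


def check_spawn(provider_string: str, task_type: str, is_sub_agent: bool = True) -> list:
    """Return a list of rule violations; an empty list means the spawn is allowed."""
    problems = []
    provider = provider_string.split("/", 1)[0]

    # Assumption: OpenRouter model strings start with "openrouter/".
    if provider == "openrouter" and is_sub_agent:
        problems.append("OpenRouter must not be used for sub-agents (credits are limited)")

    if provider == "minimax-portal" and task_type not in MINIMAX_ALLOWED:
        problems.append(f"MiniMax must not be used for {task_type!r} (requires accuracy)")

    return problems
```

A caller would refuse to spawn whenever the returned list is non-empty, and could log the messages for review.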
Escalation Chain (cost order)
Codex CLI (free) → Gemini CLI (free) → Sonnet sub-agent (Max plan) → Opus (Max plan, premium)
CLI Syntax

    # Codex CLI: free via ChatGPT Plus. Terminal-Bench 77.3%. Best for T1/T2.
    npx @openai/codex exec "task. Fix errors. Commit. Do NOT ask questions."

    # Gemini CLI: free. 1M+ context. Good for T1/T2 + research.
    gemini -p "task" -y

    # Claude Code CLI: Max plan. Use for T2 (after free tools fail) and T3.
    claude -p --dangerously-skip-permissions --model sonnet --max-budget-usd 3 "task. Build. Fix errors. Commit."

See Also
- Agent Roster — Who uses which model
- Spawn SOP — How to spawn with correct model
- Providers — Provider auth status and quotas