Model Catalog

Full reference for model selection. See the Agent Roster for the per-agent quick summary.

| Tier | Complexity | Examples | Route To |
|---|---|---|---|
| T1: Routine | Single-file, clear spec | Bug fix, config edit, simple script, CSS tweak, CRUD endpoint, test additions | Codex CLI → Gemini CLI |
| T2: Standard | Multi-file, well-defined scope | New feature (clear spec), module refactor, API additions, pipeline updates, scraper fixes | Codex CLI → Gemini CLI → Sonnet (if they fail) |
| T3: Complex | Architectural, ambiguous, multi-service | Design system work, cross-service coordination, large refactors, ambiguous debugging | Opus 4.6 |
| T4: Critical | Security, QA, production review | Security audit, production code review, novel problem-solving | Opus 4.6 |

Escalation rule: T1 fails → retry with other free tool → T2. T2 free tools fail → Sonnet. Sonnet fails or task is ambiguous → Opus. Never skip tiers unless clearly T3/T4 from the start.
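The escalation rule above can be sketched as a small lookup. This is a minimal illustration, not part of the catalog: the route labels (`codex-cli`, `gemini-cli`, `sonnet`, `opus`) and the function names are hypothetical, and the sketch assumes that a T1 task which has exhausted both free tools moves straight to Sonnet, and that an ambiguous task jumps to Opus from any rung.

```python
from typing import Optional

# Escalation ladder from the rule above. The short labels are illustrative
# names chosen for this sketch, not identifiers defined by the catalog.
LADDER = ["codex-cli", "gemini-cli", "sonnet", "opus"]

# Starting rung per tier: T1/T2 begin with the free tools, T3/T4 go to Opus.
START = {"T1": 0, "T2": 0, "T3": 3, "T4": 3}


def first_route(tier: str) -> str:
    """Return the first route to try for a task of the given tier."""
    return LADDER[START[tier]]


def next_route(current: str, ambiguous: bool = False) -> Optional[str]:
    """Return the route to try after `current` fails, or None if none is left.

    Assumption: an ambiguous task is sent to Opus regardless of the current rung.
    """
    if ambiguous:
        return "opus" if current != "opus" else None
    idx = LADDER.index(current)
    return LADDER[idx + 1] if idx + 1 < len(LADDER) else None
```

Keeping the ladder as a single ordered list means "never skip tiers" holds by construction: the only way to jump ahead is the explicit `ambiguous` flag or a T3/T4 starting tier.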

| Task Type | Primary | Fallback | Notes |
|---|---|---|---|
| Research & analysis | Gemini 3.1 Pro (free) | Opus 4.6 (deep scientific) | GPQA 94.3%, ARC-AGI 77.1%, 1M ctx |
| Writing & docs | Sonnet 4.6 | Opus 4.6 (creative/strategic) | GDPval 1633 (Sonnet leads) |
| Orchestration & planning | Opus 4.6 | Gemini 3.1 Pro (tool coordination) | τ²-bench 91.9%, MCP Atlas 69.2% |
| Data processing | Gemini 3.1 Pro | Sonnet 4.6 (pipeline code) | LiveCodeBench 2887, 1M ctx |
| Config & infrastructure | GPT-5.3 Codex (free) | Sonnet 4.6 (IaC generation) | Terminal-Bench 77.3%, leads by 9pp |
| Code review / QA | Opus 4.6 | Gemini 3.1 Pro (repo-wide audit) | GPQA 91.3%, best security audit |
| Heartbeats / light ops | MiniMax M2.5 (free) | | ONLY for well-defined, low-stakes tasks |

Updated 2026-03-01

| Model | Provider String | Cost | Best For |
|---|---|---|---|
| Opus 4.6 | anthropic/claude-opus-4-6 | Max plan (weekly limit) | T3/T4 coding, orchestration, code review, security |
| Sonnet 4.6 | anthropic/claude-sonnet-4-6 | Max plan (weekly limit) | T2 fallback, writing, frontend, debugging |
| GPT-5.3 Codex | openai-codex/gpt-5.3-codex | Free (Plus plan) | T1/T2 coding, config/infra, terminal work |
| Gemini 3.1 Pro | google/gemini-3.1-pro-preview | Free tier (rate limits) | Research, data processing, large codebase |
| Gemini 3.1 Pro (OAuth) | google-antigravity/gemini-3.1-pro-preview | Free (OAuth) | Same as above, higher rate limits |
| Gemini 3 Flash | google-antigravity/gemini-3-flash | Free | High-volume triage, budget tasks |
| MiniMax M2.5 | minimax-portal/MiniMax-M2.5 | Free (OAuth) | Heartbeats only (88% hallucination rate) |
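As an illustration of how the task-type routing and the provider strings fit together, the two tables can be combined into one lookup. This is a sketch only: the task-type keys (`research`, `config`, and so on), the `Route` type, and `pick_model` are hypothetical names, and the sketch arbitrarily uses the free-tier Gemini provider string; the OAuth variant could be swapped in.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    primary: str
    fallback: Optional[str]


# Provider strings copied from the model catalog.
OPUS = "anthropic/claude-opus-4-6"
SONNET = "anthropic/claude-sonnet-4-6"
CODEX = "openai-codex/gpt-5.3-codex"
GEMINI_PRO = "google/gemini-3.1-pro-preview"  # assumption: free-tier variant
MINIMAX = "minimax-portal/MiniMax-M2.5"

# Keys are hypothetical slugs for the task types in the routing table.
ROUTES = {
    "research": Route(GEMINI_PRO, OPUS),
    "writing": Route(SONNET, OPUS),
    "orchestration": Route(OPUS, GEMINI_PRO),
    "data": Route(GEMINI_PRO, SONNET),
    "config": Route(CODEX, SONNET),
    "review": Route(OPUS, GEMINI_PRO),
    "heartbeat": Route(MINIMAX, None),  # the table lists no fallback
}


def pick_model(task_type: str, primary_failed: bool = False) -> Optional[str]:
    """Return the provider string to use, or None if no fallback is listed."""
    route = ROUTES[task_type]
    return route.fallback if primary_failed else route.primary
```

For example, `pick_model("config")` yields the Codex provider string, and `pick_model("config", primary_failed=True)` yields the Sonnet one.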

MiniMax M2.5 caveats: its SWE-Bench score of 80.2% is real but narrow. Artificial Analysis (AA) reports an 88% hallucination rate and an AA Intelligence Index of 42.

  • ✅ Heartbeats, narrow auto-tested coding, high-volume tool calling (BFCL 76.8%)
  • ❌ NEVER for: research, code review, debugging, unsupervised work, anything requiring accuracy
  • Default coding to free tools first. Only escalate to Max plan when free tools fail.
  • Do NOT use OpenRouter for sub-agents — credits are limited.
  • Do NOT use MiniMax for anything requiring accuracy.
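The rules in this list could be enforced with a simple pre-flight check before a sub-agent is spawned. The following is a rough sketch under stated assumptions: the function name, the task labels, and especially the `openrouter/` provider prefix are hypothetical and do not come from the catalog; only the `minimax-portal/` prefix appears in it.

```python
from typing import List

# Task labels taken from the NEVER list above; the exact strings are illustrative.
ACCURACY_TASKS = {"research", "code review", "debugging", "unsupervised work"}


def rule_violations(provider: str, task: str, is_sub_agent: bool) -> List[str]:
    """Return a list of rule violations for a proposed model assignment."""
    problems = []
    if provider.startswith("minimax-portal/") and task in ACCURACY_TASKS:
        problems.append("MiniMax must not be used for accuracy-sensitive work")
    # Assumption: OpenRouter models use an "openrouter/" prefix (hypothetical).
    if provider.startswith("openrouter/") and is_sub_agent:
        problems.append("OpenRouter must not be used for sub-agents")
    return problems
```

An empty list means the assignment passes; a heartbeat on MiniMax is allowed, while research on MiniMax is flagged.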
Codex CLI (free) → Gemini CLI (free) → Sonnet sub-agent (Max plan) → Opus (Max plan, premium)
# Codex CLI — free via ChatGPT Plus. Terminal-Bench 77.3%. Best for T1/T2.
npx @openai/codex exec "task. Fix errors. Commit. Do NOT ask questions."
# Gemini CLI — free. 1M+ context. Good for T1/T2 + research.
gemini -p "task" -y
# Claude Code CLI — Max plan. Use for T2 (after free tools fail) and T3.
claude -p --dangerously-skip-permissions --model sonnet --max-budget-usd 3 "task. Build. Fix errors. Commit."
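The three commands above can be chained so that each tool is tried only after the previous one fails. The sketch below is one possible wrapper, not the catalog's own tooling. It assumes that each CLI signals failure with a non-zero exit code, which the source does not state, and the helper names are hypothetical. It covers only the three commands shown, so the final Opus step is left out.

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def build_commands(task: str) -> List[List[str]]:
    """Build the three CLI invocations shown above, in escalation order."""
    return [
        ["npx", "@openai/codex", "exec",
         f"{task}. Fix errors. Commit. Do NOT ask questions."],
        ["gemini", "-p", task, "-y"],
        ["claude", "-p", "--dangerously-skip-permissions", "--model", "sonnet",
         "--max-budget-usd", "3", f"{task}. Build. Fix errors. Commit."],
    ]


def run_exit_code(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code (127 if the binary is missing)."""
    try:
        return subprocess.run(list(cmd)).returncode
    except FileNotFoundError:
        return 127


def run_with_fallback(
    task: str, runner: Callable[[Sequence[str]], int] = run_exit_code
) -> Optional[str]:
    """Try each tool in order; return the first binary that exits 0, else None.

    Assumption: a non-zero exit code means the tool failed the task.
    """
    for cmd in build_commands(task):
        if runner(cmd) == 0:
            return cmd[0]
    return None
```

The `runner` parameter exists so the chain can be exercised with a stub instead of the real CLIs, for example `run_with_fallback("task", lambda cmd: 1)` to simulate every tool failing.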