A one-page reference for the --low-token cost-saving preset. For motivation, trade-offs, and FAQ, see the low-token mode concept page.
Measured savings: ~30% on the first MNIST bench ($7.75 vs. ~$11 default, 2026-04-27). The preset has since gained two structural levers that target the ~50-60% ceiling: Haiku routing for code-reviewer / test-engineer / oracle-qa, and per-phase agent drops (Phase 1 and Phase 5). A second bench after PR-C is required to confirm the new measured figure; this page will be updated when that bench lands. Full breakdown: Cost benchmark.

Activation

| Method | Where | Persistence |
|---|---|---|
| --low-token flag | zo build, zo continue | Per-invocation |
| low_token: true | Plan YAML frontmatter | Per project |
When either is on, the preset below applies. The CLI flag wins over the plan field if both are set.

The preset

_LOW_TOKEN_PRESET = {
    "lead_model": "sonnet",
    "max_iterations": 2,
    "stop_on_tier": "could_pass",
    "drop_research_scout": True,
    "headlines_disabled": True,
    "gate_mode": "full-auto",
    "compact_threshold": "60",
}

# Plus two routing tables in src/zo/_orchestrator_phases.py:

LOW_TOKEN_HAIKU_AGENTS = frozenset({
    "code-reviewer", "test-engineer", "oracle-qa",
})

LOW_TOKEN_PHASE_DROPS = {
    "phase_1": frozenset({"code-reviewer", "test-engineer", "domain-evaluator"}),
    "phase_5": frozenset({"xai-agent", "domain-evaluator"}),
}
Authoritative locations: src/zo/cli.py for _LOW_TOKEN_PRESET, src/zo/_orchestrator_phases.py for the routing tables.
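As a rough illustration of how the two routing tables could combine, the sketch below filters a phase roster and assigns a model tier to each surviving agent. The helper `route_agents`, its parameters, the example rosters, and the `"sonnet"` fallback are hypothetical and are not taken from src/zo/_orchestrator_phases.py; only the two tables are copied from this page. It also ignores the separate drop_research_scout knob.

```python
# Routing tables copied from this page.
LOW_TOKEN_HAIKU_AGENTS = frozenset({
    "code-reviewer", "test-engineer", "oracle-qa",
})

LOW_TOKEN_PHASE_DROPS = {
    "phase_1": frozenset({"code-reviewer", "test-engineer", "domain-evaluator"}),
    "phase_5": frozenset({"xai-agent", "domain-evaluator"}),
}


def route_agents(phase, roster, low_token, default_model="sonnet"):
    """Hypothetical helper: map each agent that survives the phase drop to a model.

    `roster` is the list of agents the phase would normally spawn.
    """
    if not low_token:
        # Assumption: without the preset, every agent keeps the default tier.
        return {agent: default_model for agent in roster}
    dropped = LOW_TOKEN_PHASE_DROPS.get(phase, frozenset())
    return {
        agent: "haiku" if agent in LOW_TOKEN_HAIKU_AGENTS else default_model
        for agent in roster
        if agent not in dropped
    }


# Phase 5 roster as listed in the knob reference.
phase_5_roster = ["model-builder", "oracle-qa", "xai-agent", "domain-evaluator"]
phase_5_routing = route_agents("phase_5", phase_5_roster, low_token=True)
print(phase_5_routing)  # {'model-builder': 'sonnet', 'oracle-qa': 'haiku'}
```

Keeping the drops and the Haiku set as separate tables means an agent can be removed from one phase and still run on the cheaper tier in the others, which matches the code-reviewer row in the knob reference.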

Knob reference

| Knob | Default | Low-token | Where in code |
|---|---|---|---|
| Lead model | opus | sonnet | _resolve_lead_model in cli.py |
| Sub-agent routing | per .md (mostly Sonnet) | two-tier (Haiku for review/test/oracle, Sonnet for reasoning) | _prompt_low_token_overrides in orchestrator.py |
| Phase 1 trim | full reviewer set | data-engineer only (reviewers/tests deferred to Gate 5) | LOW_TOKEN_PHASE_DROPS["phase_1"] |
| Phase 5 trim | model-builder + oracle-qa + xai-agent + domain-evaluator | model-builder + oracle-qa only (lead writes single-shot summary) | LOW_TOKEN_PHASE_DROPS["phase_5"] |
| Phase-4 max_iterations | 10 | 2 | _LOW_TOKEN_LOOP_CLAMPS in experiment_loop.py |
| Phase-4 stop_on_tier | must_pass | could_pass | _LOW_TOKEN_LOOP_CLAMPS in experiment_loop.py |
| Cross-cutting research-scout | enabled | disabled | _agents_for_phase in orchestrator.py |
| Cross-cutting code-reviewer | enabled | dropped from Phase 1; on Haiku elsewhere | LOW_TOKEN_PHASE_DROPS + LOW_TOKEN_HAIKU_AGENTS |
| End-of-session Haiku summary | generated | skipped | _generate_session_summary in cli.py. (The previous per-60-second _maybe_print_headline ticker was removed unconditionally; only the one-shot wrap-up call remains, and --low-token skips even that.) |
| Default gate mode | supervised | full-auto | _resolve_gate_mode in cli.py |
| Lead prompt: dedicated adaptations section | included | omitted | build_lead_prompt in orchestrator.py |
| Lead prompt: roster | full descriptive | compact comma-list | _prompt_roster in orchestrator.py |
| CLAUDE_AUTOCOMPACT_PCT_OVERRIDE env | unset | 60 | extra_env in cli.py → wrapper |

Override flags

These compose with --low-token to fine-tune individual knobs:
| Flag | Type | Effect |
|---|---|---|
| --lead-model {opus,sonnet,haiku} | choice | Override the lead model |
| --max-iterations N | int | Hard cap on Phase-4 iterations |
| --no-headlines | bool | Skip the end-of-session Haiku bullet summary (independent of low-token) |
| --gate-mode {supervised,auto,full-auto} | choice | Override the gate mode |
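For readers who want to see the flag types side by side, here is a minimal argparse sketch of the flags listed above. It is an assumption that the real CLI is built this way; src/zo/cli.py may use a different framework, and `build_parser` is a hypothetical name. Only the flag names, choices, and types come from this page.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the override flags documented on this page."""
    parser = argparse.ArgumentParser(prog="zo build")
    parser.add_argument("plan", help="path to the plan file")
    parser.add_argument("--low-token", action="store_true")
    # Defaults of None let later code tell "not passed" apart from an explicit override.
    parser.add_argument("--lead-model", choices=["opus", "sonnet", "haiku"], default=None)
    parser.add_argument("--max-iterations", type=int, default=None)
    parser.add_argument("--no-headlines", action="store_true")
    parser.add_argument("--gate-mode", choices=["supervised", "auto", "full-auto"], default=None)
    return parser


args = build_parser().parse_args(["x.md", "--low-token", "--max-iterations", "4"])
print(args.low_token, args.max_iterations, args.lead_model)  # True 4 None
```

Leaving the override flags unset by default is what allows them to compose with the preset: an unset flag falls through to the plan or preset value, while a set flag replaces it.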

Plan-level fields

Two YAML frontmatter fields complement the CLI flags:
---
project_name: "..."
# ... required fields ...
low_token: true        # optional, default false
lead_model: sonnet     # optional, default unset
---
Plan-level ## Experiment Loop section overrides individual loop fields with full granularity:
## Experiment Loop

max_iterations: 5
plateau_epsilon: 0.005
stop_on_tier: should_pass
When low_token is on AND a plan ## Experiment Loop field is set, the plan field wins (the preset is a “sensible defaults” layer, not a hard clamp).

Precedence (highest first)

  1. CLI flag: --lead-model, --max-iterations, --gate-mode, --no-headlines
  2. Plan YAML frontmatter: lead_model, low_token
  3. Plan body section: ## Experiment Loop for loop knobs
  4. Low-token preset: applied when --low-token or low_token: true is set
  5. Base default: Opus, 10 iterations, supervised, etc.
Concrete examples:
| Invocation | Plan has | Effective lead | Effective max-iterations |
|---|---|---|---|
| zo build x.md | (nothing relevant) | opus | 10 |
| zo build x.md --low-token | (nothing relevant) | sonnet | 2 |
| zo build x.md --low-token | lead_model: opus | opus (plan wins over preset) | 2 |
| zo build x.md --low-token --lead-model haiku | lead_model: opus | haiku (CLI wins over plan) | 2 |
| zo build x.md --low-token | ## Experiment Loop, max_iterations: 8 | sonnet | 8 (plan wins over preset clamp) |
| zo build x.md --low-token --max-iterations 4 | ## Experiment Loop, max_iterations: 8 | sonnet | 4 (CLI wins over plan) |
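The precedence order and the examples above can be modelled as a lookup through layered dictionaries, highest precedence first. This is a minimal sketch, not the actual resolution code: `resolve`, `BASE_DEFAULTS`, and the dictionary-per-layer shape are hypothetical, and only the values (opus / 10, sonnet / 2) come from this page.

```python
# Values taken from the preset and the "Base default" entry on this page.
BASE_DEFAULTS = {"lead_model": "opus", "max_iterations": 10}
LOW_TOKEN_PRESET = {"lead_model": "sonnet", "max_iterations": 2}


def resolve(key, cli=None, frontmatter=None, loop_section=None, low_token=False):
    """Return the first value found, walking the layers highest precedence first."""
    layers = [
        cli or {},                               # 1. CLI flag
        frontmatter or {},                       # 2. plan YAML frontmatter
        loop_section or {},                      # 3. plan "## Experiment Loop" section
        LOW_TOKEN_PRESET if low_token else {},   # 4. low-token preset
        BASE_DEFAULTS,                           # 5. base default
    ]
    for layer in layers:
        value = layer.get(key)
        if value is not None:
            return value
    raise KeyError(key)


# zo build x.md --low-token, plan has lead_model: opus
print(resolve("lead_model", frontmatter={"lead_model": "opus"}, low_token=True))  # opus

# zo build x.md --low-token --max-iterations 4, plan loop section has max_iterations: 8
print(resolve("max_iterations", cli={"max_iterations": 4},
              loop_section={"max_iterations": 8}, low_token=True))  # 4
```

Treating the preset as just another layer, rather than a clamp applied at the end, is what lets a plan field or CLI flag override it one knob at a time.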

Visual confirmation

When low-token is active, the ZO banner shows a [low-token] badge:
╭──────────────────────────────────────────────────────────────╮
│   ◎ Zero Operators  v1.0.2  [low-token]                     │
│      Autonomous AI Research & Engineering Teams              │
│                                                              │
│   Project:   mnist-demo                                      │
│   Mode:      build                                           │
│   Phase:     starting                                        │
│   Gates:     full-auto                                       │
╰──────────────────────────────────────────────────────────────╯

See also