Roadmap

A note on cost, first

ZO v1 is optimised for production-grade research engineering. Agents iterate, verify, and write production-quality code — and that burns tokens. A canonical reference run with the default profile is on the order of $10–15 of API credits. For cost-sensitive plans (Anthropic Pro, student tier, individual researchers on a budget), the --low-token preset currently delivers ~30% measured savings on the same workload. We’re actively pushing that further — details below.

What we’re working on

Three workstreams shape the next 1–2 quarters, driven by recent user feedback on cost, accessibility, and cross-phase autonomy. Grouped by horizon: Now, Next quarter, and Major design.

Now (next 2–4 weeks)

Caveman integration

Bundle the caveman terse-output skill into --low-token mode, scoped to agents that don’t emit structured artifacts (data work, summarisation; not the oracle or training-callback paths). Target: another 10–20pp on top of the measured ~30% savings. First validation of our planned-integrations promise.

Onboarding hardening

One-line installer (curl … | bash), Homebrew formula, and a new zo doctor command for auto-detect-and-fix on missing dependencies. Reduces the setup-friction wall before we invest in a full UI.

Next quarter (1–3 months)

Cross-phase backedge primitive — DAG to graph, step one

Today’s orchestrator behaves like a directed acyclic graph: phases run forward, gate-by-gate. When an issue surfaces during modelling that traces back to preprocessing — for example, a flipped waveform polarity that’s invisible in raw data but obvious in trained-model behaviour — it requires human steering to revisit earlier phases.What ships: a new RETURN_TO_PHASE gate decision lets a later-phase agent autonomously request a revisit of any earlier phase, with budget guardrails (--max-cross-phase-revisits N), dead-end detection, and a revisit-reason payload that re-prompts the earlier phase with the new constraint. Validates the weighted-graph hypothesis cheaply, before the full refactor.

VS Code extension — agent dashboard, comms feed, gate approval

The CLI-first experience suits software engineers. Researchers from other disciplines often hit friction with environment variables, Node.js, and tmux. The answer: a VS Code extension (not a fork) that gives ZO a visual home alongside the editor, file tree, git, and integrated terminal researchers already have.Architecture: headless tmux as the engine, xterm.js webview as the renderer. Claude Code keeps its real TTY (zero regression risk on the orchestration core); the UI gets agent status panels, a comms feed, training-metrics dashboards, and one-click gate approval — all in one window with the user’s code. A “ZO Studio” bundled installer (VS Code + extension + Claude CLI + uv, one-click per platform) is on the table as a follow-on for non-technical researchers.

Push --low-token toward 70–80% savings

Current preset gives ~30% measured savings (see cost benchmark). The path higher is architectural: prompt caching via direct Anthropic SDK integration (replacing the claude CLI subprocess), Batch API for non-real-time phases, Files API for shared context across the team. Multi-week effort, gated behind the SDK refactor.

Major design (3–6+ months)

Full weighted graph architecture

Phases as nodes with weighted reversible edges. Each node carries a “completeness” probability that updates as work progresses; the system ranks revisit weights and runs a prioritised ablation queue. Users can freeze any node manually if happy.Enables a fast, loose end-to-end first pass — then the system autonomously revisits earlier nodes with better project context, instead of requiring human steering. Gated behind the cross-phase backedge primitive landing first.

Broader coding-agent provider support

Today ZO orchestrates Claude Code. Adding adapters for other coding-agent runtimes — OpenAI Codex CLI, OpenCode, others — would let users bring their preferred provider and hedge single-vendor risk.Realistic scope: worker-pool model (Claude Code lead dispatches single-agent jobs to alternate workers), not full peer-to-peer team parity. Full parity would mean rebuilding Claude Code’s native TeamCreate / Agent / SendMessage infrastructure on top of bare APIs.

Where to weigh in

These priorities come from real user feedback. If you have a use case one of these would unlock — or a different priority you’d argue for — open an issue or join the discussion.

Issues

Bug reports, feature requests, prioritisation discussion.

Discussions

Use-cases, design questions, “would ZO work for…” threads.

Get started

Concepts

CLI reference

Reference

Project

A note on cost, first

What we’re working on

Now (next 2–4 weeks)

Caveman integration

Onboarding hardening

Next quarter (1–3 months)

Cross-phase backedge primitive — DAG to graph, step one

VS Code extension — agent dashboard, comms feed, gate approval

Push --low-token toward 70–80% savings

Major design (3–6+ months)

Full weighted graph architecture

Broader coding-agent provider support

Where to weigh in

Issues

Discussions

​A note on cost, first

​What we’re working on

​Now (next 2–4 weeks)

Caveman integration

Onboarding hardening

​Next quarter (1–3 months)

Cross-phase backedge primitive — DAG to graph, step one

VS Code extension — agent dashboard, comms feed, gate approval

Push --low-token toward 70–80% savings

​Major design (3–6+ months)

Full weighted graph architecture

Broader coding-agent provider support

​Where to weigh in

Issues

Discussions

A note on cost, first

What we’re working on

Now (next 2–4 weeks)

Next quarter (1–3 months)

Major design (3–6+ months)

Where to weigh in