The four memory files
Every project gets a memory/ directory with four canonical files (or .zo/memory/ in the portable layout):
STATE.md
Current phase, last checkpoint, agent statuses, blockers, next steps. Overwritten each session-end. Atomic-write protected.
DECISION_LOG.md
Append-only audit trail. Every architectural decision, gate passage, scope change. Each entry has a timestamp, type, title, decision, rationale, alternatives considered, outcome.
PRIORS.md
Domain knowledge accumulated through running ZO. Each prior references the failure that triggered it. ZO has 40+ documented priors, each tracing to a real failure.
sessions/
Per-session summary files (session-NNN-YYYY-MM-DD.md). Written at session end. Captures what was attempted, what shipped, what’s next.
Portable memory: .zo/
Project memory lives in the delivery repo, not the ZO repo. This is by design:
- The ZO repo is public (open source). It can’t contain client/project state.
- The delivery repo is private, committed to git, already travels between machines.
- git pull on a new machine brings code AND state.
Platform memory (memory/zo-platform/) is the only memory tracked in the public ZO repo. It captures what ZO learned generically, never project specifics. See PR-024 / PR-028 / PR-030 for the confidentiality enforcement story.
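A minimal sketch of two ideas from this section: locating the memory directory in either layout, and overwriting STATE.md atomically. The helper names resolve_memory_root and atomic_write are hypothetical, and the rule that .zo/memory/ wins when both directories exist is an assumption, not ZO's documented behavior.

```python
import os
import tempfile
from pathlib import Path


def resolve_memory_root(repo: Path) -> Path:
    """Return the memory directory for a repo (hypothetical helper).

    Assumption: the portable layout (.zo/memory/) is preferred when it
    exists; otherwise fall back to the plain memory/ directory.
    """
    portable = repo / ".zo" / "memory"
    return portable if portable.is_dir() else repo / "memory"


def atomic_write(path: Path, text: str) -> None:
    """Overwrite `path` so readers never observe a half-written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Writing the temp file next to the target keeps the rename on one filesystem, which is what makes the swap atomic.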
Cross-machine: zo continue --repo
Move a project from a Mac dev box to a Linux GPU server:
zo continue detects the .zo/ layout, loads project context, and resumes from the recorded phase. Machine-specific paths (data location, GPU details) are re-detected via zo.environment.detect_environment() and written to .zo/local.yaml (gitignored).
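To illustrate the re-detection step, here is a rough sketch of collecting machine-specific facts and writing them to .zo/local.yaml. It does not reproduce zo.environment.detect_environment(), whose signature is not shown here. The function names, the config keys, and the use of nvidia-smi on PATH as a GPU hint are all assumptions made for illustration.

```python
import platform
import shutil
from pathlib import Path


def detect_local_environment(data_dir: str) -> dict:
    """Collect machine-specific facts (illustrative keys, not ZO's schema)."""
    return {
        "os": platform.system(),        # e.g. "Darwin" or "Linux"
        "machine": platform.machine(),  # e.g. "arm64" or "x86_64"
        "data_dir": str(Path(data_dir).expanduser()),
        # Presence of nvidia-smi on PATH is used here as a rough GPU hint.
        "has_nvidia_smi": shutil.which("nvidia-smi") is not None,
    }


def write_local_config(repo: Path, env: dict) -> Path:
    """Write the facts as flat `key: value` lines to .zo/local.yaml."""
    target = repo / ".zo" / "local.yaml"
    target.parent.mkdir(parents=True, exist_ok=True)
    lines = []
    for key, value in env.items():
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        else:
            # Quote strings so paths with special characters stay valid YAML.
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            rendered = '"' + escaped + '"'
        lines.append(f"{key}: {rendered}")
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target
```

Because the file is regenerated per machine and gitignored, a Mac and a GPU server can share one committed .zo/ directory while each keeps its own local paths.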
Concurrent sessions: zo report
You can run a second ZO session on the same project while a model run is going, without racing on shared state. zo report launches an Opus report-lead in an isolated git worktree (on a report/<id> branch) that:
- reads canonical .zo/memory as a snapshot and .zo/experiments live from disk,
- verifies results (oracle-qa) and data (data-engineer), and writes the LaTeX report into its worktree,
- records its own decisions / priors / summary to a delta store at .zo/surrogates/<id>/, never touching canonical STATE.md or the experiment registry.
On consolidation (zo consolidate), the surrogate’s memory is folded back into .zo/memory (decisions appended, priors de-duplicated, summaries copied) and its branch is merged, but only when no other session is live. The memory fold is flock-guarded, and --bypass-permissions inherits an already-active overlay instead of clobbering it.
--no-consolidate covers the one transitional case where the model session started before this feature (it isn’t in the per-PID liveness registry, so auto-merge can’t see it). Future model sessions register themselves and consolidation is automatic on last-session-close.
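The fold described above can be sketched roughly as follows. This is not ZO's implementation. The file names inside the surrogate store, the lock file name, the one-prior-per-line format, and the live_sessions argument (a stand-in for the per-PID liveness registry) are all assumptions. fcntl is POSIX-only.

```python
import fcntl  # POSIX-only advisory file locking
import shutil
from pathlib import Path


def _append_decisions(memory: Path, surrogate: Path) -> None:
    source = surrogate / "DECISION_LOG.md"
    if source.exists():
        with open(memory / "DECISION_LOG.md", "a", encoding="utf-8") as log:
            log.write(source.read_text(encoding="utf-8"))


def _merge_priors(memory: Path, surrogate: Path) -> None:
    source = surrogate / "PRIORS.md"
    if not source.exists():
        return
    target = memory / "PRIORS.md"
    existing = target.read_text(encoding="utf-8") if target.exists() else ""
    # Assumption: one prior per non-empty line, compared case-insensitively.
    seen = {line.strip().lower() for line in existing.splitlines() if line.strip()}
    with open(target, "a", encoding="utf-8") as out:
        if existing and not existing.endswith("\n"):
            out.write("\n")
        for line in source.read_text(encoding="utf-8").splitlines():
            key = line.strip().lower()
            if key and key not in seen:
                seen.add(key)
                out.write(line + "\n")


def _copy_summaries(memory: Path, surrogate: Path) -> None:
    source_dir = surrogate / "sessions"
    if not source_dir.is_dir():
        return
    target_dir = memory / "sessions"
    target_dir.mkdir(exist_ok=True)
    for summary in sorted(source_dir.glob("*.md")):
        shutil.copy2(summary, target_dir / summary.name)


def fold_surrogate(memory: Path, surrogate: Path, live_sessions: int) -> bool:
    """Fold a surrogate's delta store into canonical memory.

    Returns False without touching anything if another session is live.
    """
    if live_sessions > 0:
        return False
    memory.mkdir(parents=True, exist_ok=True)
    with open(memory / ".fold.lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until the lock is ours
        try:
            _append_decisions(memory, surrogate)
            _merge_priors(memory, surrogate)
            _copy_summaries(memory, surrogate)
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
    return True
```

The liveness check comes first so that a running model session never sees canonical memory change underneath it. The lock only serializes folds that are allowed to proceed.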
Semantic search
DECISION_LOG.md grows quickly: a long-running project might have 200+ decision entries. The semantic index lets agents query it in natural language:
- src/zo/semantic.py embeds each DECISION_LOG entry (1 vector per entry, summary derived from title + outcome)
- Queries cosine-match against the summary embedding
- The full entry is injected into context (not just the summary)
- Storage: SQLite at {memory_root}/index.db
- Embeddings: fastembed (optional dependency; falls back to word-overlap if missing, see DECISION_LOG entry from session 2)
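As an illustration of the word-overlap fallback, the sketch below ranks entries by their title + outcome summary and returns the full entry bodies. The DecisionEntry class, the function names, and the choice of Jaccard overlap as the score are assumptions. The source says only "word-overlap", and the real logic lives in src/zo/semantic.py.

```python
import re
from dataclasses import dataclass


@dataclass
class DecisionEntry:
    title: str
    outcome: str
    body: str  # full entry text, injected into context on a match

    @property
    def summary(self) -> str:
        # The indexed summary is derived from title + outcome.
        return f"{self.title} {self.outcome}"


def _tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def overlap_score(query: str, summary: str) -> float:
    """Jaccard overlap between query words and summary words (0.0 to 1.0)."""
    q, s = _tokens(query), _tokens(summary)
    if not q or not s:
        return 0.0
    return len(q & s) / len(q | s)


def search(entries, query: str, top_k: int = 3):
    """Rank by summary overlap and return the full bodies of the best matches."""
    scored = [(overlap_score(query, entry.summary), entry) for entry in entries]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry.body for _, entry in scored[:top_k]]
```

With embeddings available, overlap_score would be replaced by cosine similarity between the query vector and the stored summary vector. The match-on-summary, return-full-entry shape stays the same.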
The session lifecycle
Session start
Agent reads STATE.md (current phase, blockers). Queries semantic index for relevant past decisions. Loads only the spec files needed for the current task.
Work
Agents execute. Every architectural decision is appended to DECISION_LOG.md immediately (not batched). Comms events log to JSONL.
Failure → prior
If anything fails, the post-mortem protocol fires: document the failure, classify root cause (missing_rule / incomplete_rule / ignored_rule / novel_case / regression), fix the symptom, update the rule that allowed it, verify the fix prevents recurrence.
Phase completion
At every gate PROCEED, a PhaseSnapshot is written to {memory_root}/snapshots/. Captures the phase’s full context for the next-phase Lead.
Phase-aware context resets
Planning, building, and maintenance are separate conversation contexts. When transitioning from planning to building, the orchestrator closes the planning context and opens a fresh building context, loading only:
- STATE.md (current state)
- DECISION_LOG.md (recent + semantic-matched older decisions)
- PRIORS.md (relevant priors)
- The previous phase’s snapshot
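One way to picture the fresh building context is as a function that takes the four sources listed above and nothing else. The function name, the section labels, and the recent_n cutoff are hypothetical. How ZO actually selects "recent" decisions and "relevant" priors is not specified here.

```python
def build_phase_context(state, decisions, matched_older, priors, snapshot, recent_n=5):
    """Assemble the text handed to a fresh context after a phase transition.

    decisions      all decision entries, oldest first
    matched_older  entries returned by the semantic index for the new task
    priors         priors already filtered for relevance by the caller
    """
    recent = decisions[-recent_n:] if recent_n > 0 else []
    # Drop semantic matches that are already covered by the recent window.
    older = [entry for entry in matched_older if entry not in recent]
    sections = [
        ("STATE", [state]),
        ("DECISIONS (semantic matches)", older),
        ("DECISIONS (recent)", recent),
        ("PRIORS", priors),
        ("PREVIOUS PHASE SNAPSHOT", [snapshot]),
    ]
    parts = []
    for title, items in sections:
        if items:  # skip empty sections rather than emit bare headers
            parts.append(f"=== {title} ===\n" + "\n\n".join(items))
    return "\n\n".join(parts)
```

Nothing from the closed planning conversation is passed in, so the building context can only see what was written down in memory.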
Self-evolution in practice
The 40+ priors in memory/zo-platform/PRIORS.md are the cumulative output of this protocol. A few examples:
- PR-001: claude --print --dangerously-skip-permissions exits immediately. Captured after a tmux pane stayed blank during MNIST testing.
- PR-005: Aspirational rules without enforcement are dead letter. Captured after a documentation cascade was repeatedly ignored despite being written in CLAUDE.md.
- PR-028: Project memory belongs in the delivery repo, not the platform repo. Captured after a Mac → GPU server transfer broke zo status.
- PR-034: PyTorch MPS tensor extraction returns garbage under pytest. Captured after CIFAR-10 oracle tests failed mysteriously despite the same code working in training.
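The post-mortem steps that produce priors like these could be modelled as a small record type. The five root-cause values come from the protocol described earlier. The PostMortem class, its field names, the output format, and the rule that an unverified fix cannot become a prior are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class RootCause(Enum):
    MISSING_RULE = "missing_rule"
    INCOMPLETE_RULE = "incomplete_rule"
    IGNORED_RULE = "ignored_rule"
    NOVEL_CASE = "novel_case"
    REGRESSION = "regression"


@dataclass
class PostMortem:
    failure: str           # what went wrong, as documented
    root_cause: RootCause  # classification of why it was possible
    symptom_fix: str       # the immediate repair
    rule_update: str       # the change to the rule that allowed it
    verified: bool = False # has the fix been shown to prevent recurrence?

    def to_prior(self, prior_id: str) -> str:
        """Render a one-line prior that traces back to its failure."""
        if not self.verified:
            raise ValueError("verify the fix prevents recurrence before recording a prior")
        return (
            f"{prior_id}: {self.rule_update} "
            f"[root cause: {self.root_cause.value}] "
            f"Captured after: {self.failure}"
        )
```

Keeping the failure text inside the prior is what lets each entry trace to the incident that triggered it.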
Per-project priors: seed → load → learn → promote
Priors aren’t only hand-written. ZO maintains a project’s PRIORS.md automatically across a run:
- Seed: at first session the plan’s ## Domain Context and Priors are written into the project PRIORS.md, so the team starts with the human’s domain knowledge instead of a blank slate.
- Load: every Lead prompt injects the current project priors (“accumulated learnings — honor these before repeating past mistakes”), so agents see them before acting.
- Learn: when the autonomous Phase-4 loop hits a dead-end or plateau, the orchestrator records the failure and appends a durable auto-learning prior, so the next iteration (or a later session) doesn’t repeat it.
- Promote: generic learnings can graduate to the platform with zo learnings promote --project NAME --repo PATH. It is fail-closed: only generic-category priors that clear the client blocklist are promoted; anything project-specific or matching a client identifier is blocked and reported, never auto-rewritten. With no blocklist configured, nothing is promoted.
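The fail-closed promotion rule can be sketched as a filter that blocks by default. The Prior class, the category labels "generic" and "project", and the substring matching against the blocklist are assumptions. The real check behind zo learnings promote may differ.

```python
from dataclasses import dataclass


@dataclass
class Prior:
    text: str
    category: str  # assumed values: "generic" or "project"


def promote(priors, blocklist):
    """Split priors into (promoted, blocked); blocked items carry a reason."""
    promoted, blocked = [], []
    terms = [term.strip().lower() for term in blocklist if term.strip()]
    for prior in priors:
        if not terms:
            # Fail closed: without a blocklist nothing can be cleared.
            blocked.append((prior, "no blocklist configured"))
        elif prior.category != "generic":
            blocked.append((prior, "project-specific category"))
        elif any(term in prior.text.lower() for term in terms):
            blocked.append((prior, "matches client identifier"))
        else:
            promoted.append(prior)
    return promoted, blocked
```

Blocked priors are reported with a reason and left unchanged, matching the "never auto-rewritten" rule: a human decides whether to generalize the wording and retry.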
Next
zo continue
Resume a paused project on the same or a different machine.
Self-evolution protocol
The full post-mortem and rule-update protocol.