I've spent the last several months building something I call the Master OS: a custom AI operating system that orchestrates work across seven business domains — a music label, a content engine, an algorithmic trading system, an AI tooling company, and more. Nine git repositories. One session command. Hundreds of tickets dispatched, verified, and merged to date.
Phase 6, which went live on May 7, 2026, is the most interesting version yet. Here's what the engineering actually taught me about running multi-agent systems in production.
Pattern 1: Hooks beat prompts for anything that must hold 100% of the time
The system uses Claude Code extensively. You can put instructions in
CLAUDE.md, rules files, agent descriptions — and the model generally follows
them. But "generally" isn't good enough for trust boundaries.
The breakthrough was separating advisory guidance (CLAUDE.md,
.claude/rules/*.md — read by the model, no runtime enforcement) from
deterministic hooks (PreToolUse JSON hooks, executed before every tool call).
Four hooks do the heavy lifting:
- ticket-state-guard.json: blocks any direct write to tickets.json. Every ticket state transition must go through scripts/tickets.py, which validates it against a finite state machine. You cannot accidentally corrupt ticket state.
- pre-merge-gate.json: intercepts git merge on any claude/T-* branch. It fails the merge unless the verifier verdict is pass, pytest is green, ruff is clean, and system_doctor reports A 19/19. Broken code cannot merge to main.
- boundary-guard.json + safety_guard.py: four-layer isolation for AIQuantify, the live trading system. Any one layer can fail independently; all four must fail simultaneously to breach the boundary.
- worktree-cleanup.json: a SessionStart hook that GCs stale worktrees (24h after merge, 3d after blocked, 1d for orphans). Branch accumulation is a solved problem.
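The state-machine check behind scripts/tickets.py could look roughly like the following. This is a minimal sketch, not the actual script: the state names, the transition table, and the transition helper are all hypothetical, chosen only to show how an illegal move gets rejected before anything is written.

```python
# Hypothetical ticket states and transitions; the real table lives in scripts/tickets.py.
ALLOWED = {
    "draft": {"approved"},
    "approved": {"dispatched"},
    "dispatched": {"verified", "blocked"},
    "verified": {"merged", "blocked"},
    "blocked": {"dispatched"},
    "merged": set(),  # terminal state
}


class InvalidTransition(ValueError):
    """Raised when a requested state change is not in the transition table."""


def transition(ticket: dict, new_state: str) -> dict:
    """Return a copy of the ticket in new_state, or raise if the move is illegal."""
    current = ticket["state"]
    if new_state not in ALLOWED.get(current, set()):
        raise InvalidTransition(
            f"{ticket['id']}: {current} -> {new_state} is not allowed"
        )
    return {**ticket, "state": new_state}
```

Because the function returns a new dict instead of mutating its input, a rejected transition leaves the caller's ticket untouched.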
The mental model: hooks are invariants. CLAUDE.md is advice. If you want the
system to be correct, put the constraint in a hook.
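To make the hook idea concrete, here is a sketch of the kind of script a PreToolUse hook such as ticket-state-guard.json might invoke. It assumes the hook receives the pending tool call as JSON on stdin with tool_name and tool_input fields, and that exiting with status 2 blocks the call; verify both against the hooks documentation for the Claude Code version in use. The names decide and run, the set of write tools, and the message text are illustrative, not the system's actual code.

```python
import json
import sys
from pathlib import PurePosixPath

WRITE_TOOLS = {"Write", "Edit", "MultiEdit"}  # assumed names of file-writing tools
PROTECTED = "tickets.json"


def decide(event: dict) -> tuple[int, str]:
    """Return (exit_code, message): 2 blocks the tool call, 0 lets it through."""
    if event.get("tool_name") not in WRITE_TOOLS:
        return 0, ""
    target = event.get("tool_input", {}).get("file_path", "")
    if PurePosixPath(target).name == PROTECTED:
        return 2, "Direct writes to tickets.json are blocked; use scripts/tickets.py."
    return 0, ""


def run(stream_in, stream_err) -> int:
    """Read one hook event as JSON, report any refusal, and return the exit code."""
    code, message = decide(json.load(stream_in))
    if message:
        print(message, file=stream_err)
    return code
```

A hook entry point would end with sys.exit(run(sys.stdin, sys.stderr)). This sketch only inspects file-writing tools; a shell redirect issued through Bash would need its own check.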
Pattern 2: Agent contracts are authority delegation
Every agent in the system has a declared contract. The librarian writes only
wiki/<Brand>/. The operator-global writes only
wiki/_global/. The ticket-executor writes only to its worktree and
tickets.json (via the API, not directly). Scouts and distillers are
read-only — they return structured outputs for other agents to act on.
These aren't conventions. python3 scripts/system_doctor.py --grade validates
that all eight agent contracts exist, have correct frontmatter, and are registered. A passing run
reports A 19/19. Deviation surfaces as a housekeeping ticket at the next session start.
The practical outcome: I can spawn a ticket-executor for a wiki-write task and trust that it will delegate the write to librarian rather than doing it directly. The authority chain is structural. There's no "oops I forgot to delegate" path.
This maps cleanly to how you'd design a human org: the CFO has write access to financials, the VP of Engineering has write access to the codebase, and neither can approve the other's domain. But in a software system, you don't need org charts — you need contracts validated by the runtime.
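One way to picture a contract check is as a lookup from agent name to permitted write roots. The sketch below is an assumption about shape only: the CONTRACTS layout and the may_write helper are hypothetical, wiki/ExampleBrand is a placeholder for a real brand directory, and only the agent names and wiki paths come from the description above.

```python
from pathlib import PurePosixPath

# Write scopes as described above; an empty tuple means the agent is read-only.
CONTRACTS = {
    "librarian": ("wiki/ExampleBrand",),   # placeholder for wiki/<Brand>/
    "operator-global": ("wiki/_global",),
    "scout": (),
    "distiller": (),
}


def may_write(agent: str, path: str) -> bool:
    """True if path sits under one of the agent's declared write roots."""
    target = PurePosixPath(path)
    if ".." in target.parts:
        return False  # refuse paths that could climb out of a root
    for root in CONTRACTS.get(agent, ()):
        if target.is_relative_to(PurePosixPath(root)):
            return True
    return False
```

Unknown agents get no write roots, so the default is denial, not permission.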
Pattern 3: The hybrid orchestration lesson (a real engineering constraint)
The original Phase 6 design spawned a dedicated Opus-class
session-orchestrator subagent to own the full session loop, from scout
dispatch through ticket approval to executor dispatch. Clean separation of
concerns on paper.
It was dead on arrival.
The hard constraint: in Claude Code v2.1.x, spawned subagents do not receive the
Task or AskUserQuestion tools, regardless of what you declare in the
tools: frontmatter. No Task means no parallel scout spawning.
No AskUserQuestion means no multi-choice interview. The subagent cannot do
the job.
The fix — now permanent doctrine — is to keep the session loop in the main session, which retains the full toolkit. The main session spawns subagents only for work where the restricted palette is sufficient: per-brand scouts (read-only), ticket-executors (Read/Write/Edit/Bash), verifiers (read + verdict), distillers (read + proposals). Each has a narrow scope that fits the constraints.
This is the Hybrid model: main session as orchestrator, subagents as specialists. It's not a workaround. It's the right design once you know the runtime constraint.
The broader lesson: architecture must follow runtime capability, not theory. Spend the first week of any new agentic framework mapping exactly what each agent type can and cannot do. Build your patterns from reality, not from what you wish were true.
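The capability-mapping exercise can be written down as data. In this sketch the tool palettes are stand-ins for whatever the runtime actually grants: only the absence of Task and AskUserQuestion in subagents comes from the experience described above, and ROLE_NEEDS and placement are hypothetical names.

```python
# Illustrative palettes; only the missing Task/AskUserQuestion reflect the observed constraint.
MAIN_SESSION = {"Read", "Write", "Edit", "Bash", "Task", "AskUserQuestion"}
SUBAGENT = MAIN_SESSION - {"Task", "AskUserQuestion"}

# Tools each role needs to do its job (assumed for the sketch).
ROLE_NEEDS = {
    "session-orchestrator": {"Read", "Task", "AskUserQuestion"},
    "ticket-executor": {"Read", "Write", "Edit", "Bash"},
    "scout": {"Read"},
}


def placement(role: str) -> str:
    """Say where a role can run: as a subagent if its needs fit, else the main session."""
    needs = ROLE_NEEDS[role]
    if needs <= SUBAGENT:
        return "subagent"
    if needs <= MAIN_SESSION:
        return "main session"
    return "unsupported"
```

Running placement over every planned role before building anything would have flagged the session-orchestrator design on day one.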
Pattern 4: Distillation as a compounding mechanism
Every session produces artifacts: session logs, merged tickets, blocked tickets, research
notes. The first /start of each new calendar day runs a distillation phase
that converts yesterday's artifacts into durable wiki knowledge.
The pipeline: scripts/distill.py gather assembles DistillationPayloads per
brand → parallel distiller subagents classify candidates as high/medium/low confidence →
distill.py route determines target authority → librarian writes
wiki/<Brand>/ pages; operator-global writes
wiki/_global/ pages. Distillers never write directly — they classify and
propose. The write authority stays with the designated root agents.
High-confidence findings (verified-merged tickets, clean research) auto-promote. Medium-confidence surfaces in the next session interview for approval. Low-confidence stays in the daily ledger only.
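The routing rule reduces to a small decision function. The following is a sketch under assumptions, not the contents of distill.py: the Candidate fields, the action labels, and the use of "_global" as a brand marker for cross-brand knowledge are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    brand: str       # "_global" marks cross-brand knowledge (an assumed convention)
    summary: str
    confidence: str  # "high", "medium", or "low"


def route(candidate: Candidate) -> dict:
    """Map a classified candidate to an action and, for promotions, a writing agent."""
    writer: Optional[str] = None
    if candidate.confidence == "high":
        # Distillers only propose; the designated root agent performs the write.
        writer = "operator-global" if candidate.brand == "_global" else "librarian"
        return {"action": "promote", "writer": writer}
    if candidate.confidence == "medium":
        return {"action": "ask_in_interview", "writer": writer}
    if candidate.confidence == "low":
        return {"action": "ledger_only", "writer": writer}
    raise ValueError(f"unknown confidence: {candidate.confidence!r}")
```

Raising on an unrecognized confidence label keeps a misclassified candidate from silently landing in the wiki.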
The compound effect: each day the system gets smarter without any manual curation effort. The wiki grows from verified signal, not from every thought that crossed a model's context window. As of this writing: 336 pages of durable, cited, authority-tracked knowledge.
Most agentic systems forget. They start each session fresh from a system prompt. A compounding wiki is the antidote — but you need the pipeline to be correct enough that low-quality signal doesn't pollute the knowledge base. That's what the confidence classification does: it keeps junk out.
Pattern 5: One command, intelligent cadence
/start replaced 10 discrete Phase 5 routines. It runs phases 0 through 8:
initialize, scout (8 parallel subagents), housekeeping, outstanding-state load, brief,
interview prompt, interview (5 questions per engaged brand), ticket drafting,
one-at-a-time approval, dispatch, distillation, session close.
But it's intelligent: each phase is skippable based on freshness signals from
scripts/session_state.py:should_run(phase). Run /start 10 times
in a day — it skips re-scouting if no new commits in any of the 9 tracked repos. It skips
distillation if it already ran today. It always runs dispatch (the ticket queue must drain).
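A freshness check of this kind might look like the sketch below. It is not the real session_state.py: the actual function takes only the phase, whereas this version accepts a state dict and the current date explicitly so the logic is easy to follow and test, and the state keys (current_heads, scouted_heads, last_distilled) are invented for illustration.

```python
from datetime import date


def should_run(phase: str, state: dict, today: date) -> bool:
    """Decide whether a phase is stale enough to run again."""
    if phase == "dispatch":
        return True  # the ticket queue must always drain
    if phase == "scout":
        # Re-scout only if some tracked repo has moved since the last scout.
        return any(
            head != state["scouted_heads"].get(repo)
            for repo, head in state["current_heads"].items()
        )
    if phase == "distillation":
        return state.get("last_distilled") != today
    return True  # phases without a freshness rule run by default in this sketch
```

Comparing recorded commit hashes per repository, instead of timestamps, means a re-run costs one cheap lookup per repo when nothing has changed.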
The result: same command whether you're checking in after 20 minutes or returning after a week. The system decides what's stale and what needs doing.
This addresses a real failure mode in multi-agent systems: the "cold start" problem. If re-running costs as much as starting fresh, operators either run it less (slower feedback loops) or run it more (wasted compute). Intelligent cadence makes re-runs cheap, so the system is always current without burning budget on redundant work.
Putting it together
These five patterns aren't independent — they're mutually reinforcing:
- Hooks let agents act with autonomy because you know the invariants hold. Without hooks, every action needs manual review.
- Agent contracts make authority chains enforceable. Hooks protect the state machine; contracts protect the knowledge base.
- Hybrid orchestration gives the main session the full toolkit it needs to coordinate everything else. Get this wrong and you can't run parallel scouts or interview the operator.
- Distillation is what makes the system compound. Without it, you start every session with the same knowledge you started with on day one.
- Intelligent cadence is what makes the system sustainable. Without it, you're either under-running (feedback loops are slow) or over-running (budget burns on redundant phases).
The architecture is not finished, and it never will be. But the patterns above are load-bearing. They've survived months of daily production use across hundreds of merged tickets and multiple phase migrations.