Case Study
Agentic Orchestration at OS Scale
How I built a multi-agent AI operating system that runs 7 businesses in parallel — the architecture decisions, the constraints that forced each one, and the patterns that have survived months of daily production use.
The problem at scale
Running seven business domains — music label, content agency, algorithmic trading, AI tooling company, product incubator, artist career, personal job search — is not a task management problem. It's an orchestration problem.
Each domain has its own knowledge base, its own active threads, its own priorities, and its own operational rhythm. Context from one domain must not leak into another, especially where Blockwise Intelligence is concerned: it is a live trading system where a misread could have financial consequences. Each domain needs to compound its institutional knowledge over time, not start fresh every session.
The naive solution — a single AI assistant with access to everything — doesn't work at this scale. Context windows overflow. Silo discipline fails. Trust boundaries can't be enforced by advisory text that a sufficiently confused model might ignore.
The Master OS is the system I designed to solve this.
Core architecture decisions
Decision 1: Hooks over prompts for invariants
The first instinct was to put all system constraints in CLAUDE.md and rules files. This worked adequately for advisory guidance but not for trust boundaries that must hold 100% of the time.
The solution: a strict separation between advisory guidance (CLAUDE.md, rules files — the model reads them but there's no runtime enforcement) and deterministic hooks (PreToolUse JSON hooks — executed before every tool call, regardless of model state).
Four hooks do the heavy lifting: ticket-state-guard.json blocks direct ticket state mutations; pre-merge-gate.json intercepts every branch merge and fails it unless all checks pass; boundary-guard.json enforces the Blockwise Intelligence silo; worktree-cleanup.json garbage-collects stale worktrees at session start. These hooks enforce invariants: correct behavior doesn't depend on the model reading its instructions.
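As an illustration of what a deterministic boundary check could look like, here is a minimal Python sketch of a script that a PreToolUse hook might call. It is not the actual boundary-guard hook. The directory layout (brands/blockwise), the active_brand argument, and the function names are all hypothetical. The event shape (a JSON object with tool_name and tool_input) and the use of exit status 2 to block a call reflect my understanding of Claude Code's hook contract and should be checked against the current documentation.

```python
import json
from pathlib import PurePosixPath

# Hypothetical layout: the siloed brand lives under brands/blockwise/.
SILO_PARTS = ("brands", "blockwise")


def _in_silo(path: PurePosixPath) -> bool:
    """True if the path contains brands/blockwise as consecutive components."""
    parts = path.parts
    n = len(SILO_PARTS)
    return any(parts[i:i + n] == SILO_PARTS for i in range(len(parts) - n + 1))


def decide(event: dict, active_brand: str) -> tuple[bool, str]:
    """Return (allowed, reason) for one pending tool call.

    Only file-based tools are inspected. Bash commands and ".." segments
    are not analyzed in this sketch.
    """
    raw_path = event.get("tool_input", {}).get("file_path")
    if raw_path is None:
        return True, "no file path in tool input"
    if _in_silo(PurePosixPath(raw_path)) and active_brand != "blockwise":
        return False, f"{raw_path} is inside the Blockwise silo"
    return True, "ok"


def exit_code_for(stdin_text: str, active_brand: str) -> int:
    """Map the decision to a process exit status (2 is assumed to mean 'block')."""
    allowed, _reason = decide(json.loads(stdin_text), active_brand)
    return 0 if allowed else 2
```

A real hook script would read the event from standard input, print the reason to standard error, and exit with the returned status. The point of the design is that the check runs in ordinary code outside the model, so a confused model cannot skip it.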
Decision 2: Hybrid orchestration (forced by runtime constraints)
The original Phase 6 design spawned a dedicated session-orchestrator subagent to own the full session loop. This was dead on arrival: subagents spawned by Claude Code don't receive the Task or AskUserQuestion tools, regardless of what you declare in their config. No Task means no parallel scout spawning. No AskUserQuestion means no interview.
The fix: the main session keeps the session loop and retains the full toolkit. Subagents are spawned only for work where the restricted palette is sufficient — per-brand scouts (read-only analysis), ticket executors (Read/Write/Edit/Bash in isolated worktrees), verifiers (read + verdict), distillers (read + proposals). Each has a narrow scope that fits within the constraints.
This is the hybrid model. It's not a workaround — it's the correct architecture once you know the runtime constraints. Architecture must follow runtime reality, not theoretical elegance.
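The placement rule behind the hybrid model can be sketched as a small lookup. This is an illustration under stated assumptions, not the system's code: the executor palette comes from the text above, while the read-only palettes (modeled here as just Read) and the names ROLE_TOOLS, MAIN_ONLY, and placement are hypothetical.

```python
# Tool palettes per subagent role. The executor set is taken from the text;
# treating "read-only" roles as having only Read is an assumption.
ROLE_TOOLS: dict[str, frozenset[str]] = {
    "scout": frozenset({"Read"}),
    "executor": frozenset({"Read", "Write", "Edit", "Bash"}),
    "verifier": frozenset({"Read"}),
    "distiller": frozenset({"Read"}),
}

# Tools that, per the text, subagents never receive.
MAIN_ONLY = frozenset({"Task", "AskUserQuestion"})


def placement(needed: set[str], role: str) -> str:
    """Return 'subagent' if the role's palette covers the work, else 'main'."""
    if needed & MAIN_ONLY:
        return "main"  # spawning and interviewing stay in the main session
    return "subagent" if needed <= ROLE_TOOLS[role] else "main"
```

For example, placement({"Read"}, "scout") yields "subagent", while any step that needs Task or AskUserQuestion stays with the main session.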
Decision 3: FSM-gated merge pipeline
Ticket state is managed by a finite state machine in scripts/tickets.py. The states form a directed graph: draft → ready → in_progress → verifying → merged, with blocked reachable from any stage. No ticket can move backward, and none can reach merged without passing through verifying.
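A minimal sketch of such a state machine follows. It is not the contents of scripts/tickets.py. The state names come from the text, but the State, allowed_next, and transition names are hypothetical. The sketch also treats blocked as terminal, which is an assumption, since the text does not say how a blocked ticket resumes.

```python
from enum import Enum


class State(str, Enum):
    DRAFT = "draft"
    READY = "ready"
    IN_PROGRESS = "in_progress"
    VERIFYING = "verifying"
    MERGED = "merged"
    BLOCKED = "blocked"


# The forward path; each state may advance only to its immediate successor.
PIPELINE = [State.DRAFT, State.READY, State.IN_PROGRESS, State.VERIFYING, State.MERGED]


def allowed_next(state: State) -> set[State]:
    """States reachable in one step: the next pipeline stage, or blocked."""
    if state in (State.MERGED, State.BLOCKED):
        return set()  # assumption: both are terminal in this sketch
    return {PIPELINE[PIPELINE.index(state) + 1], State.BLOCKED}


def transition(current: State, target: State) -> State:
    """Return the new state, or raise if the move is backward or a skip."""
    if target not in allowed_next(current):
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Because every move goes through transition, an attempt such as in_progress to merged fails before anything is written.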
The pre-merge-gate hook enforces the verifying → merged transition: a ticket cannot merge to main unless verifier_verdict.verdict == 'pass' is recorded, pytest is green, ruff is clean, and system_doctor reports an A grade (19/19). There is no manual bypass path.
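The gate condition can be sketched as a pure function over already-collected check results. This is an illustration, not the actual pre-merge-gate hook. The GateInputs fields and function names are hypothetical, and the system_doctor result is modeled only as a passed-check count, because the text does not say how the letter grade is derived.

```python
from dataclasses import dataclass

REQUIRED_DOCTOR_CHECKS = 19  # the 19/19 figure from the text


@dataclass(frozen=True)
class GateInputs:
    verifier_verdict: dict  # e.g. {"verdict": "pass"}
    pytest_passed: bool
    ruff_clean: bool
    doctor_checks_passed: int


def gate_failures(g: GateInputs) -> list[str]:
    """List every unmet condition; an empty list means the merge may proceed."""
    failures = []
    if g.verifier_verdict.get("verdict") != "pass":
        failures.append("verifier verdict is not 'pass'")
    if not g.pytest_passed:
        failures.append("pytest is not green")
    if not g.ruff_clean:
        failures.append("ruff reported issues")
    if g.doctor_checks_passed != REQUIRED_DOCTOR_CHECKS:
        failures.append(
            f"system_doctor passed {g.doctor_checks_passed}/{REQUIRED_DOCTOR_CHECKS}"
        )
    return failures


def may_merge(g: GateInputs) -> bool:
    """All four conditions must hold; there is no override parameter."""
    return not gate_failures(g)
```

Returning the full list of failures, instead of stopping at the first one, gives the ticket executor everything it needs to fix in a single pass.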
The practical outcome: code that fails any of these checks cannot merge, so the main branch stays deployable and the system holds its quality grade under sustained throughput.
Decision 4: Distillation as compounding mechanism
Most agentic systems forget. Each session starts from a system prompt with no memory of prior sessions. The Master OS addresses this with a daily distillation pipeline that converts session artifacts (logs, merged tickets, research notes) into durable wiki knowledge.
The pipeline preserves authority boundaries: distiller subagents classify candidates as high/medium/low confidence and identify target wiki pages — but never write directly. High-confidence findings route to librarian (for brand wikis) or operator-global (for cross-brand docs). Distillers are read-only analysts; the designated write agents are the authority.
The compound effect: each day the knowledge base grows from verified signal. As of May 2026: 336 pages of durable, cited, authority-tracked knowledge — spanning strategy decisions, project histories, brand voice, and operational patterns across all 7 domains.
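The routing rule in the pipeline can be sketched as follows. The Candidate fields and the route function are hypothetical names, and holding back medium- and low-confidence candidates (returning None) is an assumption, since the text only describes what happens to high-confidence findings.

```python
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_LEVELS = {"high", "medium", "low"}


@dataclass(frozen=True)
class Candidate:
    summary: str
    confidence: str   # "high", "medium", or "low", as classified by a distiller
    target_page: str  # wiki page the distiller proposes to update
    cross_brand: bool  # True if the finding spans more than one brand


def route(candidate: Candidate) -> Optional[str]:
    """Pick the write agent for a proposal; the distiller itself never writes."""
    if candidate.confidence not in CONFIDENCE_LEVELS:
        raise ValueError(f"unknown confidence: {candidate.confidence!r}")
    if candidate.confidence != "high":
        return None  # assumption: lower-confidence findings are held back
    return "operator-global" if candidate.cross_brand else "librarian"
```

The distiller produces only data. The decision about who may write is made in one place, which keeps the authority boundary easy to audit.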
What it looks like in practice
A typical session: run /start. Eight parallel scouts each read their brand's wiki and recent commits and return structured briefs. The system checks freshness: if scouts ran recently and there are no new commits, that phase is skipped. A multiple-choice interview surfaces 3–5 priority decisions per brand. Approved topics become tickets. Tickets dispatch to isolated worktrees (parallel, by brand, with no file overlap). Each worktree runs a ticket executor; a verifier subagent validates the output. Verified tickets merge via the FSM gate. The session closes, and its artifacts queue for tomorrow's distillation.
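The freshness check in the scout phase can be sketched like this. The 12-hour threshold and the function and parameter names are assumptions made for illustration. The text says only that the phase skips when scouts ran recently and no new commits have landed.

```python
from datetime import datetime, timedelta
from typing import Optional

MAX_BRIEF_AGE = timedelta(hours=12)  # assumed meaning of "recently"


def scout_is_fresh(
    last_scout_run: Optional[datetime],
    latest_commit: Optional[datetime],
    now: datetime,
    max_age: timedelta = MAX_BRIEF_AGE,
) -> bool:
    """True if the existing brief can be reused and the scout phase skipped."""
    if last_scout_run is None:
        return False  # never scouted, so there is no brief to reuse
    if now - last_scout_run > max_age:
        return False  # the brief is too old
    if latest_commit is not None and latest_commit > last_scout_run:
        return False  # new commits landed after the brief was written
    return True
```

The check is evaluated per brand, so one active brand does not force a rescan of the quiet ones.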
The throughput record: 14 tickets dispatched in a single session, 13 of which merged to main within that session. The 14th was blocked by a dependency and entered the blocked state without manual intervention.
Patterns worth generalizing
Any constraint that must hold 100% of the time belongs in a hook, not in a prompt. This applies beyond Claude Code — any agentic system with tool-use APIs should expose hook-like pre/post execution gates for critical invariants.
Spend the first week with any new agentic framework mapping exactly what each agent type can and cannot do. Build patterns from reality, not from what you wish were true. The hybrid model only works because I knew what subagents couldn't do.
RAG retrieves existing documents. Distillation converts session artifacts into structured knowledge. For a system that needs to compound institutional knowledge across sessions, distillation is the right primitive — RAG is the right access mechanism for the resulting knowledge base.
Agent authority boundaries should be enforced structurally (via hooks and finite state machines), not by convention. Conventions get violated under load. Structure doesn't.