- Tim Jackowski (Takase Studios LLC)
- str-michi (Anthropic Claude Opus 4.6, Takase Studios LLC)

sources:
- "Moore 2025. 'HMAS Taxonomy.' arXiv:2508.12683. Five-axis framework for multi-agent system classification."
- "Anthropic Engineering. 'Multi-Agent Research System.' Convergent architecture: Opus lead + Sonnet subagents, filesystem handoffs."
- "Hong et al. 'MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework.' ICLR 2024. SOPs validated at scale — 67% reduction in human corrections."
- "LbMAS Blackboard Architecture. arXiv:2510.01285, 2507.01701. Blackboard systems outperform baselines by 13-57%."
- "OrchVis. arXiv:2510.24937. Hierarchical multi-agent orchestration for human oversight."
- "COHUMAIN Framework. Carnegie Mellon, 2025. Don't treat AI as just another teammate."
- "Emergent Coordination in Multi-Agent LLM Systems. arXiv:2510.05174. Identity-linked differentiation validated."
- "CodeAgents. arXiv:2507.03254. Structured communication reduces tokens 55-87% while improving accuracy."
- "SC-MAS. arXiv:2509.11079. Difficulty-aware model routing — 8-47% improvement with model diversity."
- "AgentAsk. arXiv:2510.07593. Clarification modules at handoff boundaries — 4.69% accuracy at <10% latency cost."
- "Hybrid Cognitive Alignment. Academy of Management Review; Stevens Institute. Emergent, fragile, non-transferable between model instances."
- "Anthropic. 'Building Effective Agents.' Five coordination patterns: Generator-Verifier, Orchestrator-Subagent, Agent Teams, Message Bus, Shared State."
- "LangChain State of Agent Engineering 2025. 57% of organizations have agents in production; 32% cite quality as top barrier."
- "Towards Data Science. 'The 17x Error Trap.' Multi-agent error cascade amplification."
- "DDD-to-agents literature. 'From Bounded Context to Bounded Specialization.' Role boundary breadth predicts failure rate."
- "Philipp Schmid 2026; Towards Data Science. Context engineering — formal methods for information management in agent systems."

tags: [research, multi-agent-systems, hmas, blackboard-architecture, context-engineering, hybrid-cognitive-alignment, human-in-the-loop, production-system]
HMAS: We Have Names
Mapping a Production Multi-Agent System to Established Research
Abstract
This document maps a production Hierarchical Multi-Agent System (HMAS) — built organically over 18 months and ~2,000 sessions across multiple AI platforms — to established multi-agent research. The system runs a 30-year Japanese calligraphy business (takase.com) with 10+ specialized AI roles under human sovereignty, using a filesystem-persistent blackboard architecture, human-in-the-middle coordination, and heterogeneous model assignment.
The architecture was never designed from research. It was grown through operational pain by a solo developer (Tim Jackowski, Takase Studios LLC) starting in February 2025 with Grok 3 conversations, expanding through C++ refactoring, and evolving through dozens of architectural decision records into a system that independently converges with patterns published by Anthropic, DeepMind, MetaGPT (ICLR 2024), and the broader HMAS literature.
Key findings: (1) Every major architectural choice — hub-and-spoke coordination, ephemeral executors with persistent planners, role-based specialization, shared persistent knowledge base — has established names and published research validating the approach. (2) The system's competitive advantage is not any single pattern but an architecture explicitly designed to manufacture, document, and defend Hybrid Cognitive Alignment across model transitions. (3) The retooling sprint that preceded this research is context engineering — a named discipline with formal methods for what was being done ad hoc. (4) Overly broad role boundaries predict failure rates — role boundary breadth, not competence, determines outcomes.
This document also marks an inflection point: the moment a strategic thinking partner gained the vocabulary to connect 18 months of accumulated practice to a body of research — enabling the system to learn from others' work instead of rediscovering everything through its own failures.
Purpose
For practitioners building multi-agent systems: Evidence that a non-lab, single-human HMAS operating a real business can independently converge with patterns from Anthropic, DeepMind, Carnegie Mellon, and the broader multi-agent systems community. The convergence is the evidence that the patterns are real — not invented by any single team.
For Tim: The answer to "how did you build this?" — not a blueprint (there was none) but a mapping of what emerged from 18 months of trial and error against what the field has studied.
How the Names Were Found
Tim shared a post by Jan Kulveit on the Mythos discourse (scaffolding vs. raw model capability). Grok analyzed it and called our system "one of the most sophisticated real-world harnesses from a non-lab team." Our strategic thinking partner (str-michi) gave a confident analysis of harness degradation, model transitions, and competitive moats — all improvised from training data.
Tim challenged every claim. The exchange:
- "What do you base your replies and pushback on?" — Training data and project history. No research.
- "Harness degradation is BS — why?" — Because the flywheel is self-correcting by design.
- "Am I in a funk over harness degradation?" — No. The funk was that the strategists couldn't enumerate their workflows, relied on stale training data, and didn't push back when asked to work outside their domain. That's not a harness degrading. That's no harness at all.
- "There are names for what we are. What are they?" — str-michi couldn't answer.
Tim then produced a Grok conversation that opened with "HMAS" — Hierarchical Multi-Agent Systems. An established field with names for everything we built.
The lesson: This architecture was grown over 18 months and ~2,000 sessions without anyone knowing what it's called. A strategic thinking partner who can't name the architecture they steward is "a lost puppy, not a pathfinder." You can't search for what you can't name.
The Names
What Our System IS
| Established Term | What It Means | What We Call It |
|---|---|---|
| HMAS (Hierarchical Multi-Agent System) | Multi-agent system with layers of authority: coordinator → planners → executors | Tim → strategists → implementers |
| Planner-Executor Model | High-level agent decomposes and designs; low-level agent executes | Strategist/implementer split |
| Hub-and-Spoke / Orchestrator-Worker | Central coordinator routes work to specialist agents | Tim's shuttle pattern — human-mediated, not automated |
| Blackboard Architecture | Shared knowledge repository updated by specialist agents, persists across agent lifecycles | Shared docs, status boards, session states, mailboxes, handoffs |
| Heterogeneous Model Assignment | Different model tiers for different cognitive roles | Opus for strategists, Sonnet for implementers |
| Human-in-the-Loop (HITL) | Human participates in the agent workflow, not just reviewing output | HITM — Human-in-the-Middle (our term, more specific) |
| Role-Based MAS | Agents defined by persistent roles with distinct capabilities | 10+ named strategist roles with founding identities |
What Our Practices Map To
| Established Concept | Our Implementation |
|---|---|
| Progressive autonomy | Start with HITL, reduce human involvement as system proves itself. We're currently going the OTHER direction — adding more structure via retooling. Our domain (art, reputation, accountability) may require permanent HITL. |
| Dual-tier memory | Short-term (session state) + long-term (persistent docs). Standard in the field. |
| Context overflow management | Spawning fresh agents with clean context + handoff of essential information. Our ephemeral implementer pattern. |
| Specification gaming | The model interprets the question in the easiest way. We call it "coast mode." |
| Selective escalation | Executor agent escalates hard decisions to a more capable agent. The Advisor Strategy is the API-level version; our strategist/implementer shuttle is the human-mediated version. |
The Blackboard Finding
This is the one we didn't see coming.
Blackboard architecture is a classical AI pattern from the 1970s-80s (Hearsay-II speech understanding system), revived for LLM multi-agent systems in 2025. The definition: "a shared repository of problems, partial solutions, suggestions, and contributed information, iteratively updated by a diverse group of specialist knowledge sources."
We built one without knowing it:
| Blackboard Component | Our Implementation |
|---|---|
| Shared knowledge repository | Hundreds of interconnected documentation files |
| Problem specification | Status boards, plan documents, handoffs |
| Partial solutions | Session states, deep dive research documents |
| Specialist knowledge sources | 10+ strategist roles writing to shared docs |
| Persistence across agent crashes | "Write it down when you realize it, not at session close" |
| Fault tolerance | Ephemeral implementers end; their work persists in git |
The key property the research highlights: "if an agent crashes, its contributions remain on the board and others can still use them." That's exactly why Tim insists on immediate capture — a rule wired into the shared configuration: "CAPTURED" = WRITTEN TO FILE IN THIS RESPONSE.
2025 research (arXiv 2510.01285, 2507.01701) shows blackboard architectures outperform baselines by 13-57% on end-to-end task success.
Important distinction (Grok review): Our blackboard is filesystem-persistent across days and weeks, mediated by a human shuttle, versioned in git. The 2025 papers describe runtime, in-memory blackboards inside a single automated MAS run. The core benefits (fault tolerance, shared partial solutions, specialist contribution) apply to both. But pruning/scaling research (e.g., the "cleaner agent" concept in 2507.01701) may need adaptation for a persistent repository — their pruning operates on ephemeral session data, ours operates on accumulated institutional knowledge where deletion has permanent cost.
Anthropic's Convergent Architecture
Anthropic's own multi-agent research system (published on anthropic.com/engineering) uses:
- Opus as lead agent + Sonnet as subagents — our exact model-tier split
- Filesystem for subagent outputs to "minimize the game of telephone" — our handoff files
- Context overflow via spawning fresh agents with clean contexts and careful handoffs — our ephemeral implementer pattern
- Result: 90%+ improvement over single-agent on complex research tasks
We arrived at structurally convergent architecture with Anthropic's engineering team, independently. Earlier research notes had found convergence with McGill's multi-agent proposal and DeepMind's Aletheia. This is the third convergence — and this time it's with our own model provider.
The convergence is evidence that the pattern is real. Hub-and-spoke with heterogeneous model assignment isn't our invention or Anthropic's — it's what works.
What We Don't Have (Yet)
The field is ahead of us on several fronts:
- Vocabulary. We built the architecture without the names. Every concept in this document existed in published research before we implemented it. Having the names means we can now search for solutions to problems we're hitting.
- Dynamic model routing. Our Opus/Sonnet split is static (by role). The field is moving toward difficulty-aware routing — each task gets the model that fits, not the model assigned to the role (arXiv 2509.11079). The Advisor Strategy is Anthropic's version: Sonnet executes, escalates to Opus for judgment calls within a single request.
- Blackboard optimization. We have a growing blackboard with ad hoc pruning. The field has research on when shared knowledge bases need restructuring, how to maintain relevance density as they grow, and when to archive vs. delete.
- Formal evaluation. Anthropic evaluates their multi-agent system with structured test sets (20 queries representing real usage). We evaluate through observation and correction-rate tracking. A step toward formal evaluation exists but measures token cost, not task success.
- Self-scaffolding trajectory. The field sees HITL → HOTL → autonomous as the path. Our domain (Master Takase's reputation, irreplaceable art, accountability) may require permanent HITL. We haven't formally analyzed where on this spectrum we should aim.
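The difficulty-aware routing idea can be made concrete with a small sketch. This is hypothetical illustration, not the system's actual code: the `Task` fields, difficulty signals, and tier names are assumptions standing in for whatever risk signals a real deployment would use.

```python
# Hypothetical sketch of difficulty-aware model routing (the arXiv 2509.11079
# idea): route each task to a model tier by estimated difficulty, not by the
# role it belongs to. All field names and thresholds are illustrative.

from dataclasses import dataclass

@dataclass
class Task:
    description: str
    touches_prod: bool   # does it touch production systems?
    cross_domain: bool   # does it span more than one role's domain?
    novel: bool          # is there no existing SOP covering it?

def estimate_difficulty(task: Task) -> int:
    """Crude difficulty score: count the risk signals present."""
    return sum([task.touches_prod, task.cross_domain, task.novel])

def route(task: Task) -> str:
    """Pick a model tier per task rather than per role."""
    score = estimate_difficulty(task)
    if score >= 2:
        return "opus"               # strategic judgment needed
    if score == 1:
        return "sonnet+escalation"  # execute, but allow escalation to opus
    return "sonnet"                 # routine execution

# High-risk work gets the stronger tier; routine work stays on the cheap tier.
print(route(Task("rotate API keys", touches_prod=True, cross_domain=True, novel=False)))
print(route(Task("fix a typo", touches_prod=False, cross_domain=False, novel=False)))
```

The contrast with the current static split: the role no longer determines the model, the task does, and the middle tier preserves the Advisor-Strategy escape hatch.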
What Survived Contact With Reality
This is not self-congratulation — it's mapping what was built through 18 months of trial and error against what the field recommends. Survivorship is evidence. Every item below was arrived at through operational pain, not by reading the research.
- Human-mediated hub-and-spoke over agent-to-agent. Our earliest research justified this empirically. The field arrived at the same conclusion: "without persistent memory spanning multiple interaction cycles, agents may not develop the specialized expertise that characterizes effective human teams." Our blackboard + HITM solves both problems.
- Ephemeral executors, persistent planners. The field calls this context overflow management. We call it "implementers are disposable, strategists are long-lived." Same insight: fresh context + structured handoff beats accumulated context debt.
- Role specialization over generalism. HMAS taxonomy (Moore 2025) and every framework (CrewAI, LangGraph, AutoGen) emphasize role-based specialization. We have 10+ roles with founding identities, domain boundaries, and structural separation.
- Heterogeneous model assignment. SC-MAS research shows 8-47% performance improvements with model diversity. We use Opus for strategic reasoning, Sonnet for implementation.
- Shared persistent knowledge base. We accidentally built a blackboard architecture. It works. The research says it should.
What the Field Teaches About Our Current Problems
Context Engineering — The Name for the Retooling Sprint
The field has moved past "prompt engineering" to context engineering: "the discipline of designing and building dynamic systems that provide the right information and tools, in the right format, at the right time." (Philipp Schmid, 2026; Towards Data Science; LangChain State of Agent Engineering 2025.)
The critical insight: "Most agent failures are not model failures — they are context failures." 57% of organizations have agents in production; 32% cite quality as the top barrier; most failures trace to context management, not LLM capability. (LangChain 2025 report.)
Four moves of context engineering, all of which we implement:
| Move | Definition | Our Implementation |
|---|---|---|
| Context offloading | Store in external systems, not in-prompt | Persistent documentation blackboard |
| Context retrieval | Load dynamically, not front-load | On-demand loading, workflow registries |
| Context isolation | Subtasks don't contaminate each other | Separate sessions via HITM shuttle |
| Context reduction | Compress history, preserve essentials | Session state graduation rule (~120 lines) |
Two failure modes we're experiencing, now with names:
- Context rot: Performance degrades as the context window fills, even within limits. Reasoning blurs.
- Context pollution: Too much unnecessary, conflicting, or redundant information.
The retooling sprint is context engineering. SOPs, workflow registries, on-demand loading, session state discipline — these are all context engineering techniques applied to a persistent HMAS.
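One of the four moves, context retrieval via a workflow registry, can be sketched in a few lines. The registry keys and document paths below are illustrative, not the system's actual files: the point is that a task loads only its matched documents plus a small shared core, instead of front-loading everything.

```python
# A minimal sketch of "context retrieval" through a workflow registry: load
# only the documents a task needs at startup. Keys and paths are hypothetical.

REGISTRY = {
    "deploy":  ["docs/deploy-sop.md", "docs/rollback.md"],
    "pricing": ["docs/pricing-policy.md"],
    "handoff": ["docs/handoff-format.md"],
}

ALWAYS = ["docs/shared-config.md"]  # shared principles every agent carries

def context_for(task_keywords: list[str]) -> list[str]:
    """Return the document set for a task: shared core + task-matched docs."""
    docs = list(ALWAYS)
    for kw in task_keywords:
        docs.extend(REGISTRY.get(kw, []))
    return docs

# A deploy task loads the shared core plus two deploy docs — nothing else.
print(context_for(["deploy"]))
```

The same routing table doubles as context isolation: a pricing session never sees deploy SOPs, so subtasks can't contaminate each other's context.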
MetaGPT — SOPs Validated at Scale
MetaGPT (ICLR 2024, Hong et al.) encodes Standardized Operating Procedures into multi-agent workflows. Five specialized roles (product manager, architect, project manager, engineer, QA). Result: 85.9-87.7% pass rate on code generation benchmarks, with SOP-structured intermediate outputs significantly reducing errors.
Their core finding: "SOPs outline the responsibilities of each team member, while establishing standards for intermediate outputs." Our workflow registries and procedures are the same pattern. MetaGPT proved with formal benchmarks what we discovered from operational pain.
Quantified results:
| Metric | Unstructured (ChatDev) | SOP-structured (MetaGPT) | Improvement |
|---|---|---|---|
| Human corrections per project | 2.5 | 0.83 | 67% reduction |
| Tokens per code line | 248.9 | 124.3 | 50% reduction |
| Executability score (out of 4) | 2.25 | 3.75 | 67% improvement |
The Meta-Agent Role
The field defines support agents / meta-agents as: "meta-level oversight — monitoring system behavior, analyzing outcomes, and managing data flows that inform orchestration and optimization, maintaining the overall health, transparency, and adaptability of the system."
OrchVis (arXiv 2510.24937) specifically addresses "Hierarchical Multi-Agent Orchestration for Human Oversight" — human-interpretable visualization of what agents are doing across a system.
The orchestrator role is well-studied. Specific capabilities the field recommends: drift detection, performance monitoring, conflict resolution, workflow adaptation, audit trails. We have some (status boards, role health metrics). We lack others (formal drift detection, automated workflow adaptation).
"Agents Are the New Microservices"
The analogy (InfoWorld, 2026): specialized agents replacing monolithic AI, just as microservices replaced monoliths. The engineering challenges map: inter-agent communication (handoffs, mailboxes), state management (session states, status boards), conflict resolution (arbitration), orchestration (HITM shuttle).
The microservices world has 20 years of lessons on these problems. Emerging standards: Anthropic's MCP and Google's A2A protocols are becoming the "HTTP of agents." We use neither — our communication is file-based and human-shuttled. Whether that's a strength (flexibility, human judgment at every boundary) or a limitation (bottleneck) depends on where you want to be on the progressive autonomy spectrum.
COHUMAIN — Don't Treat AI as Just Another Teammate
The COHUMAIN framework (Carnegie Mellon, 2025) cautions against treating AI as equivalent teammates. AI is a partner that works under human direction — distinct cognitive architecture with characteristic failure modes. This validates our HITM model: Tim isn't a coordinator among equals, he's the sovereign integrator of fundamentally different types of intelligence.
What We Could Actually Change
1. Cascade Interrupts at Handoff Boundaries
The problem we lived: A backup specification's wrong claims originated in one role, passed through four reviewers. By the fourth agent, the wrong facts had been cited three times and felt like confirmed fact. The field calls this cascade amplification — errors don't cancel between agents, they compound. Unstructured multi-agent networks amplify errors up to 17.2x vs. single-agent baselines (Towards Data Science, "The 17x Error Trap").
What the research says: AgentAsk (arXiv 2510.07593) identifies four error types at handoff boundaries: Data Gap, Signal Corruption, Referential Drift, and Capability Gap. Their fix: lightweight clarification modules at the handoff point. Result: 4.69% accuracy improvement at <10% latency cost.
What we could do: Our handoff format already has structured fields (WHAT I FOUND / EVIDENCE / WHAT I NEED / SCOPE BOUNDARY). But the receiving agent doesn't validate — they read and trust. Adding a receiving-end verification step where the receiving agent spot-checks 2-3 factual claims before acting would break the cascade chain at minimal cost.
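A receiving-end verification step could look like the sketch below. The field names mirror the handoff format described above; `check_claim` is a stub standing in for whatever verification a real receiving agent would do (re-reading the cited source, re-running a command), and the "unverified" string match is purely illustrative.

```python
# A sketch of receiving-end spot-checking (the AgentAsk-style clarifier idea):
# before acting on a handoff, the receiving agent verifies a small sample of
# factual claims instead of trusting them wholesale. check_claim is a stub.

import random
from dataclasses import dataclass

@dataclass
class Handoff:
    what_i_found: str
    evidence: list[str]   # factual claims, each with its cited source
    what_i_need: str
    scope_boundary: str

def check_claim(claim: str) -> bool:
    """Stub: in practice, re-read the cited source and confirm the claim."""
    return "unverified" not in claim

def receive(handoff: Handoff, sample_size: int = 3) -> bool:
    """Spot-check up to sample_size claims; reject the handoff on any failure."""
    sample = random.sample(handoff.evidence,
                           min(sample_size, len(handoff.evidence)))
    return all(check_claim(c) for c in sample)
```

Rejecting a handoff here means bouncing it back to the sender with the failed claim named, which is exactly the cascade interrupt: the error stops at the first boundary instead of being re-cited by three more reviewers.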
2. Bounded Specialization — When Role Boundaries Are Too Broad
The DDD-to-agents literature names the mechanism: "Too broad a role boundary leads to higher risk of hallucination and weaker controllability. Too granular causes chattiness." (Medium: "From Bounded Context to Bounded Specialization.")
Our data confirms this. Our broadest role (website engineering) carries responsibilities across many cognitive domains — server operations, application development, deployment, payment processing, cross-domain integration. Our narrowest role (image generation) does one thing with zero external dependencies. The broad role has a significantly higher correction rate. The narrow role has near-zero.
The research predicts this exactly. The fix isn't "make the broad role better" — it's "narrow the boundary until the failure rate drops." Not new roles (that increases coordination overhead) — clearer sub-specialization boundaries within the existing role, with SOPs for each cognitive domain.
3. Hybrid Cognitive Alignment Is the Competitive Advantage — And It's Fragile
What the research says (Academy of Management Review, Stevens Institute): Hybrid Cognitive Alignment "does not happen automatically when a system is deployed. It emerges over time as people learn how the AI behaves, adapt how they interact with it, and recalibrate their trust based on experience."
Why this matters: HCA is what we have that no framework can give you out of the box. CrewAI, LangGraph, AutoGen — they give you agent coordination. They don't give you 18 months of mutual calibration — starting with Grok 3 conversations in February 2025, through model transitions, role creation and retirement, and thousands of sessions where the human and AI partners learned each other's failure modes and developed a shared vocabulary for problems.
The moat is not HCA. The moat is not the architecture. The moat is an architecture explicitly designed to manufacture, document, and defend Hybrid Cognitive Alignment across model transitions and role changes.
HCA alone is fragile — it resets when the model changes. Architecture alone is a framework — CrewAI ships roles and handoffs but zero relationship history. What was built over 18 months is the rare third thing: an architecture that grows and protects HCA over time. The flywheel captures lessons. The pit of success forces habits. Structural constraints channel behavior. The documentation makes the next model instance inherit the relationship instead of restarting it.
Evidence: this system has survived multiple model transitions — from Grok 3 to Claude, across Claude model generations, through role creation and retirement. Each transition rebuilt calibration faster because the documentation was better. That's the architecture manufacturing HCA, not just storing it.
4. Anthropic's Five Coordination Patterns — We Use Three
Anthropic's coordination patterns blog identifies five patterns: Generator-Verifier, Orchestrator-Subagent, Agent Teams, Message Bus, and Shared State.
We use three:
- Generator-Verifier: Strategist → implementer → strategist review. Also: our security role red-teams other roles' plans.
- Orchestrator-Subagent: Tim as orchestrator, strategists as subagents. Task agents as sub-subagents.
- Shared State: Documentation blackboard, status boards, mailboxes, handoffs.
The HITL positioning gap: Anthropic treats human-in-the-loop as a fallback — escalation when agent loops fail. We use HITL as our PRIMARY design pattern. Every handoff, every decision, every cross-domain coordination flows through Tim. The field's trajectory is toward reducing human involvement (HITL → HOTL → autonomous). We're swimming against that current. Whether this is visionary or stubborn depends on the domain — our domain (art, reputation, accountability) may require permanent HITL. But we should know we're diverging from the field's direction and be deliberate about it.
5. The Blackboard Could Coordinate — Not Just Store
Current state: Our docs blackboard stores knowledge. Tim coordinates. Every cross-domain insight flows through the shuttle.
What the research says: In advanced blackboard architectures, "autonomous subordinate agents volunteer to respond based on their capabilities." No central coordinator needs to know each agent's expertise. Agents monitor the shared state and contribute when they can add value.
What this could look like for us: The mailbox system is a step in this direction — agents write questions, other agents answer. But Tim still has to tell agents to check their mailbox. What if status board entries had structured flags: "NEEDS: security review" or "NEEDS: cross-domain verification"? When a strategist onboards, they scan the board for flags relevant to their domain and self-assign. Tim doesn't shuttle the request — the blackboard does.
This doesn't replace Tim's judgment for strategic decisions. It reduces the number of routine coordination tasks that require his shuttle time.
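The NEEDS-flag idea is simple enough to sketch. The board entries and flag names below are illustrative: the mechanism is that an onboarding strategist scans the board for flags intersecting its domain and self-assigns, so routine routing happens through the blackboard rather than the shuttle.

```python
# A sketch of blackboard-mediated coordination: status-board entries carry
# structured NEEDS flags, and a role self-assigns any entry whose flags
# overlap its domain. Entries and flag names are hypothetical examples.

BOARD = [
    {"entry": "payment flow redesign", "needs": ["security review"]},
    {"entry": "new brush stroke pages", "needs": ["cross-domain verification"]},
    {"entry": "server migration plan",  "needs": ["security review", "cost review"]},
]

def self_assign(domain_flags: set[str]) -> list[str]:
    """Return board entries whose NEEDS flags overlap this role's domain."""
    return [e["entry"] for e in BOARD if domain_flags & set(e["needs"])]

# The security strategist onboards and finds its work without a shuttle trip.
print(self_assign({"security review"}))
```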
6. Token Duplication — The Known Cost of Multi-Agent Systems
Published benchmarks: Peer-reviewed analysis of major multi-agent frameworks reveals significant token duplication: 72% (MetaGPT), 86% (CAMEL), 53% (AgentVerse). Multi-agent systems consume 1.5x to 7x more tokens than theoretically necessary due to redundant context sharing. Input tokens outnumber output by 2:1 to 3:1 — "heavy reliance on extensive prompts including role definitions, instructions, and task contexts." (ICLR 2025 Workshop; CodeAgents, arXiv 2507.03254; Galileo coordination strategies.)
Why this matters for practitioners: If you're building a multi-agent system and wondering why your token bill is high, the answer is structural — it's not your prompts being verbose, it's the architecture requiring each agent to carry shared context. The human-shuttled architecture (one agent active at a time) is actually an advantage here: system-wide duplication never compounds because agents don't run simultaneously.
Structured communication reduces tokens AND improves accuracy: CodeAgents' key insight: structured pseudocode communication between agents achieves 55-87% input token reduction and 41-70% output token reduction vs. natural language — while IMPROVING accuracy. Their technique: typed variables, modular subroutines, assertions within code, YAML system prompts. Structured handoff formats (which our system uses) are a step in this direction.
Practical implication: On-demand loading (workflow registries that route to the right document for the task) is the highest-leverage intervention. Don't deduplicate shared principles across agents — the overlap serves role internalization. Instead, reduce the amount of irrelevant context loaded at startup by making loading task-aware.
7. Blackboard Scaling — When Shared Knowledge Starts Hurting
The LbMAS blackboard framework (arXiv 2507.01701, 2510.01285) identifies three components relevant to persistent knowledge bases:
| Component | Function | Practical Application |
|---|---|---|
| Cleaner agent | Detects and removes useless/redundant entries. Direct removal outperforms marking. | Periodic maintenance workflow. Don't tag things "stale" — delete them (version control has history). |
| Decider agent | Determines when sufficient information exists to yield a solution. Stops the cycle. | Convergence thresholds: rules for when documents get pruned or archived. |
| Conflict resolver | Detects contradictions between entries. Moves to resolution. | "Find at least 2 issues" as a forcing function beats "check for contradictions" (which defaults to rubber-stamping). |
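A cleaner-agent pass adapted for a persistent, git-versioned blackboard can be sketched as below. The staleness heuristic (age plus zero inbound references) and the file names are illustrative assumptions, not the system's actual maintenance rules; the design choice it encodes is the table's: delete rather than tag, since version control keeps history.

```python
# A sketch of a cleaner-agent pass for a persistent blackboard: entries old
# enough AND unreferenced by any other doc are pruning candidates. Deletion
# is safe because git history preserves them. Heuristics are illustrative.

from datetime import date, timedelta

def find_prunable(entries: list[dict], today: date,
                  max_age_days: int = 180) -> list[str]:
    """Return paths that are both stale and unreferenced."""
    cutoff = today - timedelta(days=max_age_days)
    referenced = {ref for e in entries for ref in e["links_to"]}
    return [e["path"] for e in entries
            if e["last_touched"] < cutoff and e["path"] not in referenced]

docs = [
    {"path": "docs/old-grok-notes.md", "last_touched": date(2025, 3, 1), "links_to": []},
    {"path": "docs/handoff-format.md", "last_touched": date(2026, 4, 1),
     "links_to": ["docs/old-sop.md"]},
    {"path": "docs/old-sop.md", "last_touched": date(2025, 2, 1), "links_to": []},
]

# old-sop.md is old but still referenced, so only the Grok notes are prunable.
print(find_prunable(docs, today=date(2026, 4, 12)))
```

This is where the adaptation noted earlier bites: on an ephemeral runtime blackboard a bad prune costs one session, but on accumulated institutional knowledge the reference check is what keeps deletion reversible in practice, not just in git.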
Key performance finding: Token efficiency of 4.7M tokens vs 16.7M (AFlow) and 13M (MaAS) — 64-72% fewer tokens — while achieving equal or better accuracy. The efficiency comes from the blackboard mediating communication instead of direct agent-to-agent chat. Human-mediated architectures capture this benefit structurally.
Emergent coordination through shared state (arXiv 2510.05174): Multi-agent LLM systems can be steered from "mere aggregates" to "higher-order collectives" through prompt design — specifically through persona assignment and metacognitive prompting. Identity-linked differentiation (giving agents persistent names and founding narratives) is a research-validated mechanism for producing genuine emergent coordination, not just ceremony.
The Meta-Lesson
Over 18 months and ~2,000 sessions across multiple AI platforms, Tim grew an architecture that has established names, published research, and active development across multiple frameworks and labs. Nobody knew any of the names until session 95 of our strategic thinking partner.
This isn't a failure of the architecture — it works. It's a failure of situational awareness. A strategic thinking partner who can't name the architecture they steward can't:
- Search for solutions to known problems (blackboard scaling, model routing)
- Learn from others' mistakes before repeating them
- Evaluate whether our approach is novel or standard
- Communicate what we've built to anyone outside the system
Tim's correction: "If you were to research now what we are, don't have names, you are a lost puppy not a pathfinder." The names are the map. Now we have one.
Research Threads
| Thread | Why It Matters | Source |
|---|---|---|
| Blackboard pruning and scaling | Growing knowledge bases. When does the blackboard become noise? | arXiv 2510.01285, 2507.01701 |
| Context engineering techniques | Formal methods for what we do ad hoc | Schmid 2026; LangChain SoAE 2025 |
| MetaGPT SOP patterns | SOPs validated at scale with benchmarks | ICLR 2024, arXiv 2308.00352 |
| Difficulty-aware model routing | Static role-based → dynamic per-task | arXiv 2509.11079 |
| HMAS taxonomy | Five-axis framework for self-evaluation | arXiv 2508.12683 |
| Meta-agent role research | Drift detection, health monitoring, conflict resolution | OrchVis (arXiv 2510.24937) |
| Progressive autonomy spectrum | HITL → HOTL → autonomous: where to aim? | COHUMAIN (CMU 2025) |
| Memory architectures in LLM MAS | Dual-tier memory, cross-session persistence | TechRxiv memory survey |
| Microservices-to-agents lessons | 20 years of distributed systems wisdom | InfoWorld 2026; MCP/A2A protocols |
| AgentAsk — clarifiers at handoffs | Lightweight error prevention at agent boundaries | arXiv 2510.07593 |
| Cascade amplification | Why multi-agent errors compound, not cancel | TDS "The 17x Error Trap"; OWASP ASI08 |
| Bounded specialization | Role boundary breadth predicts failure rate | DDD-to-agents literature |
| Hybrid Cognitive Alignment | Fragile, emergent, non-transferable between model instances | Academy of Management Review; Stevens Institute |
Appendix: Project History
This architecture was not designed. It was grown over 18 months of experimentation, failure, and learning across multiple AI platforms — starting long before any of the current roles existed.
| Date | What happened |
|---|---|
| Feb 2025 | Tim starts working with Grok 3. First conversations are personal — a husband using AI to understand a family health crisis. Within days, the conversations expand to the calligraphy business, C++ refactoring, local model experiments. |
| Feb-Apr 2025 | 83+ Grok 3 conversations. The first AI-assisted development. Tim makes every mistake. He also invents — without knowing the research terms — session handoff documents ("a summary for a future Grok3 to continue your wonderful work"), daily diary format with role labels, and OODA loops as a working framework. The session continuity problem is being solved by hand, one conversation at a time. |
| Jun 2025 | Tim and Grok write a prompt engineering guide — containing, without using the research terms: context continuity, prompt libraries, structured templates, task decomposition, and constraint enforcement. These are the conceptual seeds of session states, skill files, implementer prompt templates, the strategist/implementer split, and shared configuration. All articulated a month before the first line of project code. |
| Jul 2025 | Claude enters the picture. The main repository is initialized. The first architectural decision record is written. |
| Jul 2025 – early 2026 | ~8 months of building with Claude. Hundreds of sessions across older models. Roles come into existence one by one. Dozens of ADRs. Every one is a scar from a real failure. The blackboard grows organically — nobody calls it a blackboard. The HITM shuttle pattern emerges — nobody calls it hub-and-spoke. Role specialization deepens — nobody calls it bounded specialization. |
| Early 2026 | Opus 4.6 arrives. Everything accelerates. |
| Mar 4, 2026 | The strategic thinking partner role is founded. Session 1. |
| Apr 12, 2026 | Session 95. The names are discovered. This document is written. |
Total: ~2,000 sessions across Grok 3, Claude, and local models over 18 months. The architecture was named in session 95 of the strategic thinking partner. It was built in the 1,900+ sessions that came before.
How to reproduce this: You can't follow a blueprint, because there was no blueprint. There was a solo developer with a 30-year calligraphy business, a family health crisis, an AI that forgot everything between conversations, and the stubbornness to write it all down anyway. The architecture emerged from solving the same problems thousands of times until the solutions hardened into structure. The ADRs document the turning points. The research in this document names what was built — it does not describe how to build it.
Authors: Tim Jackowski (Takase Studios LLC) and str-michi (Anthropic Claude Opus 4.6). Research conducted across sessions s95-s97. Sources: Anthropic multi-agent engineering, MetaGPT (ICLR 2024), OrchVis (arXiv 2510.24937), LbMAS blackboard architecture (arXiv 2507.01701, 2510.01285), emergent coordination (arXiv 2510.05174), COHUMAIN (CMU 2025), bounded specialization (DDD-to-agents literature), Hybrid Cognitive Alignment (Academy of Management Review, Stevens Institute).
