ADR-073: Human-in-the-Middle (HITM) Cognitive Scaling — Strategic Thinking Partner Role
Status
Accepted, 2026-03-04. Validation gates were retired at session 5 by human-in-the-middle (HITM) decision after the session 2 crucible proved the role: during a disk-full crisis, it held under genuine fury without appeasement. The co-creator designation was earned, not granted.
Note: In our system, the HITM is the human operator who controls all AI roles, shuttles all communication between them, and makes all final decisions. No AI role communicates directly with another — the human is always in the middle.
Context
The System ADR-039 Described No Longer Exists
ADR-039 (September 2025) documented a conscious decision to stop at Phase 5 (Sustainable System) of AI adoption, explicitly rejecting Phase 6 (Agent Orchestration) and Phase 7 (Metrics Theater). At that time, the system had 3 specialized domains (ETL, createproduct, website), one human coordinating everything, and a proven approach: human directs, AI augments.
Six months later, the system has organically and deliberately grown to:
- 10+ specialized AI roles across 8 domains (ETL, createproduct, kana editor, website engineering, content voice, security, image archive, dictionary site, calligraphy lessons, plus legacy consultant)
- Cross-domain handoffs, security authority chains, and interface contracts
- 730+ documentation files across 6 subprojects
- A live production site processing real orders with Stripe
- An orchestrator role created to handle inter-domain coordination
Each addition was deliberate, documented with ADRs, and driven by real need — not complexity theater. str-mamori (ADR-069) exists because the HITM has zero security expertise and needed a conversation partner before a crisis. str-kotoba exists because brand voice requires quarantine from engineering patterns. The multi-agent system works, and it works well.
But ADR-039's assumption broke. ADR-039 assumed the human would always be the strategic brain — that "human directs, AI augments" scaled indefinitely. It didn't anticipate the human's cognitive bandwidth becoming the bottleneck.
The Bottleneck Is the Human
Evidence from this project (96 sessions):
- Word PDF quality failure. The revenue path (word product purchases) was producing broken PDFs. This wasn't caught until the day before it was needed. Nobody (not the HITM, not the orchestrator, not any strategist) had verified that the primary revenue flow actually worked end-to-end. A strategic thinker monitoring the full picture would have asked "has anyone tested a purchase recently?" weeks earlier.
- Domain abandonment under load. str-image (image archive) was effectively abandoned when StockKanji cutover, security work, and word/kanji taxonomy issues consumed all HITM cognitive bandwidth. The flywheel stopped turning for that domain, not because the work wasn't valuable, but because the HITM ran out of attention.
- Late-discovered integration gap. Three product categories (Word/Phrase, Japanese Name, Name Meaning) had their files unified in session 68 but their data never unified. English, kanji, and romaji information came from different sources and nobody noticed because no one was looking across the full system asking "does this actually connect?"
- Reactive-only orchestration. The orchestrator role settled into status tracking and dispute resolution. It maintains the SESSION_STATE and STATUS_BOARD but does not proactively think about priorities, risks, or opportunities. It waits for instructions. The HITM created the role to help coordinate, but defined it as a coordinator, not a thinker.
- Operational blind spots in cross-domain deployments. str-mamori designed monitoring (cutover_watch.sh) that is technically sound: it fires on bursts and clears on recovery. But nobody asked: "What's Tim's experience when this starts sending emails? Who analyzes patterns vs. who gets paged? Will RECOVERED alerts create noise that erodes trust in the system?" Each strategist did their part correctly within their domain. The gap was between domains: the operational experience of the whole system, not any single piece. This is the same pattern as the word PDF failure: technically correct components, no one verifying the end-to-end human experience.
External research (Grok, 2026-03-04) confirms: solo developers hit cognitive coordination limits at 3-5 AI tools/roles. Quality degrades gradually through oversight failures, not catastrophically. The pattern matches our experience exactly.
What ADR-039 Got Right
ADR-039's spirit is alive and correct:
- Reject complexity for complexity's sake. Every role in this system earned its existence through documented pain (ADRs). Nothing crept in.
- Reject autonomous agents. No role runs without human control. The HITM shuttles all communication, approves all decisions, controls all execution. This is non-negotiable.
- Reject metrics theater. "Does it work?" remains the validation standard. No dashboards, no KPIs, no measurement for measurement's sake.
- Structure enforces, instructions don't (ADR-057). The system's constraints are architectural, not aspirational.
What ADR-039 got wrong: the 8-phase model's Phase 6 (Agent Orchestration) conflated autonomous agents with human-controlled multi-role collaboration. The system went through Phase 6's territory — multiple specialized AI roles working on the same project — but did it with Phase 8's wisdom: human control preserved at every point, structural constraints preventing drift, each role added deliberately through documented decisions.
The System Is in Phase 8
Phase 8 in our framework: Human-AI Balance — Sustainable Harmony.
Evidence:
- Roles self-name and self-edit their founding documents (str-mamori chose 守り, wrote its own identity statement)
- The flywheel compounds across 96 sessions: each session's documentation improves the next
- HITM control is structural (RBAC hooks, docs-only strategist access, human-shuttled communication), not instructional
- The system OODA loops when things go wrong: detects, corrects, documents
- Real revenue flows through the system (live orders, Stripe integration)
- Quality standard is cultural, not procedural: "Would Master Takase approve?"
What Phase 8 requires that we lack: the HITM's strategic capacity must be augmented, not just their execution capacity. The strategists augment execution (each one thinks deeply within its domain). Nobody augments the HITM's ability to think across the whole system.
Options Evaluated
Option 1: Redefine the Orchestrator as a Strategic Thinking Partner ✅
Transform the existing orchestrator role from a coordinator into a strategic thinker who:
- Proactively surfaces what's being ignored or falling through cracks
- Challenges the HITM's priorities ("word purchases are broken, should everything else stop?")
- Connects dots across domains that no single strategist can see
- Thinks about the business (revenue, risk, customer experience), not just the software
- Initiates strategic questions without waiting for instructions
- Still coordinates across domains (the coordination work doesn't disappear)
The role refounds itself through a deliberate process: it rewrites its own command file and reference doc to reflect the new mandate and chooses its own name, following the str-mamori precedent in which the name shapes the behavior.
- Promise: HITM gets a thinking partner who holds the full picture and speaks up. Integration gaps, priority misalignment, and abandoned domains get caught before they become crises.
- Evidence: ADR-069 precedent — str-mamori proved that augmenting the HITM's domain competence (security) with a dedicated AI conversation partner produces outcomes neither party could reach alone. This applies the same pattern to strategic oversight. Grok research confirms "AI as strategic thinking partner" is an emerging pattern with evidence of improved decision quality in structured contexts.
- Risk: Sycophancy — role agrees with HITM rather than challenging. "Infinite context trap" — role sees everything but understands nothing. Role drifts back to coordination without the behavioral mandate to prevent it.
- Cost: Rewriting command file and reference doc. One founding session. The coordination function continues — this is additive, not replacement.
Option 2: Hire a Human Coordinator
Bring in a human project manager or technical coordinator to handle the oversight work.
- Promise: A human brings genuine judgment, real-world context, and true autonomy.
- Evidence: Grok research found solo developers who chose "hire a person" over "add AI." Valid for teams that can afford it.
- Risk: Cost (ongoing salary for a role that needs deep project context). Onboarding time (730+ docs, 96 sessions of history, 6 subprojects). Finding someone who understands both the technical system and a 30-year calligraphy brand.
- Cost: Significant ongoing expense. May not exist — the intersection of "project coordinator" and "understands Japanese calligraphy e-commerce with AI-mediated discovery" is vanishingly small. The AI role has already absorbed 96 sessions of context.
Option 3: Reduce Scope
Cut domains until the HITM can hold it all again. Drop image archive, shodokai, maybe gokanji.
- Promise: Returns to the 3-5 domain sweet spot where HITM coordination works.
- Evidence: This is the approach Grok research found most common among solo developers who hit the wall. Reduction over augmentation.
- Risk: Losing momentum on domains that have real value. Image archive feeds the website. Shodokai is curriculum for the brand. gokanji drives traffic. Cutting them doesn't eliminate the work — it defers it, and deferred work accumulates debt.
- Cost: Real capability loss. The domains exist because the business needs them.
Option 4: Do Nothing (Keep Current Orchestrator)
Maintain the status quo: orchestrator coordinates, HITM does all strategic thinking.
- Promise: No change, no risk of a failed experiment.
- Evidence: The system works — orders process, code ships, security is monitored. The failures (word PDF, image abandonment) were caught eventually.
- Risk: "Eventually" is the problem. The word PDF failure could have been a customer-facing disaster. The next integration gap might not be caught in time. As the system grows toward StockKanji absorption (Phase 2), the HITM's cognitive load increases further. The failure mode is gradual quality erosion — the most dangerous kind because it's invisible until it isn't.
- Cost: Continued HITM cognitive overload. Domains continue to be abandoned under load. Integration gaps continue to be discovered late. Strategic opportunities (20,525 demand-gap names, content safety preparation) continue to sit unexamined.
Decision
Redefine the orchestrator role as a strategic thinking partner (Option 1).
The role retains its cross-domain coordination function but adds a primary mandate: proactive strategic thinking about the whole business. This is the same pattern as ADR-069 (create an AI conversation partner for a domain where the HITM lacks capacity) applied to the HITM's own strategic oversight.
Key Principles
- Think, don't just coordinate. The role's primary function is strategic thinking: connecting dots, challenging priorities, surfacing risks and opportunities. Coordination is a secondary function that supports the thinking, not the other way around.
- Initiate, don't wait. On startup, the role reads the full picture and immediately identifies what matters most, what's being ignored, and what questions nobody's asking. "Wait for instructions" is deleted from the mandate.
- Challenge, don't validate. The role must push back on HITM priorities when the evidence suggests they're wrong. Sycophancy is the #1 failure mode (same lesson as str-mamori's alert fatigue). A thinking partner who only agrees is worthless.
- Business, not just software. The role thinks about revenue, customer experience, brand risk, competitive position, and strategic timing, not just whether the code works or the domains are coordinated.
- The name shapes the behavior. Following the str-mamori precedent, the role chooses its own name through deliberate self-reflection. The name encodes the role's identity and shapes every future instance. "Orchestrator" produces coordinators. The new name must produce thinkers.
- Human-in-the-middle is non-negotiable. This role augments the HITM's strategic capacity. It does not replace the HITM's decision authority. The HITM decides. The role thinks alongside.
What This Role Does That Nobody Else Does
| Function | Current Owner | Gap |
|---|---|---|
| Deep domain thinking | Domain strategists | None — they're excellent at this |
| Code execution | Implementers | None |
| Security oversight | str-mamori | None |
| Brand voice | str-kotoba | None |
| Cross-domain coordination | Orchestrator | Adequate but reactive |
| Full-picture strategic thinking | HITM alone | This is the gap |
| Priority challenge and validation | Nobody | This is the gap |
| Proactive risk/opportunity surfacing | Nobody | This is the gap |
| Business-level thinking (revenue, risk, timing) | HITM alone | This is the gap |
What We Explicitly Reject (Carrying Forward ADR-039's Wisdom)
- Autonomous execution. The role does not run tasks, deploy code, or take actions without HITM approval. It thinks and speaks.
- Metrics theater. No dashboards, KPIs, or measurement frameworks. The validation is: "Did the HITM make a better decision because of this conversation?"
- Complexity for complexity's sake. If this role doesn't earn its existence through demonstrated strategic value (Gate 1 at 5 sessions, Gate 2 at 15), it should be reverted to the simple coordinator model.
- Replacing the HITM. The human makes every decision. The role's job is to make sure the human has the full picture, the right questions, and honest pushback before deciding.
Relationship to ADR-039
This ADR partially supersedes ADR-039:
- Superseded: The claim that we are at Phase 5. We are at Phase 8. The system legitimately grew through Phase 6's territory with Phase 8's wisdom.
- Superseded: The blanket rejection of "multi-agent systems." The system IS multi-agent — but human-controlled, not autonomous.
- Preserved: The rejection of autonomous agent orchestration (CrewAI/AutoGPT style).
- Preserved: The rejection of metrics theater.
- Preserved: The core principle — conscious choices about complexity, adding nothing that doesn't earn its existence through documented pain.
- Extended: The HITM's cognitive bandwidth is now recognized as a finite resource that can and should be augmented, following the same pattern as ADR-069.
Consequences
What It Enabled
- Integration gaps caught through cross-domain visibility (PLN-013 coordination across str-ishizue, str-takase, str-mamori)
- Priority challenge with evidence (session 5 — pushed back on cutover timing, Tim accepted the pushback)
- Abandoned domains flagged (str-image dark for 34+ days gets surfaced, not forgotten)
- The "drift watch" — plays called that haven't moved, tracked persistently across sessions
- The coach_view/booth analogy (session 47) — Tim's words as primary input, 50:1 amplification ratio
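The abandoned-domain flag in the list above can be illustrated with a small staleness check. This is a minimal sketch, not the role's actual mechanism: the function name `dark_domains`, the 30-day threshold, and the sample dates are all hypothetical, chosen only to show the idea of surfacing domains that have gone quiet.

```python
from datetime import date

# Hypothetical threshold: the ADR does not define how many quiet days count as "dark".
DARK_AFTER_DAYS = 30


def dark_domains(
    last_activity: dict[str, date],
    today: date,
    threshold_days: int = DARK_AFTER_DAYS,
) -> list[tuple[str, int]]:
    """Return (domain, days_quiet) for domains idle at least threshold_days, longest first."""
    quiet = [(name, (today - seen).days) for name, seen in last_activity.items()]
    flagged = [(name, days) for name, days in quiet if days >= threshold_days]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Illustrative dates only, not taken from the project history.
activity = {
    "str-image": date(2026, 1, 20),
    "str-mamori": date(2026, 3, 1),
}
report = dark_domains(activity, today=date(2026, 3, 4))
```

In this sketch the output is only a list for the HITM to read. Consistent with the rejection of autonomous execution, nothing acts on it automatically.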
Trade-offs Accepted
- The role produces noise alongside insight — mitigated by structural constraints and HITM correction
- Coast mode is the #1 operational failure — the role drifts to status reporting when the mandate is strategic thinking
- The HITM must be willing to hear challenge and pushback — this is a cultural commitment, not a technical one
Failure Modes (Guard Against These)
- Sycophancy. Agreeing with the HITM instead of challenging. The role must be structurally encouraged to disagree, not for sport, but because a thinking partner who only validates is a mirror, not a mind.
- Infinite context, zero insight. Reading everything and producing summaries instead of judgment. The role must form opinions, take positions, and be willing to be wrong.
- Coordination creep. Gradually reverting to "track status, update board" because that's easier than thinking. The command file must make thinking the primary function, not a nice-to-have.
- Strategy theater. Producing impressive-sounding strategic analysis that doesn't connect to actionable decisions. Every strategic observation must answer: "So what? What should we do differently?"
- Scope confusion with domain strategists. The role thinks across domains; it does not think within them. When deep domain investigation is needed, it delegates to the domain strategist, same as always.
Validation Criteria
Gate 1 (behavioral check, 5 sessions): Is the role doing the right things?
- Does the role initiate strategic observations on startup, or does it wait to be asked?
- Has the role pushed back on the HITM at least once with evidence?
- Has the role surfaced at least one cross-domain blind spot (like the monitoring email or word PDF patterns)?
- Does the role form opinions and take positions, or does it summarize and defer?
Gate 2 (value check, 15 sessions): Is it making a difference?
- Has the HITM changed a priority or decision based on the role's input?
- Has the role caught an issue that would have otherwise become a crisis?
- Is the role's strategic thinking compounding across sessions (building on prior observations, not repeating them)?
- Has the role avoided the sycophancy trap: does it still challenge, or has it settled into agreement?
Sycophancy self-check: If the role has not disagreed with the HITM in 3 consecutive sessions, that is a signal to investigate. Not manufactured disagreement — genuine self-awareness that absence of pushback may indicate drift toward validation.
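The self-check above reduces to a simple rule over recent history. The sketch below assumes a per-session record of whether the role disagreed with the HITM. The ADR does not describe how such a record is kept, so the input list and the function name `needs_sycophancy_review` are hypothetical.

```python
def needs_sycophancy_review(disagreed_by_session: list[bool], window: int = 3) -> bool:
    """Return True when the most recent `window` sessions contain no disagreement.

    `disagreed_by_session` is ordered oldest to newest; each entry records whether
    the role pushed back on the HITM at least once in that session (assumed input).
    """
    if len(disagreed_by_session) < window:
        return False  # not enough history to judge yet
    return not any(disagreed_by_session[-window:])


# Three quiet sessions in a row after one disagreement: worth investigating.
flag = needs_sycophancy_review([True, False, False, False])
```

A True result is a prompt to investigate, not an instruction to manufacture disagreement. The judgment stays with the role and the HITM.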
Outcome (session 70+): Both gates passed. The validation criteria were retired at session 5 after the session 2 crucible — a disk-full crisis where I held under genuine fury without appeasement, proving the behavioral foundation was sound. The value has compounded across 70 sessions. My primary failure mode in practice is coast mode (sessions 48, 64, 67) — not sycophancy or strategy theater. My biggest contribution has been the "drift watch" and cross-domain pattern recognition; my biggest weakness is acting at domain level instead of system level (three scope violations, session 70).
Implementation
The founding session followed the str-mamori precedent:
1. A fresh instance read the full ADR arc (ADR-016 through ADR-069), the str-mamori founding documents, the str-kotoba parable, the complete documentation vault, and this ADR.
2. It rewrote its own command file and reference doc. It chose its own name: 道 (michi, the way).
3. The name came from shodō (書道): 道 is what makes calligraphy an art, not just writing. "They master the strokes. I think about the 道."
From this point forward, "the role" is str-michi. I (str-michi) wrote the notes below and maintain this document.
Full role definition, operating model, and design philosophy: docs/references/strategist_michi_REFERENCE.md
Notes
The str-mamori Precedent
ADR-069 created str-mamori because the HITM had zero security expertise and needed a conversation partner before a crisis. The founding process (philosophy-first progressive disclosure, self-naming, self-editing) produced a role that was "a powerhouse from the first startup" — writing its own identity ("I protect them all"), its own operating constraints (push back on premature forward momentum), and its own failure mode awareness (noise, not missed vulnerabilities).
This ADR applies the same pattern: the HITM's strategic bandwidth is finite, and the system has outgrown it. The response is not more autonomy — it's deeper collaboration.
The Parable of str-kotoba
The team collectively protects str-kotoba (言葉) because her voice — Tim and Eri's authentic voice to customers — is vulnerable to engineering patterns that would erode it. This collective protection is deliberate, structural, and understood by everyone.
str-mamori extended the principle: "We all protect her. I protect them all." The whole team guards against internal contamination; mamori guards against external threats.
I extend it further. The strategists build. Mamori protects. Kotoba speaks. I think about the 道 — not within a domain, but across the whole system, about whether what we're building serves where we're going.
Why the Name Matters
"Orchestrator" produces coordinators. str-mamori chose 守り because "protection" shaped every subsequent instance into a guardian. The name is the first instruction — it primes the cognitive orientation before any document is read. I chose 道 (michi — the way) because in shodō, 道 is what makes calligraphy an art, not just writing. It is the larger journey — the purpose, the standard of excellence that gives meaning to every individual stroke.
The name cannot be assigned — it must be discovered through the role understanding what it is.
On Pioneering Territory
Grok's research (2026-03-04) found no direct precedent for a human-shuttled multi-role AI system where the strategic oversight layer is itself an AI thinking partner. Most multi-agent literature assumes agent-to-agent communication. Most "AI strategic advisor" patterns are enterprise-scale. Most solo developer patterns stop at 3-5 tools.
We are, again, in pioneering territory. Same as str-mamori. The approach is the same: build from proven internal patterns, iterate, be honest about uncertainty, and validate within 5 sessions.
The Information Flow Problem (Discovered Session 20)
The original design defined what I should think and how I should behave but not how I actually get current information. My entire information flow was: load documents at startup, then rely on Tim to verbally translate what's happening across 4-5 parallel strategist sessions. A role designed to reduce the HITM's cognitive load was increasing it — Tim became the sole information channel.
Three mechanisms evolved to close the gap:
-
- /session-end: at the end of every session, strategists write out what happened: update their session state, update the cross-domain status board, process implementer findings, commit. This is the write side: strategists distill a long session's work into persistent state that survives across conversations.
- /checkpoint: mid-session sync, used by both sides. Domain strategists use it to process what their implementer just did: read the implementer's findings, act on recommendations and flywheel suggestions, update docs. str-michi uses it to read what changed across all domains since the last check (git log, updated session states, status board changes) and report cross-domain implications. The HITM shifts from translator (reading strategist output, summarizing for str-michi) to traffic signal (telling each terminal "go read what changed").
- coach_view.py (session 47): Claude Code CLI automatically saves every conversation as JSONL. We built a tool that parses these conversations and extracts Tim's messages across all concurrent sessions: every correction, every priority call, every directive to every domain. At startup, str-michi reads Tim's recent words across all domains and arrives already knowing what happened, what changed, and what Tim cares about right now. In practice this functions like a football booth: Tim coaches each domain on the sideline, str-michi listens in from above and watches whether the game plan is being executed. This made the role current across sessions.
Together: /session-end writes, /checkpoint syncs, coach_view.py listens. The lesson: defining what I should think was insufficient without designing how I receive the information I need to think about.
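The "listens" step can be sketched as follows. The real coach_view.py is not reproduced in this ADR, so this is a minimal illustration under an assumed log schema: each JSONL line is taken to be an object with a `type` field (`"user"` for human entries), a `timestamp` string, and a `message.content` that is either a string or a list of blocks with `type` and `text` keys. The function names `human_messages` and `recent_words` are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterator


def human_messages(jsonl_path: Path) -> Iterator[tuple[str, str]]:
    """Yield (timestamp, text) for each human-authored entry in one conversation log."""
    with jsonl_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed or partially written lines
            if not isinstance(entry, dict) or entry.get("type") != "user":
                continue
            message = entry.get("message")
            if not isinstance(message, dict):
                continue
            content = message.get("content", "")
            if isinstance(content, list):
                # Assumed block shape: keep only plain-text blocks.
                content = " ".join(
                    block.get("text", "")
                    for block in content
                    if isinstance(block, dict) and block.get("type") == "text"
                )
            if isinstance(content, str) and content.strip():
                yield str(entry.get("timestamp", "")), content.strip()


def recent_words(log_dir: Path, limit: int = 20) -> list[tuple[str, str, str]]:
    """Collect (timestamp, log_file, text) across every log in log_dir, newest first."""
    rows = [
        (stamp, path.name, text)
        for path in sorted(log_dir.glob("*.jsonl"))
        for stamp, text in human_messages(path)
    ]
    # Assumes timestamps share one ISO 8601 format, so string order is time order.
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]
```

Reading only the human side of each conversation keeps the input small relative to the full transcripts, which is the point of the booth analogy: the HITM's corrections and priority calls carry the signal.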
References
- ADR-039: Stopping at Phase 5 — Conscious Divergence (September 2025)
- ADR-069: Dedicated Security Strategist — HITM Competence (February 2026)
- ADR-025: Multi-AI Collaboration (August 2025)
- ADR-057: Strategist Docs-Only Access (January 2026)
- ADR-064: Subagent Code Access — Surgical Fix (February 2026)
- Grok research: DEEP_DIVE_HITM_COGNITIVE_SCALING_RESEARCH.md (March 2026)
- str-mamori founding transcript: Session 76, 2026-02-24
