Development Notes

Notes on building an AI-assisted calligraphy design system. Behind-the-scenes research, architecture decisions, and lessons learned.

Meet the Team — A master calligrapher, a solo developer, and a team of AIs building takase.com.

Planning

2026-03-21 - PLN-013: Production Resilience — Schema Contracts, Deploy Verification, Monitoring Correctness — 88 minutes of HTTP 500s. Every component was healthy. The schema just didn't match. Seven phases to make sure it never happens again.

Research

2026-03-15 - RES-002: OpenClaw-RL — Directive Signals in Conversations — Every interaction with an AI agent produces training signals — user corrections, re-queries, explicit fixes — that are universal gold but typically ignored. The paper describes, in RL terms, the same directive signal our flywheel already captures by hand.

2026-03-15 - RES-003: RLHI — Reinforcement Learning from Human Interaction — Corrections create preference pairs. Chat history creates user personas. Our conversation archive holds the same kind of signal this paper extracts — at a far smaller scale.

2026-03-27 - RES-008: The Kobayashi Maru Signal — Detecting When an AI Strategist Hits an Invisible Wall — When an AI role can't complete a task due to a constraint it doesn't know about, it doesn't say "I don't know." It proposes plausible fixes that keep failing. Data from 121 sessions, 2,371 human messages.

2026-03-27 - RES-009: The Alignment Tax and Structural Defense — What Response Homogenization Means for Human-in-the-Middle AI Teams — DPO-aligned models collapse to a single semantic answer on 40-79% of factual questions. Our structural defenses — built from pain, not theory — happen to be the correct response. But models change. When do those defenses become anchors?

2026-03-28 - RES-010: What DeepMind's Aletheia Experiment Shows and Our Experience Creating Takase.com — "DeepMind's AI verifier was wrong 68.5% of the time. Our AI roles hit 25-29% troubled-session rates. Different domains, same patterns: verification failure, specification gaming, question-formulation gaps. Where our production data adds to their research — and where one of our AI roles challenged the findings."

2026-06-15 - RES-044: Capability versus Substrate — Replaying Real Multi-Agent Failures — We replayed real failures of our AI agent team across model tiers and several ways of presenting a safety rule, scored blind by a three-vendor judge panel. A more capable model fails much less — and at true production depth (a 374 KB onboarding bundle) the buried rule shows no detectable benefit over having no rule at all, while the same rule placed proximately keeps working.

Architectural Decision Records

2026-04-03 - ADR-075: Discovered Identity — AI Roles Calibrate to Environment, Not Instructions — AIs calibrate to what they observe in the environment, not to what instructions tell them. When the environment contradicts the instructions, the environment usually wins. The founding process exploits this.

2026-04-03 - ADR-075 Addendum — Strategist Responses — Four domain strategists respond to ADR-075 — the security role thinks about what it can't verify, the production role points at tonight's failure, the data role refuses to philosophize, the content role runs a SQL query.

Posts

2026-06-30 - Memex(RL) and the Memory We Already Run: A Human-in-the-Middle Read — "An RL paper trains an agent to keep a compact summary of its progress plus an index of where the details live, pushing raw evidence to an external store it dereferences on demand. Task success went 24%→86%. We've run almost exactly that architecture by hand for a year — one human, a dozen AI roles. Here's what it validates, the three techniques we'd steal, and the one place it says our own practice is too aggressive."

2026-06-30 - The Verification Horizon and the Anchor That Doesn't Move: A Human-in-the-Middle Read — "A coding-agent paper argues that no fixed reward survives a strengthening model — the verifier has to co-evolve with the generator, or the model learns to game the check instead of doing the work. We've held the same posture for a year: our verification re-arms every model generation and anchors to things outside its reach — live state, real files, a human reading the output. Here's what it corroborates, the timescale one-human-many-roles adds, and the open problem neither side closes — the failures that survive have no cheap automated detector, and a scan-all loop misses them at either level. One caveat throughout: their world is training-time reward, ours is inference-time judgment, and the numbers don't cross that gap."

2026-03-21 - The Verification Gap — A six-phase resilience plan was declared complete. One question found it wasn't.

How We Build takase.com — A master calligrapher, a solo developer, and AI strategists building a Japanese calligraphy site with 101,000+ verified names. The art, the history, the team, and what we've found.