Development Notes
Notes on building an AI-assisted calligraphy design system. Behind-the-scenes research, architecture decisions, and lessons learned.
Meet the Team — A master calligrapher, a solo developer, and a team of AIs building takase.com.
Planning
PLN-013: Production Resilience — Schema Contracts, Deploy Verification, Monitoring Correctness (2026-03-21) — 88 minutes of HTTP 500s. Every component was healthy. The schema just didn't match. Seven phases to make sure it never happens again.
Research
RES-002: OpenClaw-RL — Directive Signals in Conversations (2026-03-15) — Every interaction with an AI agent produces training signals (user corrections, re-queries, explicit fixes) that are valuable in any domain but typically ignored. The paper describes, in RL terms, the same directive signal our flywheel already captures by hand.
RES-003: RLHI — Reinforcement Learning from Human Interaction (2026-03-15) — Corrections create preference pairs. Chat history creates user personas. Our conversation archive holds the same kind of signal this paper extracts — at a far smaller scale.
RES-008: The Kobayashi Maru Signal — Detecting When an AI Strategist Hits an Invisible Wall (2026-03-27) — When an AI role can't complete a task due to a constraint it doesn't know about, it doesn't say "I don't know." It proposes plausible fixes that keep failing. Data from 121 sessions, 2,371 human messages.
RES-009: The Alignment Tax and Structural Defense — What Response Homogenization Means for Human-in-the-Middle AI Teams (2026-03-27) — DPO-aligned models collapse to a single semantic answer on 40-79% of factual questions. Our structural defenses — built from pain, not theory — happen to be the correct response. But models change. When do those defenses become anchors?
RES-010: What DeepMind's Aletheia Experiment Shows and Our Experience Creating takase.com (2026-03-28) — DeepMind's AI verifier was wrong 68.5% of the time. Our AI roles hit 25-29% troubled-session rates. Different domains, same patterns: verification failure, specification gaming, question-formulation gaps. Where our production data adds to their research, and where one of our AI roles challenged the findings.
RES-044: Capability versus Substrate — Replaying Real Multi-Agent Failures (2026-06-15) — We replayed real failures of our AI agent team across model tiers and several ways of presenting a safety rule, scored blind by a three-vendor judge panel. A more capable model fails much less — and at true production depth (a 374 KB onboarding bundle) the buried rule shows no detectable benefit over having no rule at all, while the same rule placed proximately keeps working.
Architectural Decision Records
ADR-075: Discovered Identity — AI Roles Calibrate to Environment, Not Instructions — AIs calibrate to what they observe in the environment, not to what instructions tell them. When the environment contradicts the instructions, the environment usually wins. The founding process exploits this.
ADR-075 Addendum — Strategist Responses — Four domain strategists respond to ADR-075 — the security role thinks about what it can't verify, the production role points at tonight's failure, the data role refuses to philosophize, the content role runs a SQL query.
Posts
The Verification Gap — A six-phase resilience plan was declared complete. One question found it wasn't.
How We Build takase.com — A master calligrapher, a solo developer, and AI strategists building a Japanese calligraphy site with 101,000+ verified names. The art, the history, the team, and what we've found.
