RES-002: OpenClaw-RL — Directive Signals in Conversations (2026-03-15)

Citation

Yinjie Wang, Xuyang Chen, et al. "OpenClaw-RL: Train Any Agent Simply by Talking." arXiv:2603.10165, March 2026.

Key Findings

The Core Argument

Human corrections in conversations are the highest-quality training signal available — and current systems throw it away.

Why It Matters to Us

This paper validates our flywheel (ADR-047) and justifies the docs-meta initiative:

  1. The flywheel IS manual OpenClaw-RL. Every time Tim corrects a strategist — "no, the NAS roles are reversed," "read the spec before asserting facts" — that correction gets written into session states, concept-checks, lessons, and memory files. We don't feed it back into model weights (we can't), but we persist it in documentation so the next session inherits the correction. The paper says this signal is gold; we already treat it as gold.

  2. 949 conversations are an asset, not history. The JSONL conversation archive contains thousands of directive signals — corrections, re-queries, "no not that, instead do..." moments. With 1M context, old sessions can be fully re-read. This is the raw correction data the paper says current systems waste.

  3. docs-meta is the retrieval infrastructure. The conversation archive is locked in JSONL files that nobody can search. docs-meta (the planned vault — gitignored, invisible to engineer roles, curated session index) makes this data findable and usable. RES-002 is the research justification for building docs-meta.

  4. The compounding is real. 143 str-takase sessions. 83 str-ishizue sessions. 41 str-mamori sessions. Each session's corrections compound into the next. The project's documentation IS the learned model — not weights in a neural network, but structured knowledge in files. Same function, different substrate.
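The archive-mining step implied by points 2 and 3 can be sketched as a small script. This is a hypothetical illustration, not the actual docs-meta tooling: the message schema (`role` and `text` fields per JSONL line) and the correction markers are assumptions, since the archive's real format isn't specified here.

```python
import json
import re
from pathlib import Path

# Assumed markers of a directive signal (a human correction).
# The real marker list would be curated from the archive itself.
CORRECTION_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"\bno,? not that\b", r"\binstead\b",
              r"\bread the spec\b", r"\bwrong\b")
]

def extract_corrections(archive_dir):
    """Scan JSONL session files; yield (filename, text) for human
    messages that match a correction pattern."""
    for path in sorted(Path(archive_dir).glob("*.jsonl")):
        with path.open() as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                msg = json.loads(line)
                if msg.get("role") != "human":
                    continue
                text = msg.get("text", "")
                if any(p.search(text) for p in CORRECTION_PATTERNS):
                    yield path.name, text
```

The point of the sketch is that the signal is cheap to surface once the files are findable; the hard part is the curation and indexing that docs-meta is meant to provide.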

Empirical Confirmation (RES-008, 2026-03-26)

RES-008 used the conversation archive this paper justified building. str-michi searched 121 JSONL session files, extracted Tim's corrections (including profanity as a frustration marker), computed correction rates, and discovered that 83% of AI strategist failures involved documented constraints that weren't consulted. The corrections — the "directive signals" this paper says systems throw away — became the dataset that diagnosed the failure mode.

The prediction that correction data is "universal gold" was confirmed: Tim's profanity turned out to be a precise instrument for measuring system failure rates across 2,371 human messages. The rawest form of human correction was the most analytically useful.


Discovered 2026-03-14. Part of the three-paper arc: RES-001 (why HITM) → RES-002 (why the flywheel works) → RES-003 (why docs-meta is worth building).