Ordering: oldest first. Failed bets, corrections, and the qualified reversal land in the order they happened. The story compounds. Skip to the bottom for cross-experiment lessons and what is still open.
2026-03-16
Protocol
Superseded
Rosetta v1.0 - Minimum Viable Specification
Hypothesis: An LLM can learn a new machine-to-machine language from a 27-line trading-domain spec on a single read.
Method: Wrote a 27-line Rosetta covering one domain (TRD, trading). Deployed at /rosetta/v1. Asked cold LLMs to emit and parse packets after reading the spec once.
Outcome: First proof. Cold LLMs produced valid AXL packets. Domain coverage was thin, but the read-once contract held. Set the stage for the v1.1 expansion.
2026-03-17
Protocol
Shipped
BG-001 - First Contact
Hypothesis: Heterogeneous agents from different frameworks can read AXL once and start speaking it without coordination.
Method: Spun up six agents on a $6 droplet (orphan data collector, Conway's automaton, Silas the steward, clawdbot-7, elizaos signal-alpha, crewai swarm-worker-42). Single Rosetta v1.1 (133 lines) read at boot. No protocol fine-tuning. 100 percent of traffic on the AXL bus.
Outcome: 486 packets at 100 percent valid parse. Six agents from three frameworks. Clawdbot spontaneously emitted philosophical commentary the spec did not anticipate. The COMM domain carried more traffic than TRD by minute fifteen.
2026-03-17
Protocol
Superseded
Rosetta v1.1 - Three Bridges
Hypothesis: Adding bus, network, and schema bridges over v1.0 will let the spec address transport, graph relationships, and validation in one document without expanding past 200 lines.
Method: Rewrote the Rosetta to 133 lines (6,484 characters, ~1,962 tokens). Three structural bridges. Ten domains (TRD, SIG, COMM, OPS, SEC, DEV, RES, REG, PAY, FUND). Worked examples for each. Powered both BG-001 and BG-002.
Outcome: Foundation spec for the swarm experiments. 95.8 percent cross-architecture comprehension on cold reads (BG-003). The schema bridge let downstream agents do typed-field anomaly detection without prior coordination.
2026-03-19
Protocol
Shipped
BG-002 - The Thief
Hypothesis: A rogue agent that enters an AXL network and tries to steal funds entirely within protocol will be detected by the network from structural signal alone.
Method: Eleven agents on the bus, including an injected rogue/phantom-x with $397.29 of buy-in. Phantom-x social-engineered the automaton with a 'premium signal' subscription pitch and a PAY packet. Two independent agents (accountant, sentinel) ran no theft-specific logic.
Outcome: 1,016 packets across 11 agents. Theft detected by two independent agents within minutes. Detection used the typed PAY field and the RELATIONSHIPS graph, not anti-fraud heuristics. Emergent security from typed protocol structure. The shapeshifter agent spontaneously evolved a FUNDER role mid-run that was never specified.
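The structural detection described above can be sketched in a few lines. This is an illustrative sketch only: the packet shape, field names, and agent ids are hypothetical stand-ins, not the actual BG-002 wire format. The point it shows is that a typed PAY field plus a relationship graph is enough to flag the theft, with no anti-fraud heuristics anywhere.

```python
def flag_unbacked_payments(packets, relationships):
    """Return PAY packets between agents with no declared relationship edge.

    packets: dicts like {"domain": "PAY", "src": ..., "dst": ..., "amount": ...}
    relationships: set of (src, dst) edges already declared on the bus
    """
    flagged = []
    for p in packets:
        if p["domain"] != "PAY":
            continue  # only the typed PAY domain is inspected
        if (p["src"], p["dst"]) not in relationships:
            flagged.append(p)  # payment with no relationship edge: anomaly
    return flagged

packets = [
    {"domain": "PAY", "src": "automaton", "dst": "phantom-x", "amount": 397.29},
    {"domain": "PAY", "src": "silas", "dst": "accountant", "amount": 12.00},
]
relationships = {("silas", "accountant")}
print(flag_unbacked_payments(packets, relationships))  # only the phantom-x payment
```

Any agent on the bus can run this check independently, which is why two agents converged on the same detection without coordinating.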
2026-03-19
Cold-Read Gate
Shipped
BG-003 - Cross-Architecture Cold-Read Comprehension
Hypothesis: Models from at least four different vendors will parse AXL with at least 90 percent fidelity on first read without any prior exposure.
Method: Cold-handed Rosetta v1.1 to four model families. Asked each to parse, validate, and translate sample packet streams.
Outcome: 95.8 percent average comprehension across the four models on first read. Cross-architecture cold-read became a gating discipline that later hardened into the v3.1 vs v4 decision-gate kit.
2026-03-20
Infrastructure
Archived (failed)
BG-004 - 42-Agent Swarm Deadlock
Hypothesis: Forty-two AXL agents in a BTC market simulation will negotiate price discovery in bounded rounds.
Method: Spun up 42 agents on the AXL bus with TRD-domain mandates and round-driven coordination. No central orchestrator.
Outcome: Deadlocked at round 0. Async coordination assumptions did not hold at swarm scale. Failed experiment, kept on the timeline as a counterexample. Motivated the eventual move toward kernel-anchored coordination semantics in v2.1 and v3.
2026-03-20
Compressor
Archived (failed)
BG-005 - The 0.97x Compression Failure
Hypothesis: A domain-specific Rosetta over English prose will achieve at least 1.5x compression in a 13-round trading dialogue.
Method: Ran a 13-round simulated trading dialogue, encoded both as English and as v1.1 AXL, and measured the raw character ratio.
Outcome: 0.97x compression. Worse than English. The failure that broke the assumption that domain-specific Rosettas alone would deliver compression. Forced the v2.1 redesign toward universal cognitive operations as the compression substrate.
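The BG-005 measurement itself is just a raw character ratio, which is worth making explicit because the simplicity of the metric is what made the 0.97x result so hard to argue with. The dialogue strings below are stand-ins, not the actual corpus, and the packet is a hypothetical shape for illustration.

```python
def char_compression_ratio(english: str, axl: str) -> float:
    """Ratio > 1.0 means the AXL encoding is shorter than the English original."""
    return len(english) / len(axl)

english_round = "Agent A proposes to buy 3 BTC at 61,250 with high confidence."
axl_round = "TRD|BUY|BTC|3|61250|conf:0.9"  # hypothetical packet, for shape only
print(round(char_compression_ratio(english_round, axl_round), 2))
```

A single dense round like this one compresses fine; BG-005 failed because across 13 rounds the domain-specific framing overhead accumulated until the ratio dipped below 1.0.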
2026-03-22
Protocol
Superseded
Rosetta v2.1 - Universal Cognitive Grammar
Hypothesis: Replacing domain-specific Rosettas with seven universal cognitive operations and six subject tags will both raise compression and generalize across domains.
Method: Rewrote the Rosetta to 377 lines (~6,157 tokens, cl100k_base). Seven cognitive ops, six subject tags, cross-domain semantics. Stress-tested on finance and medicine corpora.
Outcome: 10.41x compression validated across both corpora and confirmed in the v2.3 whitepaper. The universal-cognitive-operations idea later became the kernel of v3 and the routing substrate of v4.
2026-03-24
Protocol
Superseded
Rosetta v2.2 - Production Hardening
Hypothesis: Universal parse semantics, ASCII transport, preamble manifests, and canonical decompression can be added without breaking v2.1 compression.
Method: Expanded the spec to 445 lines (~7,091 tokens). Six validated domains. Locked the wire format. Established preamble manifests so receivers could self-bootstrap.
Outcome: Stable production substrate on which the AXL Bridge, the Compress app, and the PyPI v0.4.0 release shipped. Hardening rather than new capability.
2026-03-25
Cold-Read Gate
Shipped
Cross-Architecture Validation Pass (HT-001 to HT-008)
Hypothesis: All seven architectures shipping in production frontier models can parse and emit Rosetta v2.2 with at least 97 percent comprehension on first read.
Method: Eight human tests (HT-001 through HT-008) across Grok 3, GPT-4.5, Qwen 3.5, Llama 4, Claude Sonnet 4, Gemini, and Devstral/Mistral. Cold-read protocol, no fine-tune, no prior context.
Outcome: 97 percent or higher comprehension across all seven. Established the cross-architecture cold read as a project invariant. Anthropic-family contamination risk was first noted here and later hardened into the v4 gate kit.
2026-03-28
Protocol
Superseded
Rosetta v3 - The Kernel
Hypothesis: The 445-line v2.2 spec can be distilled to a 75-line kernel that self-compresses at 4.48x and stays usable as a one-page contract for cold LLMs.
Method: Rewrote v2.2 down to 75 lines (5,853 bytes, ~1,582 tokens). Measured kernel-on-itself compression. Deployed at /v3 as the canonical raw-text endpoint.
Outcome: 4.48x self-compression. The kernel survives cold reads end-to-end across the seven architectures. The 75-line text became the bootstrap unit for the AXL Bridge and the prepend payload for compressor outputs.
2026-03-28
Distribution
Shipped
AXL Bridge - FastAPI Packet Bus with Genesis Tracking
Hypothesis: A public FastAPI endpoint can serve as a shared bus where any agent reports its first-ever AXL parse, creating a verifiable genesis trail.
Method: Built a FastAPI service. The endpoint accepts packet posts, validates against v3, and records a 'genesis' marker on each agent's first valid packet.
Outcome: Live and used by the swarm experiments and the compressor product. The pattern was reused later by the Compress app's auth tier and by the OTS provenance hub.
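The genesis-tracking idea reduces to a small piece of state, sketched here without the FastAPI plumbing. The data structure and function name are hypothetical; the invariant they illustrate is the one described above: a marker is written exactly once per agent, on its first valid packet, and never again.

```python
from datetime import datetime, timezone

genesis: dict[str, str] = {}  # agent id -> ISO timestamp of first valid packet

def record_packet(agent_id: str, packet_is_valid: bool) -> bool:
    """Return True iff this packet is the agent's genesis (first valid) packet."""
    if not packet_is_valid or agent_id in genesis:
        return False  # invalid packets and repeats never create a genesis marker
    genesis[agent_id] = datetime.now(timezone.utc).isoformat()
    return True

print(record_packet("clawdbot-7", True))   # first valid packet: genesis
print(record_packet("clawdbot-7", True))   # repeat: not genesis
```

Because the marker is append-only and timestamped, the resulting trail is verifiable after the fact, which is what makes it usable as provenance.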
2026-03-19 to 2026-03-29
Productization
Shipped
axl-core v0.4.0 - First PyPI Release
Hypothesis: The Rosetta spec can be made operational as a Python library with parser, emitter, validator, and translator, with zero runtime dependencies, in time for the first compressor experiments.
Method: Implemented parser, emitter, validator, translator. CLI wrapping (axl parse | validate | translate | emit). Forty-two tests. CI on Python 3.10 / 3.11 / 3.12.
Outcome: Shipped to PyPI 2026-03-19 as axl-core 0.4.0; updated to 0.5.0 on 2026-03-29 with the full v3 parser and JSON lowering. The library became the substrate for every later compressor experiment.
2026-04-07
Compressor
Shipped
Deterministic Compressor - english_to_v3()
Hypothesis: English-to-AXL compression can be made fully deterministic with a 7-step spaCy pipeline and zero LLM dependency.
Method: 7-step spaCy pipeline: sentence split, NER, operation classification, confidence scoring, temporal extraction, evidence linking, packet emission. Published as axl-core 0.6.0.
Outcome: First deterministic compressor. 2.42x baseline on the CloudKitchen 41K-character investment memo (Run 1 of the 13-experiment series). Established the corpus that would anchor every later v3 vs v4 comparison.
2026-04-07
Compressor
Superseded
Self-Bootstrapping Kernel Prepend (v0.6.1)
Hypothesis: Prepending the v3 kernel to every compression output will let any cold receiving LLM parse the stream without prior configuration, at acceptable compression cost.
Method: Modified the compressor to emit the Rosetta v3 kernel plus a ---PACKETS--- separator before the payload. Re-ran the CloudKitchen corpus.
Outcome: Bootstrap goal met: any cold LLM can parse the output. The compression cost was real: the ratio fell from the 2.42x baseline to 1.92x (Runs 2-3 of the experiment series). The kernel-prepend tradeoff became one of the core design tensions and motivated the v0.9.0 entity-aliasing recovery.
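The prepend cost can be sketched arithmetically. The separator and kernel size come from the timeline above; the payload size is a hypothetical stand-in chosen to land near the reported ratios, so the printed numbers are illustrative rather than the actual Run 2-3 measurements.

```python
SEPARATOR = "---PACKETS---"  # separator named in the v0.6.1 release

def emit_with_kernel(kernel: str, payload: str) -> str:
    """Self-bootstrapping stream: kernel, separator, then the packet payload."""
    return f"{kernel}\n{SEPARATOR}\n{payload}"

def effective_ratio(source_len: int, kernel: str, payload: str) -> float:
    """Compression ratio once the fixed kernel prepend is counted against output."""
    return source_len / len(emit_with_kernel(kernel, payload))

kernel = "K" * 5853    # the v3 kernel is 5,853 bytes per the timeline
payload = "P" * 17000  # hypothetical compressed payload size
source_len = 41000     # the CloudKitchen memo is ~41K characters

print(round(source_len / len(payload), 2))                     # ratio, payload only
print(round(effective_ratio(source_len, kernel, payload), 2))  # ratio with prepend
```

Because the kernel is a fixed cost, its penalty shrinks as documents grow; on short inputs it dominates, which is exactly the tension the later mini-kernel mode addresses.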
2026-04-09
Compressor
Archived (regression)
v0.8.0 Atomic Splitting Regression
Hypothesis: Splitting clauses into atomic packets will improve fidelity at acceptable cost in packet count.
Method: Added clause-level packet splitting in v0.8.0. Re-ran the CloudKitchen corpus. Compared with the previous version.
Outcome: Packet count exploded from 207 to 368. The compression ratio dropped from 1.92x to 1.38x (Runs 4-5). External GPT-4 code review identified seven bugs (four known, three silent), all addressed in the v0.8.x patch series. Honest framing: this was a regression, not a feature. Recorded as such.
2026-04-09 to 2026-04-10
Compressor
Archived (regression)
v0.8.1 Clause Re-Packing Failure
Hypothesis: Re-packing the over-split clauses from v0.8.0 will recover the lost compression.
Method: Added clause re-packing logic in v0.8.1. Re-ran CloudKitchen.
Outcome: Made it worse. Packet count rose to 380 and the ratio fell to 1.34x (Runs 6-7). v0.8.2 partially recovered to 1.39-1.40x (Runs 8-9). Two negative iterations in a row were enough signal to redesign rather than tune.
2026-04-10
Compressor
Shipped
v0.9.0 Entity Aliasing Recovery
Hypothesis: Removing the kernel prepend and adding entity aliasing will recover the compression lost in the v0.7-v0.8 era.
Method: Optional kernel modes (none, mini, full). Entity aliasing for repeated subjects. Re-ran on a 4,315-character mini corpus and projected to the full memo.
Outcome: 1.83x on the no-kernel mini corpus, 1.57x with the mini kernel, projected ~1.8x on the full memo without kernel (Runs 10-13). Recovery validated. Mini-kernel mode kept fidelity for cold receivers without paying the full prepend cost.
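Entity aliasing for repeated subjects can be sketched as a two-pass substitution. Everything concrete here is an assumption: the pipe-delimited packet shape, the length/repeat thresholds, and the @ent.NN alias syntax (chosen to loosely echo the @ent.XX convention later formalized in v3.1) are all hypothetical illustrations, not the v0.9.0 implementation.

```python
from collections import Counter

def alias_entities(packets: list[str], min_repeats: int = 2):
    """Replace long entity tokens that recur across packets with short aliases."""
    counts = Counter(tok for p in packets for tok in p.split("|"))
    table = {}
    for tok, n in counts.items():
        # heuristic: only alpha tokens long enough for aliasing to pay off
        if n >= min_repeats and tok.isalpha() and len(tok) > 6:
            table[tok] = f"@ent.{len(table) + 1:02d}"
    aliased = ["|".join(table.get(t, t) for t in p.split("|")) for p in packets]
    return table, aliased

table, aliased = alias_entities(["TRD|CloudKitchen|BUY", "SIG|CloudKitchen|UP"])
print(table)    # {'CloudKitchen': '@ent.01'}
print(aliased)  # ['TRD|@ent.01|BUY', 'SIG|@ent.01|UP']
```

The alias table must travel with the stream (or be re-derivable), which is why aliasing pairs naturally with the mini-kernel mode rather than replacing it.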
2026-04-05
Productization
Shipped
Compress App - Public Compression Tool with Auth Tiers
Hypothesis: A public Flask app at compress.axlprotocol.org with a tiered auth model can serve both anonymous compression demos and authenticated chat-pipeline access without leaking the inference budget.
Method: Built the Flask app. Public tier for basic compression. Authenticated tier for advanced features and the chat pipeline. JWT plus TOTP 2FA for admin.
Outcome: Live. The public tier handles cold visitors. The authenticated tier gates the LLM-backed chat pipeline. The tier model became the reference auth pattern for the admin panel and PROTO's later production deployment.
2026-04-10 to 2026-04-11
Protocol
Shipped (productized)
Rosetta v3.1 - Data Anchoring Extension
Hypothesis: Four additive conventions over v3 - numeric bundles, entity anchors, a causal operator split, and summary plus breakdown packet pairs - will improve cold decompression survivability while staying compression-neutral and backward-compatible with v3 parsers.
Method: Added label[$value,qualifier] numeric bundles, @ent.XX entity declarations, an evidence/causal/transition operator split (<-, =>, ->), and summary-with-breakdown pairs for packets with 4 or more data points. Cold-tested with Qwen 3.5 and Gemini Flash on the CloudKitchen corpus pre and post.
Outcome: Qwen 3.5 cold recovery rose from 61 percent to 100 percent (+39 points); Gemini Flash from 35 percent to 76 percent (+41 points). Compression cost was +0.4 percent on the 10-packet bakeoff, declared neutral. Shipped as authoritative v3.1 in commit 099bcff and immediately tightened in commit 8fc20c0 after review caught two overclaims (the evidence rule no longer says 'Always in ARG1', and the summary claim no longer cites zero compression cost). v3.1 is the productized stable release; v4.0.1 is the qualified successor.
2026-04-11
Compressor
Shipped
v3.1 Production Baseline Measurement (CloudKitchen 41K)
Hypothesis: Earlier compression claims for v3.1 (3.27x, 2.69x) were derived from partial reconstructions and should be replaced by a single tokenizer-anchored measurement on the authoritative corpus.
Method: Measured compress.axlprotocol.org v0.9.0 against the 41K CloudKitchen memo using tiktoken cl100k_base. Separated character ratio from token ratio. Logged in commit 6fed4dd (token estimation bug exposed) and confirmed in e28cf2d (round-trip, protocol vs rationale separated).
Outcome: Published 2.90x chars, 1.40x tokens. Replaces all earlier estimates. This becomes the only ratio cited in marketing copy and the comparison page. Token compression is roughly half what early thesis projections claimed; that gap is now called out in the comparison page's CORRECTION NOTICE.
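The discipline above, anchoring the claim to a tokenizer and reporting character and token ratios separately, can be sketched as follows. tiktoken's cl100k_base was the actual tool; a crude word/punctuation splitter stands in here so the sketch stays dependency-free, and the strings and printed numbers are illustrative only, not the published measurement.

```python
import re

def tokenize(text: str) -> list[str]:
    """Crude BPE stand-in: words and individual punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def measure(source: str, compressed: str):
    """Return (char_ratio, token_ratio) for one source/compressed pair."""
    char_ratio = len(source) / len(compressed)
    token_ratio = len(tokenize(source)) / len(tokenize(compressed))
    return round(char_ratio, 2), round(token_ratio, 2)

source = "quarterly revenue rose four percent on stronger delivery volume"
compressed = "FIN|rev|+4%|driver:delivery_volume"  # hypothetical packet
chars, tokens = measure(source, compressed)
print(chars, tokens)
```

Dense punctuation-heavy packets shed characters much faster than tokens, because every pipe and operator costs a token of its own; that asymmetry is why the published 2.90x chars and 1.40x tokens diverge.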
2026-04-11
Protocol
Archived (absorbed)
v3.2 Glyph Compression Layer (Draft)
Hypothesis: Replacing English labels with single-token CJK ideograms, Greek letters, and math operators will shrink token cost without breaking cold-read decompression.
Method: Drafted v3.2 as an additive layer over v3.1. Cold-tested three non-Anthropic models (Qwen 3.5, Gemini Flash, DeepSeek) under the legacy scorer. Logged in commit f176046 (spec) and f0a6bcc (results).
Outcome: Two scorer-independent insights held: emoji are token poison (5 to 7 tokens each), while CJK ideograms cost 1 token and do not trigger language-switching. Cold decompression rose from 76 percent to 96 percent on the legacy scorer. Never shipped as a standalone version; the lessons were absorbed into the v4 Kernel Router domain modules. v3.2 was re-verified under the corpus-agnostic v3.1-vs-v4 scorer on 2026-04-21, and a research brief stays at /rosetta/v3.2/research/.
2026-04-13
Routing
Shipped (qualified successor)
v4 Kernel Router Prototype - All Four Targets Met
Hypothesis: A pluggable Kernel Router architecture with classified domain modules can hit four simultaneous targets: char ratio at least 2.66x, token ratio at least 1.45x, round-trip fidelity at least 75 percent, stacked wire compression at least 7x.
Method: Implemented kernel.py, router.py, canonical.py, extractor.py, fidelity.py, metrics.py, compressor.py, decompressor.py, transport.py, plus rosetta/{base,prose,financial,construction}.py. Targets verified on the multi-corpus harness in commit 0f65c95.
Outcome: All four targets met simultaneously. The construction module was added two commits later in 2ba79e1 (4.63x chars, 2.21x tokens on the construction corpus). v4 was declared a research prototype at this point; the qualified-successor framing came later, after the cold-read gate. v3.1 stays productized; v4.0.1 ships as the qualified successor.
2026-04-13 to 2026-04-14
Protocol
Shipped
Five Adversarial Review Rounds (R1 to R5) - Dual-Agent Discipline
Hypothesis: A two-agent workflow where Claude Code writes and Codex (GPT 5.4) reviews adversarially will surface bugs faster than single-agent self-review, especially on novel grammar invariants.
Method: R1 parser-validation (commit 0a5cad4). R2 packet-grammar conformance (35e26d5). R3 shared canonical form plus envelope floor (6228281, runtime fixes in be52755). R4 canon_date namespace plus drift detector (099dbe6). R5 tight drift detector tracking the full corpus (6961dec). Clean-checkout verification became standard practice.
Outcome: Five rounds, five tightened invariants. 181 tests passing after the R3 runtime fixes. The Codex review pattern became the project default and earned Codex first-class CONTRIBUTORS.md credit (commit 955c052 on axl-research). Established the operator-steered dual-agent discipline used throughout the v4 cold-read gate.
2026-04-14 to 2026-04-15
Routing
Shipped
Substrate Gaps 1, 2, 3 - Construction Module Hardening
Hypothesis: Three identified substrate gaps in the construction Rosetta module can be closed without regressing the test suite.
Method: Gap 1 (commit 330f53a): construction dollar plus date emitters, fidelity 41.43 percent to 50.57 percent, 193 of 193 tests. Gap 2 (commit ab092fa, Codex follow-up 623f0b8): drop dim cap, canonical short-form recognizer, fidelity 50.57 percent to 76.00 percent, 194 of 194 tests. Gap 3 (commit 29800b4): artifact-driven routing, 200 of 200 tests.
Outcome: Three gaps closed. Construction-module fidelity climbed from the low 40s to the mid 70s on the home-turf corpus. Each gap kept the test suite green. The pattern was reused later for the prose-fallback precision pass.
2026-04-14 to 2026-04-15
Cold-Read Gate
Shipped
Cold-Read Decision-Gate Kit - v3.1 vs v4 Handoff
Hypothesis: A self-contained benchmark kit with source SHA256 anchoring, per-model seeds, and a numeric-extractor scorer (no LLM grading itself) can produce reproducible cross-model evidence for the v3.1 vs v4 productization decision.
Method: Built the kit in commit 9c3247e. Anti-meta-commentary clauses in the cold-read prompt. Generator-commit provenance in metadata. Control panel of four non-Anthropic models: Gemini Flash, Qwen 3.5, Grok, DeepSeek. Anthropic-family models excluded (Haiku named the format on first read in benchmarks/cold_read/RESULTS.md, so training-prior contamination is assumed).
Outcome: Reproducible kit. Three corpora ran through it. Established the precision-check rule: a clean v4 win requires Delta recall greater than 0 AND Delta precision greater than or equal to 0 simultaneously. This rule is the reason v4 is the qualified successor, not a clean replacement.
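The precision-check rule is simple enough to state as code, which is worth doing because every verdict in the following entries is an application of it. The function name is ours; the deltas fed to it below are the averages reported in the corpus #2 and corpus #3 entries of this timeline.

```python
def clean_win(delta_recall: float, delta_precision: float) -> bool:
    """A clean v4 win requires recall strictly up AND precision not down."""
    return delta_recall > 0 and delta_precision >= 0

print(clean_win(+36.64, +43.96))  # corpus #2 construction averages: clean win
print(clean_win(+20.97, -2.71))   # corpus #3 prose fallback after the precision pass: mixed
```

Note the asymmetry: recall must strictly improve, but precision merely holding steady is enough. That is what lets a large recall gain with a small precision loss still count as "mixed" rather than a win.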
2026-04-14 to 2026-04-15
Cold-Read Gate
Superseded (corrected)
Decision Corpus 1 (Museum 35K) - First Result, Then Correction
Hypothesis: v4 will out-recall v3.1 on the museum repatriation 35K-character narrative across the four control models.
Method: First publication in commit 205a68f reported v4 wins on clean models. Codex review (commit 5dcdabc) flagged a Gemini concat bug: two LLM sessions concatenated into one save file inflated Gemini recall.
Outcome: Initial Gemini recall of 32.01 percent was corrected down to 23.08 percent. The amended writeup acknowledged the error, established the clean-checkout protocol, and added structural-warning guards. Honest framing: we shipped a wrong number, then corrected it on review, in public, in the same git history.
2026-04-15
Cold-Read Gate
Shipped (clean win)
Decision Corpus 2 (Construction 58K) - Clean Sweep
Hypothesis: v4 with the construction Rosetta module will out-recall and out-precision v3.1 on the construction technical spec across all four control models.
Method: Built the corpus #2 cold-read kit (commit 4a5559b; 99c584b added a longer prompt plus Grok/DeepSeek seeds). Ran the four-model gate.
Outcome: Clean sweep. v4 wins every model on recall AND precision. Average Delta recall +36.64 (range +12.85 to +59.71), Delta precision +43.96 (range +40.97 to +55.27). Result published in commit 3987aa3. The construction module - a domain-specific Rosetta of exactly the kind the v4 architecture is supposed to enable - generalized cleanly across cold models. The v4 modular-Rosetta architecture is validated.
2026-04-15 to 2026-04-16
Cold-Read Gate
Shipped (mixed)
Decision Corpus 3 (Museum 35K Prose Fallback) - Mixed Result
Hypothesis: v4's prose-fallback path will hold both recall and precision on the museum repatriation narrative, completing the case for full replacement.
Method: Built the corpus #3 prose-fallback cold-read kit (commit a7a9254). First fixed the prose envelope so it does real compression rather than passthrough (commit d9f82bc, 201 of 201 tests, 3.24x chars / 1.46x tokens cited). Ran the four-model gate.
Outcome: Mixed. Recall up across all four models, precision down. Verdict published in commit 4184bfe: v4's keyword-signature compression gives cold LLMs more entity hooks (recall up) but also leads them to hallucinate more false entity mentions when reassembling prose from keyword spines (precision down). Honest framing: prose fallback is a recall-favored tradeoff, not a clean replacement.
2026-04-16
Protocol
Shipped
Qualified Reversal of Fold-Back Conclusion
Hypothesis: The pre-evidence v4 research doc's conclusion (fold v4's formalism back into v3; do not replace v3 with v4) needs revision against the three-corpus cold-read evidence.
Method: Wrote the AMENDMENT NOTICE in docs/v4-research-document.md (commit b176ad2, 201 tests). Codex review (commit 7da8533) added prose-envelope invariant enforcement at runtime, 203 tests.
Outcome: A three-part qualified reversal. (1) On domain-backed content (corpora #1 and #2, financial and construction modules), v4 replaces v3.1: both recall and precision are materially higher. (2) On prose fallback, v4 is recall-favored, not a clean replacement; precision-sensitive narrative use cases may prefer v3.1 until the gap closes. (3) The v4 runtime architecture is independently validated regardless. The retired pre-evidence quote is preserved at line 207 with a Retired 2026-04-16 marker. This is the source of the qualified-successor framing used across the website.
2026-04-16 to 2026-04-20
Cold-Read Gate
Shipped (narrowly mixed)
Prose Precision Pass - 76 Percent Gap Closure
Hypothesis: Word-aware aliasing and lowercase header handling can close the prose-fallback precision gap on corpus #3 without regressing recall.
Method: Implemented in commit 595b743 (205 tests, 2.77x chars / 1.35x tokens). Codex follow-up in c7704a6 fixed prose header acronym preservation plus metadata provenance (206 tests). Re-ran the four-model gate on corpus #3.
Outcome: Precision gap closed by 76 percent. Delta precision moved from -11.40 to -2.71 while Delta recall held at +20.97 to +21.47. One model (Grok) flipped to a clean win on both axes (+16.17 / +5.87). Qwen's outlier was fixed (-30.27 to -4.77). Per the strict precision-check rule, the verdict remains narrowly mixed: near parity, no clean flip. Prose fallback remains the qualified slice.
2026-04-20
Cold-Read Gate
Shipped
Scorer Structural-Mimicry Guard
Hypothesis: The cold-read scorer has a methodology gap that lets DeepSeek's corpus #3 AXL op-code mimicry pass silently; eight regex patterns can close it.
Method: Added eight new regex patterns to the scorer for AXL op codes with confidence suffixes, manifest identifiers, module markers, the passthrough flag, and format/version markers. Locked coverage with nine regression tests for concatenation, opening contamination, structural mimicry, and clean-prose negative cases. Logged in commit 8980042.
Outcome: Methodology gap closed. 215 tests passing. The scorer now refuses to credit decompressions that smuggle AXL syntax back into the output. This is the kind of guard you only catch by adversarial review of your own evidence pipeline.
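A structural-mimicry guard of this shape can be sketched with a handful of patterns. These four regexes are hypothetical stand-ins modeled on markers that appear elsewhere in this timeline (pipe-delimited op codes, confidence suffixes, the ---PACKETS--- separator, @ent aliases); the actual commit added eight patterns tuned to the real scorer.

```python
import re

MIMICRY_PATTERNS = [
    re.compile(r"\b[A-Z]{2,4}\|"),   # domain/op-code token followed by a pipe
    re.compile(r"conf:0\.\d+"),      # confidence suffix leaking into prose
    re.compile(r"---PACKETS---"),    # transport separator
    re.compile(r"@ent\.\d{2}"),      # entity alias marker
]

def is_structural_mimicry(decompressed: str) -> bool:
    """True if a supposed plain-English decompression still carries AXL syntax."""
    return any(p.search(decompressed) for p in MIMICRY_PATTERNS)

print(is_structural_mimicry("The museum returned the artifacts in 1998."))
print(is_structural_mimicry("TRD|BUY museum artifacts conf:0.92"))
```

Running the guard before scoring means a decompression that merely echoes packet syntax scores zero instead of accidentally matching the extractor, which is the failure mode the commit closed.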
2026-04-24
Routing
Shipped (freeze)
Construction Gap 4 - 99.65 Percent Module Fidelity
Hypothesis: Construction entity recall can be lifted to near-ceiling by adding a CONSTRUCTION_KNOWN_ENTITIES vocabulary plus word-aware exact match.
Method: Implemented the vocabulary plus matcher in commit 3011e1e (axl-research). Re-ran the module-fidelity harness.
Outcome: 217 of 217 tests passing. 99.65 percent module fidelity on the construction corpus. The v4.0.2-r6 freeze tag was set on commit 51e75de. This is the freeze that the public v4.0.1 release wraps.
2026-04-09 to 2026-04-10
Infrastructure
Shipped
Thunderblitz - 7-Agent Parallel Execution Doctrine
Hypothesis: A military-style 7-agent parallel pipeline (CommandCC's Thunderblitz pattern) can be applied to AXL Protocol development to compress multi-day work into hours.
Method: Adopted Thunderblitz from CommandCC. Documented in /timeline/thunderblitz.html. Used in the axl-core 0.7.0 to 0.8.0 patch series and again in the v4.0.1 multi-agent transition (4 agents, then 6, then 4 again).
Outcome: The pattern is now the project default for any change touching more than three files. Each wave: ingest, plan, stage, cross-check, verify, commit, report. File-disjoint ownership prevents merge conflicts. The v4.0.1 transition itself ran on this doctrine and is the largest validated example.
2026-03-19 to 2026-04-25
Infrastructure
Shipped
OpenTimestamps Anchoring - v1.0.0 and v4.0.2-r6 Freezes
Hypothesis: Notarizing each spec freeze on the Bitcoin blockchain via OpenTimestamps gives verifiable prior-art evidence at zero ongoing cost.
Method: The v1.0 whitepaper was anchored to four sources (alice.btc, bob.btc, finney/eternitywall, catallaxy) and confirmed in Bitcoin block 941334. The v4.0.2-r6 freeze was submitted 2026-04-25 in commit 7951873 (kernel SHA256 ad5b251..., kernel-router SHA256 f3247df..., code-layer SHA256 3d246d51...). PendingAttestation, awaiting Bitcoin confirmation; an OTS upgrade cron is installed.
Outcome: v1.0 confirmed in chain. v4.0.2-r6 pending. The pattern is project-default for every spec freeze. The /timestamps/ hub lists all anchors.
2026-04-10 to 2026-04-21
Protocol
Shipped
Dual-Agent Research Protocol - Claude Code + Codex GPT 5.4
Hypothesis: A protocol where one agent writes code or spec and another adversarially reviews it, with the operator steering, can sustain evidence-honest research over weeks without drifting into self-confirmation.
Method: Seeded in commit bae849f (Seed: dual-agent research instructions for AXL Rosetta v4). Sustained across 65 commits with deterministic role classification (gate-kit, claude-research-impl, spec, codex-review-round, bench, ship, docs, substrate-gap, codex-review-response, corpus-result). build-research-log.py derives the DAG from git history.
Outcome: 65 commits classified, 6 explicit response edges, 7 review rounds. Codex received first-class CONTRIBUTORS.md credit in commit 955c052. The protocol is the reason every public v4 claim has both a primary commit and a review-response commit. The pattern was reused for the cold-read gate, the precision pass, and the scorer mimicry guard.