
AXL Research Log

A per-commit record of the dual-agent research iteration, generated deterministically from the git history at HEAD 91dbceb on 2026-04-21T17:38:16+00:00.

How this was generated

This page is the output of tools/build-research-log.py v1.0, a deterministic script that walks the repository's git history and assigns each commit a role using regex rules written for this repository's authoring conventions. It is not a hand-curated narrative: running the same script against the same git HEAD produces the same output.
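A minimal sketch of regex role classification, for readers who want to see the idea concretely. The rule table is hypothetical: it is built from the conventions listed under assumption 2 below (codex r\d+, Gap \d+, ship:, bench:, spec:, docs:, RESULT) and is not the rule set in tools/build-research-log.py. The names RULES, DEFAULT_ROLE and classify are illustrative assumptions.

```python
import re

# Hypothetical rule table modeled on the conventions in assumption 2; NOT the
# actual rules in tools/build-research-log.py. Rules are tried top to bottom
# and the first match wins, so more specific markers come first.
RULES = [
    ("codex-review-round", re.compile(r"\bcodex r\d+\b", re.IGNORECASE)),
    ("substrate-gap", re.compile(r"\bGap \d+\b")),
    ("corpus-result", re.compile(r"\bRESULT\b")),
    ("ship", re.compile(r"^ship:")),
    ("bench", re.compile(r"^bench:")),
    ("spec", re.compile(r"^spec:")),
    ("docs", re.compile(r"^docs:")),
]

# Assumed fallback role for subjects that match no rule.
DEFAULT_ROLE = "claude-research-impl"


def classify(subject: str) -> str:
    """Return the role of the first rule whose pattern matches the subject."""
    for role, pattern in RULES:
        if pattern.search(subject):
            return role
    return DEFAULT_ROLE


print(classify("v4: fix canon_date error namespace + stop stale router drift (codex r4)"))
```

Because the first match wins, rule order is part of the classifier's behavior. This sketch will disagree with the log on some rows: 0a5cad4 ("docs: response to Codex v4 prototype challenges round 1") is labeled codex-review-round below, whereas the sketch returns docs, so the real script evidently uses additional signals.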

The log's accuracy, as distinct from its determinism, depends on three assumptions:

  1. Commit message discipline. The DAG of review-and-response edges is reconstructed from explicit SHA references in commit messages (patterns like "Codex <sha> review", "Codex <sha> follow-up", "Codex review of <sha>"). If future commits stop naming the SHA they respond to, the DAG loses edges. This is a commit-hygiene constraint, not a script bug.
  2. Repo-scoped regex classifier. Role classification depends on this repo's authoring conventions: codex r\d+, Gap \d+, ship:, bench:, spec:, docs:, RESULT. Applying this script to another repo with different conventions produces meaningless classifications.
  3. Metric extraction is best-effort. Numeric transitions in commit bodies are captured only when they match known patterns (NN.NN% -> NN.NN%, NNN tests pass, Δrecall ±NN.NN); claims stated in other forms may not surface as structured metrics. Subject and body text are always preserved verbatim in the JSON so human readers can check extracted values against the source.
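Assumption 3 can be illustrated with a small best-effort extractor. The patterns below are a sketch modeled on metric forms that actually appear in this log (61%->100%, 76% to 96%, 140/140 tests, 181 passed, 2.81x char); they are not the script's pattern set, and the metric name delta_recall is hypothetical. Text that fits none of these shapes is silently skipped, which is the limitation the assumption describes.

```python
import re

# Illustrative patterns only; the script's real pattern set is not reproduced.
PATTERNS = {
    "pct_transition": re.compile(r"\d+(?:\.\d+)?%\s*(?:->|to)\s*\d+(?:\.\d+)?%"),
    "test_count_pass": re.compile(r"\d+/\d+ tests|\d+ tests pass|\d+ passed"),
    "compression_ratio": re.compile(r"\d+(?:\.\d+)?x (?:chars?|tokens?)"),
    "delta_recall": re.compile(r"Δrecall [+\-±]?\d+(?:\.\d+)?"),  # hypothetical name
}


def extract_metrics(body: str) -> list[tuple[str, str]]:
    """Return (metric_name, matched_text) pairs in order of appearance."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(body):
            hits.append((match.start(), name, match.group(0)))
    hits.sort()  # order by position in the body
    return [(name, text) for _, name, text in hits]


print(extract_metrics("recall 41.43% -> 50.57%, 193/193 tests"))
```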

Known classifier limitations on this specific history:

Inputs: git log --reverse --format=... against this repo. Outputs: research-log.json (machine-readable, complete) and this HTML page.
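The Inputs line elides the exact --format string. Purely as an illustration, the sketch below assumes a format that separates fields with the ASCII unit separator (%x1f) and ends each record with the record separator (%x1e), which keeps multi-line commit bodies intact. LOG_FORMAT, Commit, parse_log and read_history are hypothetical names, not part of the script.

```python
import subprocess
from dataclasses import dataclass
from typing import List

# Assumed format: unit separator (0x1f) between fields, record separator (0x1e)
# after each commit, so multi-line bodies survive parsing. The script's real
# --format string is elided in the text above.
FIELD_SEP, RECORD_SEP = "\x1f", "\x1e"
LOG_FORMAT = "%H%x1f%aI%x1f%s%x1f%b%x1e"


@dataclass(frozen=True)
class Commit:
    sha: str
    date: str  # strict ISO 8601 author date (%aI)
    subject: str
    body: str


def parse_log(raw: str) -> List[Commit]:
    """Split raw git log output into Commit records, preserving input order."""
    commits = []
    for record in raw.split(RECORD_SEP):
        record = record.strip("\n")  # git appends a newline after each entry
        if not record:
            continue
        sha, date, subject, body = record.split(FIELD_SEP, 3)
        commits.append(Commit(sha, date, subject, body.strip()))
    return commits


def read_history(repo: str = ".") -> List[Commit]:
    """Run git log (oldest first via --reverse) in repo and parse the output."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--reverse", f"--format={LOG_FORMAT}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_log(result.stdout)
```

Using control characters as separators avoids ambiguity with any punctuation a commit subject or body might contain.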

Summary

Role                    Commits
bench                   13
claude-research-impl    9
spec                    7
codex-review-round      7
docs                    7
codex-review-response   7
gate-kit                6
substrate-gap           4
corpus-result           4
ship                    1

Total commits: 65. Response edges (commits that name a target SHA): 6. Commits with a formal review-round label: 7.
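The three summary counts can be recomputed from per-commit records. This sketch assumes each record has already been reduced to a (role, message, round_label) tuple and finds response edges only through the three explicit SHA forms quoted under assumption 1. The pattern list, the 7-to-40 hex-digit SHA length, and the names response_target and summarize are assumptions, not the script's code.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Abbreviated or full hex commit hash (the length range is an assumption).
SHA = r"([0-9a-f]{7,40})"

# Hypothetical patterns for the three reference forms named in assumption 1.
EDGE_PATTERNS = [
    re.compile(rf"Codex {SHA} review"),
    re.compile(rf"Codex {SHA} follow-up"),
    re.compile(rf"Codex review of {SHA}"),
]


def response_target(message: str) -> Optional[str]:
    """Return the first SHA the message says it responds to, or None."""
    for pattern in EDGE_PATTERNS:
        match = pattern.search(message)
        if match:
            return match.group(1)
    return None


def summarize(rows: Iterable[Tuple[str, str, Optional[str]]]) -> dict:
    """Count commits per role, response edges, and review-round labels."""
    roles: Counter = Counter()
    edges = rounds = 0
    for role, message, round_label in rows:
        roles[role] += 1
        if response_target(message) is not None:
            edges += 1
        if round_label is not None:
            rounds += 1
    return {
        "commit_count": sum(roles.values()),
        "response_edges": edges,
        "round_entries": rounds,
        "roles": dict(roles.most_common()),
    }
```

Subject-only matching would undercount on this history: 623f0b8's subject says "Codex Gap 2 review" rather than naming a SHA, yet the log records it as responding to ab092fa, so the script presumably reads commit bodies or applies further rules.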

Commit log

bae849f gate-kit 2026-04-10

Seed: dual-agent research instructions for AXL Rosetta v4

d2b81b5 claude-research-impl 2026-04-10

impl: complete v4 reference implementation and test harness

0495e1e claude-research-impl 2026-04-10

chore: add .gitignore

d459044 spec 2026-04-11

spec: restructure v4 as normative kernel + classified layers

2b5aaa7 codex-review-round 2026-04-11

spec: drop lossless/lossy split, code layer is lossy IR

Round R2

ffe73c8 claude-research-impl 2026-04-11

chore: expand gitignore to exclude system and tool directories

516ea94 spec 2026-04-11

spec: grammar boundary rewrite, kernel ≤80, evidence schema closure

cf35feb spec 2026-04-11

spec: R4 hardening, error taxonomy, canonical serializer, evidence backlinks

3bd5bef spec 2026-04-11

spec: R5 conformance hardening

798e04f claude-research-impl 2026-04-11

test: R6 golden corpus and conformance harness

5ee1c21 claude-research-impl 2026-04-11

test: R6 Implementation B passes interoperability trial

1c889f1 claude-research-impl 2026-04-11

test: R7 adversarial edge suite, 1 spec ambiguity resolved

1 metric(s) extracted
  • test_count_pass: 140/140 tests

91054db bench 2026-04-11

bench: first real compression trial, v3 live vs v4 research

aff2b6b bench 2026-04-11

bench: decompression fidelity + speed math + v3 live comparison

fa1d5b0 bench 2026-04-11

bench: topology analysis and operator gap identification

06e33be bench 2026-04-11

bench: investor cold-read test setup

7226e5f bench 2026-04-11

bench: cold-read cross-model experiment design

4ddee5a bench 2026-04-11

bench: Gemini Flash cold decompression scored 41.7/100

863d467 bench 2026-04-11

bench: Qwen 3.5 35B cold decompression scored 44.0/100

3dbebf7 bench 2026-04-11

bench: micro-bakeoff for cold fact recovery redesign

a7c3375 bench 2026-04-11

bench: B-syntax bakeoff results - numeric bundles WORK

099bcff ship 2026-04-11

ship: AXL Rosetta v3.1 Data Anchoring Extension

2 metric(s) extracted
  • pct_transition: 61%->100%
  • pct_transition: 35%->76%

8fc20c0 spec 2026-04-11

spec: tighten data-anchoring claims and provenance rule

f176046 spec 2026-04-11

spec: v3.2 Glyph Compression Layer (draft, needs cold testing)

f0a6bcc bench 2026-04-11

bench: v3.2 glyph cold decompression results

1 metric(s) extracted
  • pct_transition: 76% to 96%

371094c spec 2026-04-11

spec: v4 Kernel Router blueprint

430e923 docs 2026-04-11

docs: full v4 research document with router blueprint

312fe7d docs 2026-04-11

docs: add full glyph tables with CJK ideograms to research document

6fed4dd bench 2026-04-12

bench: production baseline measurement exposes token estimation bug

e28cf2d bench 2026-04-12

bench: production round-trip measurement, protocol vs rationale separated

4 metric(s) extracted
  • compression_ratio: 2.81x char
  • compression_ratio: 1.36x token
  • compression_ratio: 2.81x char
  • compression_ratio: 1.36x token

af6345b bench 2026-04-12

bench: self-bootstrapped v3.1 compression beats production on every axis

74d5119 docs 2026-04-12

docs: AXL server operations contract for cc-ops-axlserver

e029fd2 docs 2026-04-12

docs: directive for cc-ops-axlserver (terse, actionable)

80aa753 docs 2026-04-12

docs: cc-ops-axlserver directive v2 (revised in ultrathink)

0f65c95 claude-research-impl 2026-04-13

v4: working prototype hits all four targets

1 metric(s) extracted
  • compression_ratio: 2.81x token

2ba79e1 claude-research-impl 2026-04-13

v4: add construction Rosetta module, expand fact extractor

2 metric(s) extracted
  • compression_ratio: 4.63x chars
  • compression_ratio: 2.21x tokens

0a5cad4 codex-review-round 2026-04-13

docs: response to Codex v4 prototype challenges round 1

Round R1

35e26d5 codex-review-round 2026-04-14

docs: response to Codex R2 counter-challenges + parser-validated AXL

Round R2

6228281 codex-review-round 2026-04-14

v4: shared canonical form layer + envelope floor (codex r3 findings)

Round R3

2 metric(s) extracted
  • pct_transition: 0% -> 100%
  • pct_transition: 0% -> 50%

be52755 codex-review-round 2026-04-14

v4: runtime fixes for Codex R3 findings (router gate, fidelity fields, hermetic tests)

Round R3

1 metric(s) extracted
  • test_count_pass: 181 passed

099dbe6 codex-review-round 2026-04-14

v4: fix canon_date error namespace + stop stale router drift (codex r4)

Round R4

6961dec codex-review-round 2026-04-14

v4: tight drift detector for router constant (codex r5)

Round R5

330f53a substrate-gap 2026-04-14

v4: construction dollar + date emitters (Gap 1)

4 metric(s) extracted
  • pct_transition: 41.43% -> 50.57%
  • pct_transition: 0% -> 100%
  • pct_transition: 0% -> 100%
  • test_count_pass: 193/193 tests

ab092fa substrate-gap 2026-04-14

v4: drop construction dim cap + canonical short-form recognizer (Gap 2)

5 metric(s) extracted
  • pct_transition: 50.57% -> 76.00%
  • pct_transition: 52.66% -> 100.00%
  • numeric_transition: 65.0 -> 75.0
  • numeric_transition: 50.57 -> 76.00
  • test_count_pass: 193/193 tests

623f0b8 codex-review-response 2026-04-15

v4: restore negative-path gate test + refresh router doc (Codex Gap 2 review)

Responds to: ab092fa

1 metric(s) extracted
  • test_count_pass: 194/194 tests

29800b4 substrate-gap 2026-04-15

v4: artifact-driven routing (Gap 3)

1 metric(s) extracted
  • test_count_pass: 200/200 tests

9c3247e gate-kit 2026-04-15

v4: cold-read decision-gate kit (v3.1 vs v4 handoff)

205a68f corpus-result 2026-04-15

v4: cold-read decision gate RESULT — v4 wins on clean models

8 metric(s) extracted
  • numeric_transition: 20.29->34.06
  • numeric_transition: 35.51->71.74
  • numeric_transition: 20.25->32.91
  • numeric_transition: 31.65->53.16
  • numeric_transition: 11.39->30.38
  • numeric_transition: 17.09->54.43
  • numeric_transition: 40.00->53.33
  • numeric_transition: 26.67->73.33

5dcdabc codex-review-response 2026-04-15

v4: cold-read gate amendment — fix Gemini concat, add precision (Codex review)

Responds to: 205a68f

1 metric(s) extracted
  • pct_transition: 32.01% -> 23.08%

4a5559b substrate-gap 2026-04-15

v4: corpus #2 cold-read kit (construction) + scorer structural guards

99c584b gate-kit 2026-04-15

v4: corpus #2 — longer cold-read prompt + Grok/DeepSeek seeds

3987aa3 corpus-result 2026-04-15

v4: cold-read corpus #2 RESULT — clean sweep, v4 wins all 4 models

d9f82bc gate-kit 2026-04-15

v4: fix prose-fallback invariant — real compression, not passthrough

7 metric(s) extracted
  • test_count_pass: 201/201 tests
  • compression_ratio: 3.24x chars
  • compression_ratio: 1.46x tokens
  • compression_ratio: 0.96x chars
  • compression_ratio: 0.84x tokens
  • compression_ratio: 2.83x chars
  • compression_ratio: 1.41x tokens

a7a9254 gate-kit 2026-04-15

v4: corpus #3 cold-read kit (prose fallback, museum narrative)

4 metric(s) extracted
  • compression_ratio: 3.24x chars
  • compression_ratio: 1.46x tokens
  • compression_ratio: 2.83x chars
  • compression_ratio: 1.41x tokens

4184bfe corpus-result 2026-04-16

v4: cold-read corpus #3 RESULT — mixed: recall up, precision down

b176ad2 claude-research-impl 2026-04-16

v4: qualified reversal of fold-back conclusion (cold-read gate, 3 corpora)

2 metric(s) extracted
  • test_count_pass: 201 tests pass
  • test_count_pass: 201/201 tests

7da8533 codex-review-response 2026-04-16

v4: enforce prose envelope invariant at runtime (Codex b176ad2 review)

Responds to: b176ad2

4 metric(s) extracted
  • numeric_transition: 768 -> 1127
  • numeric_transition: 768 -> 907
  • numeric_transition: 34878 -> 12319
  • test_count_pass: 203 tests pass

595b743 gate-kit 2026-04-16

v4: prose precision pass — word-aware aliasing + lowercase headers

4 metric(s) extracted
  • numeric_transition: 203 -> 205
  • test_count_pass: 205 tests pass
  • compression_ratio: 2.77x chars
  • compression_ratio: 1.35x tokens

c7704a6 codex-review-response 2026-04-19

v4: fix prose header acronym preservation + metadata provenance (Codex 595b743 review)

Responds to: 595b743

2 metric(s) extracted
  • numeric_transition: 205 -> 206
  • test_count_pass: 206 tests pass

a6785c2 corpus-result 2026-04-20

v4: corpus #3 precision pass RESULT — 76% gap closure, still narrowly mixed

8980042 codex-review-response 2026-04-20

v4: cold-read scorer — detect structural mimicry (Codex a6785c2 follow-up)

Responds to: a6785c2

1 metric(s) extracted
  • test_count_pass: 215 tests pass

f7e3f3d codex-review-response 2026-04-21

docs: public-facing v3.1 evidence brief + project timeline for axlprotocol.org

2 metric(s) extracted
  • compression_ratio: 2.90x chars
  • compression_ratio: 1.40x tokens

2dcaa06 codex-review-response 2026-04-21

docs: correct axlprotocol.org brief/timeline after Codex f7e3f3d review

Responds to: f7e3f3d

2 metric(s) extracted
  • compression_ratio: 2.90x chars
  • compression_ratio: 1.40x tokens

45cac43 docs 2026-04-21

docs: HTML fragments for axlprotocol.org Phase 2 handoff

91dbceb docs 2026-04-21

docs: v3.2 research brief + timeline uplift (Diego's "don't discard v3.2" note)

Verify this log yourself

The deterministic property ("same script, same HEAD, same output") is only meaningful if you can run the script. Both the script and the machine-readable output are published here:

To verify on a clone of the research repository:

python3 tools/build-research-log.py --format json | diff - research-log.json
python3 tools/build-research-log.py --summary
# expected: commit_count=65, response_edges=6, round_entries=7

Any deviation from the expected summary numbers on the same HEAD is a bug; running the script against a different HEAD produces a different log, which is the intended behavior (the log is a function of history).
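The "same HEAD, same output" property also requires the serialization step to be deterministic; Python dicts, for example, serialize in insertion order unless keys are sorted. A minimal sketch of one way to make JSON output byte-stable and to check a summary line, assuming the summary prints key=value pairs in the form shown in the expected comment above. How build-research-log.py actually serializes research-log.json is not documented here, and canonical_json, digest, parse_summary and EXPECTED are hypothetical names.

```python
import hashlib
import json
import re


def canonical_json(records) -> str:
    """Serialize with sorted keys and fixed separators so equal data yields equal bytes."""
    return json.dumps(records, sort_keys=True, ensure_ascii=False, separators=(",", ":")) + "\n"


def digest(records) -> str:
    """SHA-256 of the canonical serialization, for comparing two runs cheaply."""
    return hashlib.sha256(canonical_json(records).encode("utf-8")).hexdigest()


def parse_summary(line: str) -> dict:
    """Parse a 'key=value, key=value' summary line into a dict of ints."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}


# Expected counts for HEAD 91dbceb, taken from the summary above.
EXPECTED = {"commit_count": 65, "response_edges": 6, "round_entries": 7}
```

With helpers like these, a verifier could compare parse_summary(output) against EXPECTED, or compare digest values from two independent runs instead of diffing full files.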