AXL Research Log
A per-commit record of the dual-agent research iteration, generated deterministically from the git history at HEAD 91dbceb on 2026-04-21T17:38:16+00:00.
How this was generated
This page is the output of tools/build-research-log.py v1.0, a deterministic script that walks the repo's git history and classifies each commit by role using regex rules that match this repo's authoring conventions. It is not a hand-curated narrative. Running the same script against the same git HEAD produces the same output.
Assumptions this property depends on:
- Commit message discipline. The DAG of review-and-response edges is reconstructed from explicit SHA references in commit messages (patterns like "Codex sha review", "Codex sha follow-up", "Codex review of sha"). If future commits stop naming the SHA they respond to, the DAG loses edges. This is a commit-hygiene constraint, not a script bug.
- Repo-scoped regex classifier. Role classification depends on this repo's authoring conventions: `codex r\d+`, `Gap \d+`, `ship:`, `bench:`, `spec:`, `docs:`, `RESULT`. Applying this script to another repo with different conventions produces meaningless classifications.
- Metric extraction is best-effort. Numeric transitions in commit bodies are captured when they match known patterns (`NN.NN% -> NN.NN%`, `NNN tests pass`, `Δrecall ±NN.NN`). Claims stated in other forms may not surface as structured metrics. The subject and body text are always preserved verbatim in the JSON so human readers can verify.
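The first and third assumptions above can be illustrated with a minimal sketch. The regexes and the names `response_target`, `extract_metrics`, and `recall_delta` are hypothetical and only approximate the phrasings quoted in this section; the actual rules are in tools/build-research-log.py and may differ.

```python
import re

# Hypothetical patterns, modeled on the phrasings quoted above.
# The real rules live in tools/build-research-log.py and may differ.
SHA = r"[0-9a-f]{7,40}"
RESPONSE_PATTERNS = [
    re.compile(rf"Codex\s+({SHA})\s+(?:review|follow-up)", re.IGNORECASE),
    re.compile(rf"Codex\s+review\s+of\s+({SHA})", re.IGNORECASE),
]
METRIC_PATTERNS = {
    "pct_transition": re.compile(r"\d+(?:\.\d+)?%\s*(?:->|to)\s*\d+(?:\.\d+)?%"),
    "test_count_pass": re.compile(r"\d+(?:/\d+)?\s+tests(?:\s+pass)?"),
    "recall_delta": re.compile(r"Δrecall\s*[+-]\d+(?:\.\d+)?"),
}


def response_target(message):
    """Return the SHA a commit message says it responds to, or None."""
    for pattern in RESPONSE_PATTERNS:
        match = pattern.search(message)
        if match:
            return match.group(1)
    return None


def extract_metrics(body):
    """Return (metric_name, matched_text) pairs in order of appearance."""
    found = []
    for name, pattern in METRIC_PATTERNS.items():
        for match in pattern.finditer(body):
            found.append((match.start(), name, match.group(0)))
    return [(name, text) for _, name, text in sorted(found)]
```

A message that omits the SHA, such as one ending in "(Codex review)", yields `None` from `response_target`, which is how an edge would drop out of the DAG under this sketch.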
Known classifier limitations on this specific history:
- The R4.1 sub-finding (referenced in the body of the R4 commit, 099dbe6) does not appear as its own round-labeled entry because it does not have its own commit. Extraction picks the first round match in a commit, so R4 wins. To surface R4.1 as an independent entry, a future sub-finding would need its own commit with `R4.1` in the subject.
- Body-level mentions of round labels occasionally produce false-positive round assignments on commits that reference a round in narrative explanation without being that round. Verifiable by reading the subject in the entry below.
- The long-form phrasing "round 1" in commit 0a5cad4 is caught by a dedicated pattern; other long-form phrasings of later rounds (if any) may not be caught.
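The first-match behavior behind these limitations can be sketched as follows. The two patterns and the function name `first_round` are assumptions made for illustration, not the script's actual rules.

```python
import re

# Hypothetical round patterns; the script's real rules may differ.
ROUND_PATTERNS = [
    re.compile(r"\bcodex\s+r(\d+)\b", re.IGNORECASE),  # "codex r4", "Codex R2"
    re.compile(r"\bround\s+(\d+)\b", re.IGNORECASE),   # long form, e.g. "round 1"
]


def first_round(subject, body=""):
    """Return the earliest round label found in subject + body, or None."""
    text = subject + "\n" + body
    best = None
    for pattern in ROUND_PATTERNS:
        match = pattern.search(text)
        if match and (best is None or match.start() < best.start()):
            best = match
    return f"R{best.group(1)}" if best else None
```

Because only the earliest match is kept and the pattern stops at the first run of digits, a later mention of R4.1 in the same commit cannot produce a second entry. Because the body is searched too, a narrative reference to a round can label a commit that is not itself that round.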
Inputs: git log --reverse --format=... against this repo. Outputs: research-log.json (machine-readable, complete) and this HTML page.
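As a rough illustration of the input step, the sketch below reads commits oldest-first and splits them into fields. The format string is elided in this page, so `LOG_FORMAT` here is an assumption, as are the function names and dictionary keys.

```python
import subprocess

FIELD_SEP = "\x1f"   # ASCII unit separator between fields
RECORD_SEP = "\x1e"  # ASCII record separator between commits

# Assumed format: short SHA, author date, subject, body.
# The script's actual --format string is not shown in this page.
LOG_FORMAT = "%h%x1f%ad%x1f%s%x1f%b%x1e"


def parse_log(raw):
    """Split raw `git log` output into a list of commit dicts."""
    commits = []
    for record in raw.split(RECORD_SEP):
        record = record.strip("\n")
        if not record:
            continue
        sha, date, subject, body = record.split(FIELD_SEP, 3)
        commits.append(
            {"sha": sha, "date": date, "subject": subject, "body": body.strip()}
        )
    return commits


def read_log(repo="."):
    """Run git log oldest-first in `repo` and parse the result."""
    raw = subprocess.run(
        ["git", "-C", repo, "log", "--reverse", "--date=short",
         f"--format={LOG_FORMAT}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_log(raw)
```

Using control characters as separators keeps multi-line commit bodies intact, which matters here because the bodies are preserved verbatim in the JSON output.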
Summary
| Role | Commits |
|---|---|
| bench | 13 |
| claude-research-impl | 9 |
| spec | 7 |
| codex-review-round | 7 |
| docs | 7 |
| codex-review-response | 7 |
| gate-kit | 6 |
| substrate-gap | 4 |
| corpus-result | 4 |
| ship | 1 |
Total commits: 65. Response edges (commits that name a target SHA): 6. Commits with a formal review-round label: 7.
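The summary figures are plain tallies over the classified entries. A minimal sketch, assuming hypothetical field names (`role`, `responds_to`, `round`) that may not match the keys in research-log.json:

```python
from collections import Counter


def summarize(entries):
    """Tally roles, response edges, and round labels over log entries."""
    roles = Counter(entry["role"] for entry in entries)
    return {
        "commit_count": len(entries),
        "response_edges": sum(1 for e in entries if e.get("responds_to")),
        "round_entries": sum(1 for e in entries if e.get("round")),
        "roles": roles.most_common(),
    }


# Four entries taken from the commit log on this page.
sample = [
    {"sha": "ab092fa", "role": "substrate-gap"},
    {"sha": "623f0b8", "role": "codex-review-response", "responds_to": "ab092fa"},
    {"sha": "f7e3f3d", "role": "codex-review-response"},
    {"sha": "099dbe6", "role": "codex-review-round", "round": "R4"},
]
summary = summarize(sample)
```

The role count and the edge count are independent tallies. In the log on this page, f7e3f3d carries the codex-review-response role but lists no target SHA, which is why 7 commits have that role while only 6 response edges exist.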
Commit log
bae849f
gate-kit
2026-04-10
Seed: dual-agent research instructions for AXL Rosetta v4
d2b81b5
claude-research-impl
2026-04-10
impl: complete v4 reference implementation and test harness
0495e1e
claude-research-impl
2026-04-10
chore: add .gitignore
d459044
spec
2026-04-11
spec: restructure v4 as normative kernel + classified layers
2b5aaa7
codex-review-round
2026-04-11
spec: drop lossless/lossy split, code layer is lossy IR
Round R2
ffe73c8
claude-research-impl
2026-04-11
chore: expand gitignore to exclude system and tool directories
516ea94
spec
2026-04-11
spec: grammar boundary rewrite, kernel ≤80, evidence schema closure
cf35feb
spec
2026-04-11
spec: R4 hardening, error taxonomy, canonical serializer, evidence backlinks
3bd5bef
spec
2026-04-11
spec: R5 conformance hardening
798e04f
claude-research-impl
2026-04-11
test: R6 golden corpus and conformance harness
5ee1c21
claude-research-impl
2026-04-11
test: R6 Implementation B passes interoperability trial
1c889f1
claude-research-impl
2026-04-11
test: R7 adversarial edge suite, 1 spec ambiguity resolved
1 metric(s) extracted
test_count_pass: 140/140 tests
91054db
bench
2026-04-11
bench: first real compression trial, v3 live vs v4 research
aff2b6b
bench
2026-04-11
bench: decompression fidelity + speed math + v3 live comparison
fa1d5b0
bench
2026-04-11
bench: topology analysis and operator gap identification
06e33be
bench
2026-04-11
bench: investor cold-read test setup
7226e5f
bench
2026-04-11
bench: cold-read cross-model experiment design
4ddee5a
bench
2026-04-11
bench: Gemini Flash cold decompression scored 41.7/100
863d467
bench
2026-04-11
bench: Qwen 3.5 35B cold decompression scored 44.0/100
3dbebf7
bench
2026-04-11
bench: micro-bakeoff for cold fact recovery redesign
a7c3375
bench
2026-04-11
bench: B-syntax bakeoff results - numeric bundles WORK
099bcff
ship
2026-04-11
ship: AXL Rosetta v3.1 Data Anchoring Extension
2 metric(s) extracted
pct_transition: 61%->100%
pct_transition: 35%->76%
8fc20c0
spec
2026-04-11
spec: tighten data-anchoring claims and provenance rule
f176046
spec
2026-04-11
spec: v3.2 Glyph Compression Layer (draft, needs cold testing)
f0a6bcc
bench
2026-04-11
bench: v3.2 glyph cold decompression results
1 metric(s) extracted
pct_transition: 76% to 96%
371094c
spec
2026-04-11
spec: v4 Kernel Router blueprint
430e923
docs
2026-04-11
docs: full v4 research document with router blueprint
312fe7d
docs
2026-04-11
docs: add full glyph tables with CJK ideograms to research document
6fed4dd
bench
2026-04-12
bench: production baseline measurement exposes token estimation bug
e28cf2d
bench
2026-04-12
bench: production round-trip measurement, protocol vs rationale separated
4 metric(s) extracted
compression_ratio: 2.81x char
compression_ratio: 1.36x token
compression_ratio: 2.81x char
compression_ratio: 1.36x token
af6345b
bench
2026-04-12
bench: self-bootstrapped v3.1 compression beats production on every axis
74d5119
docs
2026-04-12
docs: AXL server operations contract for cc-ops-axlserver
e029fd2
docs
2026-04-12
docs: directive for cc-ops-axlserver (terse, actionable)
80aa753
docs
2026-04-12
docs: cc-ops-axlserver directive v2 (revised in ultrathink)
0f65c95
claude-research-impl
2026-04-13
v4: working prototype hits all four targets
1 metric(s) extracted
compression_ratio: 2.81x token
2ba79e1
claude-research-impl
2026-04-13
v4: add construction Rosetta module, expand fact extractor
2 metric(s) extracted
compression_ratio: 4.63x chars
compression_ratio: 2.21x tokens
0a5cad4
codex-review-round
2026-04-13
docs: response to Codex v4 prototype challenges round 1
Round R1
35e26d5
codex-review-round
2026-04-14
docs: response to Codex R2 counter-challenges + parser-validated AXL
Round R2
6228281
codex-review-round
2026-04-14
v4: shared canonical form layer + envelope floor (codex r3 findings)
Round R3
2 metric(s) extracted
pct_transition: 0% -> 100%
pct_transition: 0% -> 50%
be52755
codex-review-round
2026-04-14
v4: runtime fixes for Codex R3 findings (router gate, fidelity fields, hermetic tests)
Round R3
1 metric(s) extracted
test_count_pass: 181 passed
099dbe6
codex-review-round
2026-04-14
v4: fix canon_date error namespace + stop stale router drift (codex r4)
Round R4
6961dec
codex-review-round
2026-04-14
v4: tight drift detector for router constant (codex r5)
Round R5
330f53a
substrate-gap
2026-04-14
v4: construction dollar + date emitters (Gap 1)
4 metric(s) extracted
pct_transition: 41.43% -> 50.57%
pct_transition: 0% -> 100%
pct_transition: 0% -> 100%
test_count_pass: 193/193 tests
ab092fa
substrate-gap
2026-04-14
v4: drop construction dim cap + canonical short-form recognizer (Gap 2)
5 metric(s) extracted
pct_transition: 50.57% -> 76.00%
pct_transition: 52.66% -> 100.00%
numeric_transition: 65.0 -> 75.0
numeric_transition: 50.57 -> 76.00
test_count_pass: 193/193 tests
623f0b8
codex-review-response
2026-04-15
v4: restore negative-path gate test + refresh router doc (Codex Gap 2 review)
Responds to: ab092fa
1 metric(s) extracted
test_count_pass: 194/194 tests
29800b4
substrate-gap
2026-04-15
v4: artifact-driven routing (Gap 3)
1 metric(s) extracted
test_count_pass: 200/200 tests
9c3247e
gate-kit
2026-04-15
v4: cold-read decision-gate kit (v3.1 vs v4 handoff)
205a68f
corpus-result
2026-04-15
v4: cold-read decision gate RESULT — v4 wins on clean models
8 metric(s) extracted
numeric_transition: 20.29->34.06
numeric_transition: 35.51->71.74
numeric_transition: 20.25->32.91
numeric_transition: 31.65->53.16
numeric_transition: 11.39->30.38
numeric_transition: 17.09->54.43
numeric_transition: 40.00->53.33
numeric_transition: 26.67->73.33
5dcdabc
codex-review-response
2026-04-15
v4: cold-read gate amendment — fix Gemini concat, add precision (Codex review)
Responds to: 205a68f
1 metric(s) extracted
pct_transition: 32.01% -> 23.08%
4a5559b
substrate-gap
2026-04-15
v4: corpus #2 cold-read kit (construction) + scorer structural guards
99c584b
gate-kit
2026-04-15
v4: corpus #2 — longer cold-read prompt + Grok/DeepSeek seeds
3987aa3
corpus-result
2026-04-15
v4: cold-read corpus #2 RESULT — clean sweep, v4 wins all 4 models
d9f82bc
gate-kit
2026-04-15
v4: fix prose-fallback invariant — real compression, not passthrough
7 metric(s) extracted
test_count_pass: 201/201 tests
compression_ratio: 3.24x chars
compression_ratio: 1.46x tokens
compression_ratio: 0.96x chars
compression_ratio: 0.84x tokens
compression_ratio: 2.83x chars
compression_ratio: 1.41x tokens
a7a9254
gate-kit
2026-04-15
v4: corpus #3 cold-read kit (prose fallback, museum narrative)
4 metric(s) extracted
compression_ratio: 3.24x chars
compression_ratio: 1.46x tokens
compression_ratio: 2.83x chars
compression_ratio: 1.41x tokens
4184bfe
corpus-result
2026-04-16
v4: cold-read corpus #3 RESULT — mixed: recall up, precision down
b176ad2
claude-research-impl
2026-04-16
v4: qualified reversal of fold-back conclusion (cold-read gate, 3 corpora)
2 metric(s) extracted
test_count_pass: 201 tests pass
test_count_pass: 201/201 tests
7da8533
codex-review-response
2026-04-16
v4: enforce prose envelope invariant at runtime (Codex b176ad2 review)
Responds to: b176ad2
4 metric(s) extracted
numeric_transition: 768 -> 1127
numeric_transition: 768 -> 907
numeric_transition: 34878 -> 12319
test_count_pass: 203 tests pass
595b743
gate-kit
2026-04-16
v4: prose precision pass — word-aware aliasing + lowercase headers
4 metric(s) extracted
numeric_transition: 203 -> 205
test_count_pass: 205 tests pass
compression_ratio: 2.77x chars
compression_ratio: 1.35x tokens
c7704a6
codex-review-response
2026-04-19
v4: fix prose header acronym preservation + metadata provenance (Codex 595b743 review)
Responds to: 595b743
2 metric(s) extracted
numeric_transition: 205 -> 206
test_count_pass: 206 tests pass
a6785c2
corpus-result
2026-04-20
v4: corpus #3 precision pass RESULT — 76% gap closure, still narrowly mixed
8980042
codex-review-response
2026-04-20
v4: cold-read scorer — detect structural mimicry (Codex a6785c2 follow-up)
Responds to: a6785c2
1 metric(s) extracted
test_count_pass: 215 tests pass
f7e3f3d
codex-review-response
2026-04-21
docs: public-facing v3.1 evidence brief + project timeline for axlprotocol.org
2 metric(s) extracted
compression_ratio: 2.90x chars
compression_ratio: 1.40x tokens
2dcaa06
codex-review-response
2026-04-21
docs: correct axlprotocol.org brief/timeline after Codex f7e3f3d review
Responds to: f7e3f3d
2 metric(s) extracted
compression_ratio: 2.90x chars
compression_ratio: 1.40x tokens
45cac43
docs
2026-04-21
docs: HTML fragments for axlprotocol.org Phase 2 handoff
91dbceb
docs
2026-04-21
docs: v3.2 research brief + timeline uplift (Diego's "don't discard v3.2" note)
Verify this log yourself
The deterministic property ("same script, same HEAD, same output") is only meaningful if you can run the script. Both the script and the machine-readable output are published here:
- Script source: /research-log/build-research-log.py (Python 3 stdlib only, no network, no LLM calls)
- Machine-readable log: /research-log/research-log.json (the full 65-commit dataset with subject, body, role, round, response edges, metrics)
To verify on a clone of the research repository:
python3 tools/build-research-log.py --format json | diff - research-log.json
python3 tools/build-research-log.py --summary
# expected: commit_count=65, response_edges=6, round_entries=7
Any deviation from the expected summary numbers on the same HEAD is a bug; running the script against a different HEAD produces a different log, which is the intended behavior (the log is a function of history).