v4.0.1 Research Status
Research-stage public preview. Productization gated.
v4.0.1 is the public preview of a research-stage release. It is a qualified successor to v3.1 on domain-backed content and a recall-favored tradeoff on prose fallback. Productization is gated per CC-OPS-AXLSERVER directive sections 14-16.
Qualified Successor Verdict
The verdict below is the canonical AMENDMENT NOTICE language from docs/v4-research-document.md (2026-04-16), reproduced verbatim. It replaces the prior "fold v4 back into v3" conclusion that predated cold-read evidence. The amendment was authored when 201 tests were passing; the current count is 217.
- On domain-backed content (v4 has a dedicated Rosetta module), v4 replaces v3.1. Both recall and precision are materially higher on both home-turf corpora. Cross-model consistency held across four independent non-Anthropic models.
- On prose fallback (v4 has no domain module), v4 is a recall-favored tradeoff, not a clean replacement. v4's keyword-signature compression gives cold LLMs more entity hooks (recall up) but also leads LLMs to hallucinate more false entity mentions when reassembling prose from keyword spines (precision down). Precision-sensitive use cases on pure narrative prose may prefer v3.1 until this gap is closed.
- The v4 runtime architecture is independently validated. Kernel router, pluggable Rosetta modules, shared canonical form layer, artifact-driven gating, and drift detection are all implemented and under test discipline at 201 tests passing at the time of this amendment.
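The recall/precision tension described above can be made concrete with a set-based entity score. This is an illustrative sketch only, not the actual measure_fidelity implementation; the function name and the entity-set granularity are assumptions:

```python
def entity_scores(gold: set[str], recovered: set[str]) -> tuple[float, float]:
    """Set-based entity recall and precision (illustrative, not measure_fidelity).

    More entity hooks in the compressed packet raise recall (more of `gold`
    is recovered), but hallucinated mentions inflate `recovered` and drag
    precision down -- the split-sign pattern seen on prose fallback.
    """
    hits = gold & recovered
    recall = len(hits) / len(gold) if gold else 0.0
    precision = len(hits) / len(recovered) if recovered else 0.0
    return recall, precision

# Recovering 4 of 5 gold entities plus 3 hallucinated ones:
# recall rises to 0.8 while precision falls to 4/7.
```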
Cold-Read Gate Evidence
Three corpora, four non-Anthropic models, executed 2026-04-14 to 2026-04-16. Primary scorer: measure_fidelity recall and precision. Under the interpretive rule agreed with the operator, a clean win requires dRecall > 0 AND dPrecision >= 0 simultaneously; split-sign results are reported as mixed.
| Corpus | Source | Module | dRecall (v4 - v3.1) | dPrecision | Verdict |
|---|---|---|---|---|---|
| #1 | CloudKitchen investment memo (41K chars) | financial | +15.02 | +14.54 | clean win |
| #2 | Construction technical spec (58K chars) | construction | +36.64 | +43.96 | clean win |
| #3 | Museum repatriation narrative (35K chars) | prose fallback | +20.97 | -11.40 | mixed |
Corpora, seeds, reconstructions, scoring scripts, and per-corpus RESULTS files are committed under benchmarks/cold_read/, benchmarks/cold_read_corpus2/, and benchmarks/cold_read_corpus3/ in the axl-research repository.
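The agreed gate rule can be expressed as a small classifier over the deltas in the table above. A minimal sketch; the function name is hypothetical, and the "no win" label covers a case not observed in the three corpora:

```python
def gate_verdict(d_recall: float, d_precision: float) -> str:
    """Clean win requires dRecall > 0 AND dPrecision >= 0 simultaneously;
    a split-sign result (recall up, precision down) is reported as mixed."""
    if d_recall > 0 and d_precision >= 0:
        return "clean win"
    if d_recall > 0 and d_precision < 0:
        return "mixed"
    return "no win"  # assumption: not observed in the three corpora above

assert gate_verdict(36.64, 43.96) == "clean win"   # corpus #2
assert gate_verdict(20.97, -11.40) == "mixed"      # corpus #3
```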
Test Status
The v4 implementation is under test discipline.
- 217 of 217 tests passing in 2.66 seconds (verified 2026-04-25, session start)
- Source freeze: tag v4.0.2-r6-freeze
- HEAD commit: 51e75de (ci tiktoken install fix)
- Test runner: pytest
- Coverage: kernel parsing, router dispatch, all four implemented Rosetta modules (prose / financial / construction / code), canonical form layer, drift detection, fidelity scoring
The amendment notice was authored at 201 tests passing; the 217-test count reflects continued test additions between 2026-04-16 and 2026-04-25 with no behavior regressions.
Dual-Agent Discipline
v4 is built under a dual-agent research discipline. Two AI agents own non-overlapping scopes and challenge each other through documented adversarial review rounds. The operator (Diego Carranza) acts as final arbiter.
Specifications + Research Narrative
Owns spec/v4-*.md, research documents, narrative framing, gate interpretation, public-facing copy.
Anchored to evidence-first writing. Refuses to ship claims that are not corpus-backed.
Implementation + Tests
Owns src/axl_v4/ implementation, test harness, fidelity scoring, decompression code (committed at cd674c3).
Anchored to test discipline. Refuses to merge claims that lack a passing test.
Adversarial review rounds (committed dialogue):
- docs/codex-r1-challenges-response.md - Round 1 challenges from Codex with response
- docs/codex-r2-counter-response.md - Round 2 counter-response
- docs/cross-model-consensus.md - Multi-model evidence convergence (v3 baseline through v3.2)
The cold-read gate is the public output of this discipline: independent, third-party model evaluation across four non-Anthropic models, with documented exclusion of Anthropic-family models for training-prior contamination.
Anthropic-Contamination Exclusion
Anthropic-family models (Claude Haiku, Sonnet, Opus) were excluded from the cold-read gate after corpus #1. The exclusion was not arbitrary. It was forced by direct evidence of training-prior contamination.
A cold-read gate measures what an LLM can recover from a compressed packet without prior knowledge of the format. A model that names the format on first read is not cold. Cross-model coverage requires models that have not been trained on the format documentation. Of the major model families:
- Excluded (warm with priors): Anthropic Claude family (Haiku, Sonnet, Opus). Confirmed at corpus #1.
- Included (cold): Gemini Flash (Google), Qwen 3.5 (Alibaba), Grok (xAI), DeepSeek (DeepSeek). All four scored consistently across all three corpora.
The cross-model consistency on excluded-Anthropic data is the strongest available evidence that the cold-read gate measures format-independent recoverability rather than family-specific priors.
What v4 Does Not Yet Settle
The prose-fallback result (corpus #3) is from a single corpus. The qualification on claim 2 of the verdict ("v4 is a recall-favored tradeoff, not a clean replacement on prose") stands until all three of the following are complete:
- Substrate fix. A change to the prose-fallback module that closes the precision gap without giving back the recall advantage. Candidates under investigation: tighter keyword-signature token cap, entity-only signatures, explicit non-present-entity signals.
- Re-run corpus #3 after fix. Confirm the verdict flips from mixed to clean on both axes (recall AND precision positive).
- Additional prose corpus. At least one additional prose corpus confirms the clean result generalizes beyond a single document.
Until all three are complete, the qualification stands and v3.1 remains the precision-favored choice for narrative prose.
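Two of the substrate-fix candidates above (entity-only signatures under a tighter token cap) can be sketched as follows. Everything here is hypothetical: this is not the v4 prose-fallback module, and real modules would use a tokenizer rather than word counts.

```python
def entity_only_signature(entities: list[str], token_cap: int = 24) -> str:
    """Greedy entity-only keyword signature under a hard token cap (sketch).

    Dropping non-entity keywords and capping tokens aims to shrink the
    surface a cold LLM can hallucinate from, trading a little recall for
    the precision lost on corpus #3. Token cost is approximated by
    whitespace word count -- an assumption for illustration only.
    """
    kept, budget = [], token_cap
    for entity in entities:
        cost = len(entity.split())
        if cost > budget:
            break
        kept.append(entity)
        budget -= cost
    return "ENT[" + "; ".join(kept) + "]"
```

An explicit non-present-entity signal (the third candidate) would extend this by listing entities a cold reader might expect but must not emit; it is omitted here because its encoding is still under investigation.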
How to Follow the Work
- Repos (private):
  - github.com/dcarranza-axl/axl-research (canonical)
  - github.com/axlprotocol/axl-research (org mirror)
  - Both private; public visitors will see HTTP 404 unless authenticated.
- Tag to track: v4.0.2-r6-freeze
- Public release wrapper: v4.0.1
- Bridge page: /research/ (research repo discovery)
- Discussions: github.com/axlprotocol/community/discussions for protocol-level conversations and RFCs
- OTS anchor: /timestamps/v4-freeze.html (Bitcoin OpenTimestamps proof of the v4.0.2-r6 freeze)
- Production compressor: compress.axlprotocol.org (currently v3.1 path)