AXL / ROSETTA / V4.0.1 / COMPARISON

v3.1 vs v4.0.1 - Side by Side

Productized stable vs research-stage qualified successor.

v3.1 is the productized stable release. v4.0.1 is the research-stage public preview and the qualified successor on domain-backed content. Both share the same kernel format; the differences sit above the kernel: vocabulary, dispatch, modules, and validation regime.

Capability Table

Capability                      v3.1                                 v4.0.1
Wire format                     PKT[VER|CLS|SUB|TAG|ARG1|ARG2|META]  same (unchanged)
Kernel size                     75 lines                             94 lines
Compression vocabulary          One universal                        Router + N modules
Domain modules                  None                                 prose, financial, construction, code
Math operators                  Yes (ARG2 math-arg grammar)          Yes (unchanged)
Ideographic composition (v3.2)  Yes                                  Yes (unchanged)
Code compression                No                                   Yes (lossy structural IR)
Drift detection                 No                                   Yes (runtime architecture)
Artifact-driven gating          Manual review                        Yes (artifact-anchored gate)
Backward compatibility          v3 packets parse                     v3, v3.1, v3.2 packets parse
Tests passing                   (legacy count)                       217 / 217
Status                          Productized stable                   Research preview (qualified successor)
Public release label            v3.1                                 v4.0.1 (frozen at v4.0.2-r6 / 51e75de)
Pip package                     axl-core v0.10.x                     Pending productization gate close
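Because the wire format is unchanged across versions, the comparison above is apples-to-apples at the packet level. A minimal parsing sketch, assuming pipe-delimited fields; the example field values are hypothetical, and the real field encodings are defined by the kernel, which this document does not reproduce:

```python
FIELDS = ("VER", "CLS", "SUB", "TAG", "ARG1", "ARG2", "META")

def parse_packet(packet: str) -> dict:
    """Split a PKT[...] wire string into its seven named fields.

    Illustrative only: assumes no '|' appears inside a field value.
    """
    if not (packet.startswith("PKT[") and packet.endswith("]")):
        raise ValueError("not a PKT wire string")
    values = packet[4:-1].split("|")
    if len(values) != len(FIELDS):
        raise ValueError("expected exactly 7 fields")
    return dict(zip(FIELDS, values))

# Hypothetical packet, shown only to exercise the field split.
pkt = parse_packet("PKT[4|FIN|MEMO|T1|a|b|m]")
assert pkt["VER"] == "4"
assert pkt["META"] == "m"
```

The same parser accepts v3, v3.1, v3.2, and v4 packets, which is the backward-compatibility row in the table.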

Cold-Read Gate Evidence

Three corpora, four non-Anthropic models, executed 2026-04-14 to 2026-04-16. The interpretive rule, agreed with the operator in advance, is: a clean win requires dRecall > 0 AND dPrecision >= 0 simultaneously. Split-sign results are reported as mixed.

Corpus  Source                                     Module          dRecall (v4 - v3.1)  dPrecision  Verdict
#1      CloudKitchen investment memo (41K chars)   financial       +15.02               +14.54      clean win
#2      Construction technical spec (58K chars)    construction    +36.64               +43.96      clean win
#3      Museum repatriation narrative (35K chars)  prose fallback  +20.97               -11.40      mixed

Source: docs/v4-research-document.md AMENDMENT NOTICE (2026-04-16). Per-corpus RESULTS files committed under benchmarks/cold_read/, benchmarks/cold_read_corpus2/, benchmarks/cold_read_corpus3/.
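The interpretive rule is mechanical and can be stated in a few lines. A sketch (the deltas are the corpus values from the table above; the rule as stated defines only the "clean win" and "mixed" labels, so everything that is not a clean win is reported as mixed):

```python
def verdict(d_recall: float, d_precision: float) -> str:
    """Cold-read gate rule: clean win requires dRecall > 0 AND
    dPrecision >= 0 simultaneously; anything else reports as mixed."""
    if d_recall > 0 and d_precision >= 0:
        return "clean win"
    return "mixed"

assert verdict(15.02, 14.54) == "clean win"   # corpus #1, financial
assert verdict(36.64, 43.96) == "clean win"   # corpus #2, construction
assert verdict(20.97, -11.40) == "mixed"      # corpus #3, split-sign
```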

Cross-Model Coverage

The cold-read gate ran across four independent non-Anthropic models. Anthropic-family models were excluded after corpus #1 confirmed training-prior contamination (Haiku named the format on first read).

Model         Provider   Role
Gemini Flash  Google     Cold-read scorer
Qwen 3.5      Alibaba    Cold-read scorer
Grok          xAI        Cold-read scorer
DeepSeek      DeepSeek   Cold-read scorer
Claude Haiku  Anthropic  Excluded (warm with priors)

Evidence quote: "both Haiku runs opened with explicit meta-commentary identifying the format by name." Source: benchmarks/cold_read/RESULTS.md.

Compression Ratios

Correction Notice

Token compression is roughly half what early thesis projections claimed.

The v4 research document was originally written using estimated token counts and tests against a reconstructed partial memo. On 2026-04-11 a production measurement was run against the live compressor at compress.axlprotocol.org v0.9.0 using the authoritative 41,256-char CloudKitchen investment memo and tiktoken with the cl100k_base encoding. The corrected numbers are below.

Metric                    Prior (estimate)  Measured (production)
Char compression          3.27x             2.90x
Token compression         2.69x             1.40x
Tokens saved per message  ~2,500            2,582

Honest framing. Character compression is close to the estimate. Token compression is roughly half what the earlier tests suggested. Cost savings are real (about 29% of tokens, 2,582 tokens saved per message) but significantly smaller than the "4x token reduction" the original thesis implied. All token-related claims in older v4 material should be read as corrected by this notice.

Source: docs/v4-research-document.md CORRECTION NOTICE (lines 8-30) in axl-research @ v4.0.2-r6-freeze (commit 51e75de). Full production measurement: benchmarks/production_baseline.md. Repo is private; access via productization gate.
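The ~29% cost-savings figure is plain arithmetic on the measured 1.40x token ratio, not an independent measurement. A sketch of the derivation:

```python
def token_cost_savings(token_ratio: float) -> float:
    """Fraction of tokens saved implied by a token compression ratio.

    A 1.40x ratio means each message needs 1/1.40 of its original
    tokens, so the saved fraction is 1 - 1/1.40.
    """
    return 1.0 - 1.0 / token_ratio

# Production baseline, 2026-04-11: 1.40x token ratio -> ~29% savings.
assert round(token_cost_savings(1.40), 2) == 0.29
```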

Three metrics matter: character compression, token compression, and cold-read precision and recall. The first two come from production measurement; the third comes from the cold-read gate above. Char and token compression are different metrics, and the gap between them is the headline finding of the 2026-04-11 production baseline.

Version                                Metric                                    Ratio  Source
v3 baseline                            Char compression                          3.06x  cross-model-consensus.md
v3.2 ideographic                       Char compression                          4.28x  cross-model-consensus.md
v4 production (CloudKitchen 41K memo)  Char compression                          2.90x  v4-research-document.md (corrected 2026-04-11)
v4 production (CloudKitchen 41K memo)  Token compression (tiktoken cl100k_base)  1.40x  v4-research-document.md (corrected 2026-04-11)
v4 production (CloudKitchen 41K memo)  Tokens saved per message                  2,582  compress.axlprotocol.org v0.9.0
v4 production (CloudKitchen 41K memo)  Token cost savings                        ~29%   derived from 1.40x token ratio

Why Char and Token Ratios Diverge

Character compression measures the byte-level shrink of a packet relative to the source. Token compression measures the same shrink in tokenizer units (here, tiktoken with cl100k_base, the GPT-4-family encoding). AXL's compressed packets pack more semantic density per character, but tokenizers segment short symbolic sequences less efficiently than long natural-language strings. The result: a packet that is 2.90x shorter in characters is only 1.40x shorter in tokens. The gap between 2.90x and 1.40x is the cost of tokenization, not a flaw in the kernel.
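The divergence can be quantified directly from the two measured ratios, with no new measurement. If a source has C characters and T tokens, the packet has C/2.90 characters and T/1.40 tokens, so the packet's average token carries 1.40/2.90 of the characters a source token does. A sketch of that arithmetic:

```python
def relative_chars_per_token(char_ratio: float, token_ratio: float) -> float:
    """Average packet-token length divided by average source-token length.

    (C/char_ratio) / (T/token_ratio) = (C/T) * (token_ratio/char_ratio),
    so the relative factor is token_ratio / char_ratio.
    """
    return token_ratio / char_ratio

# Measured production baseline: 2.90x chars, 1.40x tokens.
factor = relative_chars_per_token(2.90, 1.40)
assert round(factor, 2) == 0.48
```

In other words, a packet token averages about half as many characters as a source token: the tokenizer fragments the symbolic packet text into shorter pieces, which is exactly the tokenization cost described above.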

The cold-read gate above measures something different again (semantic recovery via recall + precision) and is the primary v4 vs v3.1 evidence. Compression ratio is necessary but not sufficient: a packet that compresses well but does not survive cold decompression is not a win.

Measurement Provenance

  • Tokenizer: tiktoken with cl100k_base encoding (GPT-4-family).
  • Source corpus: CloudKitchen investment memo, 41,256 characters (authoritative full memo, not a reconstructed partial).
  • Compressor: compress.axlprotocol.org v0.9.0 (production endpoint).
  • Measurement date: 2026-04-11.
  • Canonical record: CORRECTION NOTICE at the head of docs/v4-research-document.md (lines 8-30) and benchmarks/production_baseline.md in axl-research @ v4.0.2-r6-freeze (commit 51e75de). Repository is private; full audit access is gated by the productization checkpoint (CC-OPS-AXLSERVER directive sections 14-16). Public mirror lives under github.com/axlprotocol/axl-research with the same privacy posture.

What v3.1 Still Does Best

  • Pure narrative prose. Corpus #3 shows v4 prose-fallback gives -11.40 dPrecision relative to v3.1. Until the prose-fallback gap is closed, v3.1 is the precision-favored choice for narrative prose.
  • SLA-bound production paths. v3.1 carries the productized badge. v4.0.1 is research-stage until the productization gate (CC-OPS-AXLSERVER directive sections 14-16) closes.
  • Tooling reach. The pip-published axl-core targets v3.1. Five existing MCP plugins target v3.1.
  • Stability. v3.1 has a longer track record. v4 architectural primitives are validated under test discipline but the production deployment runway is shorter.

What v4.0.1 Adds

  • Domain-aware compression. Financial and construction modules show clean wins on home-turf corpora.
  • Code compression. First AXL surface for source-code packets. Lossy structural IR.
  • Pluggable architecture. Adding a new domain means registering a new module name. No kernel change required.
  • Drift detection. Runtime architecture catches vocabulary drift across module versions.
  • Artifact-driven gating. The production gate is anchored to committed artifacts; gate evidence is committed alongside results.
  • Cross-model validated. Cold-read gate ran across four non-Anthropic models with documented contamination exclusion.
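The pluggable-architecture claim can be illustrated with a toy registry. This is a hypothetical sketch, not the actual v4 API: the registry, decorator, and module signatures are illustrative assumptions. It only shows why adding a domain requires no kernel change: the router consults a name-to-module table, and unknown domains fall back to prose (as corpus #3 did).

```python
# Hypothetical module registry; names and signatures are illustrative.
MODULES: dict = {}

def register_module(name: str):
    """Register a compression module under a domain name."""
    def wrap(fn):
        MODULES[name] = fn
        return fn
    return wrap

@register_module("prose")
def compress_prose(text: str):
    return ("prose", text)

@register_module("financial")
def compress_financial(text: str):
    return ("financial", text)

def route(domain: str, text: str):
    # Router dispatch: unknown domains fall back to the prose module.
    return MODULES.get(domain, MODULES["prose"])(text)

assert route("financial", "memo")[0] == "financial"
assert route("legal", "contract")[0] == "prose"   # unregistered -> fallback
```

Adding a construction or code module in this sketch is one more `@register_module(...)` call; the `route` function, standing in for the kernel, never changes.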

How to Choose

For a complete migration walkthrough see /rosetta/v4/migration/. For the full research-stage status and AMENDMENT NOTICE, see /rosetta/v4/research/.