v4.0.1 Kernel Router

Pluggable Rosetta modules. One META flag dispatches.

The kernel router is stateless. It reads the ^rosetta:MODULE flag and dispatches to a domain-aware compression module. When the flag is absent, it infers the domain from content signals instead. Each module carries its own glyph palette, fidelity contract, and decompression rules. The kernel itself never changes.

AXL Rosetta v4 Kernel Router Architecture

Version: 4.1.0-blueprint
Status: Blueprint (pre-draft)
License: Apache 2.0
Extends: v4 Kernel (spec/v4-kernel.md)

1. Problem Statement

AXL Rosetta v3/v4 uses a single compression vocabulary for all content types. The ^mode flag (gist, qa, audit, legal, code, research, plan) affects decompression fidelity requirements but not compression strategy. This means financial data is compressed with the same glyphs as legal text, and medical records use the same vocabulary as prose summaries.

The result: domain-specific content either undercompresses (safe but wasteful) or loses critical domain signals (compact but unfaithful).

2. Key Insight

Instead of one universal compression vocabulary, AXL needs a kernel router that detects content domain and dispatches to a specialized Rosetta module. Each module carries its own compression vocabulary, fidelity contract, and decompression rules -- optimized for its domain.

The kernel itself does not change. The packet format is v3/v4 forever. The router adds exactly one new META flag: ^rosetta:MODULE.

Implementation Status

The spec describes a router that dispatches to six reserved modules: prose, financial, code, legal, medical, research (see Section 5 and Section 7 below).

As of v4.0.2-r6-freeze (commit 51e75de), the kernel router and three concrete Rosetta modules are implemented:

  • prose - the v3 base vocabulary; default and fallback
  • financial - CJK magnitude glyphs, numeric bundles, entity anchors
  • construction - extension module beyond the spec; technical-spec / RFI / submittal vocabulary

Four spec-listed modules remain blueprint-only and are not yet implemented: code, legal, medical, research. Operators who set ^rosetta:legal, ^rosetta:medical, ^rosetta:research, or ^rosetta:code on a packet will be served the prose fallback by the registry. Module names that are not registered route to prose by design (see Section 10, Backward compatibility guarantee). The construction module is in the registry but is not in the spec's reserved-name list, so it is currently a local extension pending spec uplift.

This is intentional. Spec ahead of implementation is the working order: each new module ships independently when its fidelity contract and cold-read benchmarks pass (see Section 10, Phase 4). Module source: src/axl_v4/rosetta/{base,prose,financial,construction}.py at github.com/dcarranza-axl/axl-research/tree/v4.0.2-r6-freeze/src/axl_v4/rosetta (repo is private; public visitors will see HTTP 404 unless authenticated).
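The fallback behavior described above can be sketched as a registry lookup. This is a minimal illustration, not the code in src/axl_v4/rosetta. The names REGISTERED, DEFAULT_MODULE, and resolve_module are hypothetical. The registered set simply mirrors the three modules listed as implemented in v4.0.2-r6-freeze.

```python
from typing import Optional

# Hypothetical registry contents, mirroring the modules listed as implemented.
REGISTERED = frozenset({"prose", "financial", "construction"})
DEFAULT_MODULE = "prose"  # default when ^rosetta is absent, and the fallback


def resolve_module(requested: Optional[str]) -> str:
    """Return the module that actually serves a packet.

    Names that are not registered, including spec-reserved but unimplemented
    ones such as legal or medical, are served by the prose fallback.
    """
    if requested in REGISTERED:
        return requested
    return DEFAULT_MODULE


if __name__ == "__main__":
    for name in ("financial", "construction", "legal", "code", None):
        print(f"^rosetta:{name} -> {resolve_module(name)}")
```

Under this sketch, promoting legal or medical from blueprint to implementation is a one-line change to the registered set. The fallback path needs no edits.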

3. Architecture

                          AXL v4 Kernel Router
                          ====================

  Source Content
       |
       v
  +--------------------+
  | KERNEL ROUTER      |    Tiny. Stateless. One job:
  | detect domain OR   |    read ^rosetta flag, or infer
  | read explicit flag |    domain from content signals.
  +--------------------+
       |
       +---> ^rosetta:prose      ---> [ Rosetta.prose       ] --+
       +---> ^rosetta:financial  ---> [ Rosetta.financial   ] --+
       +---> ^rosetta:code       ---> [ Rosetta.code        ] --+
       +---> ^rosetta:legal      ---> [ Rosetta.legal       ] --+
       +---> ^rosetta:medical    ---> [ Rosetta.medical     ] --+
       +---> ^rosetta:research   ---> [ Rosetta.research    ] --+
       |                                                        |
       v                                                        v
  Same PKT[VER|CLS|SUB|TAG|ARG1|ARG2|META] format       Compressed packets
  Same version negotiation                               with ^rosetta:XX
  Same error handling                                    in META field
       |                                                        |
       |                         DECOMPRESSION                  |
       |                              |                         |
       |                              v                         |
       |                    ^rosetta tag tells decoder           |
       |                    which module vocabulary to           |
       +<----------- use for faithful reconstruction -----------+

4. Design Principles

  1. The kernel NEVER changes. PKT format, field grammar, class codes, version negotiation, error handling -- all unchanged from v4-kernel.md.

  2. Rosetta modules are pluggable. Adding a new domain means registering a new module name. No kernel changes required.

  3. Backward compatible. Packets without ^rosetta use prose mode, which is the v3 base vocabulary. A v4.0 parser sees ^rosetta:financial as an unknown META flag and ignores it -- no parse failure.

  4. Explicit over automatic. Operators may set ^rosetta:financial directly. The router may also detect domain automatically. Explicit always wins.

  5. One module per packet. A single packet uses exactly one Rosetta vocabulary. Mixed-domain messages use multiple packets with different ^rosetta tags.

5. The ^rosetta META Flag

Syntax: ^rosetta:MODULE_NAME

Position: META field of any packet, alongside existing flags.

PKT[4|INF|Co.Rv|rpt|...|...|^rosetta:financial ^t:260410]

Reserved module names: prose, financial, code, legal, medical, research. Custom modules use dotted names: ^rosetta:org.acme.billing.

Default when absent: prose.
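Reading the flag can be sketched in a few lines. The sketch makes two assumptions:

- META is a whitespace-separated list of ^key:value tokens, as in the example above.
- Custom names are two or more lowercase, dot-separated segments.

The function names are illustrative.

```python
import re

RESERVED = frozenset({"prose", "financial", "code", "legal", "medical", "research"})
# Assumption: custom names look like org.acme.billing (2+ lowercase segments).
CUSTOM_NAME = re.compile(r"[a-z][a-z0-9_]*(?:\.[a-z][a-z0-9_]*)+")


def parse_meta(meta: str) -> dict:
    """Split a META field into {key: value}; tokens not shaped ^key:value are skipped."""
    flags = {}
    for token in meta.split():
        if token.startswith("^") and ":" in token:
            key, _, value = token[1:].partition(":")
            flags[key] = value
    return flags


def rosetta_module(meta: str) -> str:
    """Return the ^rosetta module name, defaulting to prose when the flag is absent."""
    return parse_meta(meta).get("rosetta", "prose")


def classify_name(name: str) -> str:
    """Label a module name as reserved, custom (dotted), or nonstandard."""
    if name in RESERVED:
        return "reserved"
    if CUSTOM_NAME.fullmatch(name):
        return "custom"
    return "nonstandard"


if __name__ == "__main__":
    print(rosetta_module("^rosetta:financial ^t:260410"))  # financial
    print(rosetta_module("^t:260410"))                     # prose
    print(classify_name("org.acme.billing"))               # custom
```

Under this sketch, the registered construction module classifies as nonstandard. That matches its status as a local extension outside the reserved-name list.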

Interaction with ^mode

^mode and ^rosetta are orthogonal:

Flag      Controls                     Example
--------  ---------------------------  --------------------------------------
^mode     Decompression fidelity tier  ^mode:audit = zero-lossy decompression
^rosetta  Compression vocabulary       ^rosetta:financial = financial glyphs

They compose: ^mode:audit ^rosetta:financial means "compress with financial vocabulary, decompress at audit fidelity (zero-lossy)." This replaces the v3 pattern where ^mode:legal had to imply both fidelity AND vocabulary.
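Because the two flags are orthogonal, a decoder can read them independently into a single plan. The DecodePlan type below is a hypothetical illustration. The sketch assumes that an absent ^mode leaves the fidelity tier unspecified (None), since this section does not define a default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DecodePlan:
    vocabulary: str          # from ^rosetta: which module's glyphs to expand
    fidelity: Optional[str]  # from ^mode: which fidelity tier to enforce


def plan_from_meta(meta: str) -> DecodePlan:
    """Read ^rosetta and ^mode separately; neither flag implies the other."""
    flags = {}
    for token in meta.split():
        if token.startswith("^") and ":" in token:
            key, _, value = token[1:].partition(":")
            flags[key] = value
    return DecodePlan(vocabulary=flags.get("rosetta", "prose"),
                      fidelity=flags.get("mode"))


if __name__ == "__main__":
    print(plan_from_meta("^mode:audit ^rosetta:financial"))
    # ^mode:legal alone no longer selects a legal vocabulary.
    print(plan_from_meta("^mode:legal"))
```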

6. Rosetta Module Interface

Every module must specify a registered name plus four components (Sections 6.1-6.4):

MODULE INTERFACE
================
module_name   : string        -- registered name (e.g. "financial")
glyph_palette : table         -- domain-specific glyphs and primitives
fidelity      : contract      -- what is preserved, what is lossy
decomp_rules  : grammar       -- decompression production rules
cold_read     : score 0-100   -- expected recovery % on a cold lowest-tier model
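One way to express this interface in Python, the language of the existing module sources, is a pair of frozen dataclasses. The field names follow the block above. The types are assumptions; for example, decomp_rules is modeled as a glyph-to-expansion mapping. The financial instance uses glyphs from 7.2. Its cold-read value of 65 is illustrative, taken from the middle of the 60-70 range.

```python
from dataclasses import dataclass
from typing import FrozenSet, Mapping, Optional


@dataclass(frozen=True)
class FidelityContract:
    preserved: FrozenSet[str]
    allowed_lossy: FrozenSet[str]
    must_be_exact: FrozenSet[str]
    gained: Optional[str] = None  # reduction vs prose baseline, once measured


@dataclass(frozen=True)
class RosettaModule:
    module_name: str
    glyph_palette: Mapping[str, str]  # glyph -> meaning
    fidelity: FidelityContract
    decomp_rules: Mapping[str, str]   # glyph -> natural-language expansion
    cold_read: int                    # expected recovery %, 0-100

    def __post_init__(self) -> None:
        if not self.module_name:
            raise ValueError("module_name must be non-empty")
        if not 0 <= self.cold_read <= 100:
            raise ValueError(f"cold_read out of range: {self.cold_read}")


financial = RosettaModule(
    module_name="financial",
    glyph_palette={"金": "monetary", "⟹": "causal chain"},
    fidelity=FidelityContract(
        preserved=frozenset({"numeric values", "entity-value bindings"}),
        allowed_lossy=frozenset({"narrative framing"}),
        must_be_exact=frozenset({"dollar amounts", "percentages"}),
    ),
    decomp_rules={"⟹": "leading to"},  # illustrative expansion only
    cold_read=65,  # illustrative midpoint of the 60-70 range in 7.2
)
```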

6.1 Glyph Palette

Each module defines its own compression vocabulary. Glyphs are drawn from the Unicode Basic Multilingual Plane (BMP) and must not conflict with kernel reserved characters (| [ ] \ ^ ;).

Modules inherit the kernel base vocabulary (class codes, base TAGs, numeric shorthand) and extend it with domain-specific primitives.
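Both rules in 6.1 are mechanical enough to check when a module is registered. The sketch below is hypothetical. It rejects glyphs that fall outside the BMP or that contain a kernel reserved character. As an added assumption, it treats a domain glyph that redefines a base-vocabulary glyph as a conflict rather than an override.

```python
from typing import Dict, List

KERNEL_RESERVED = frozenset("|[]\\^;")


def palette_problems(glyphs) -> List[str]:
    """Return human-readable problems; an empty list means the glyphs pass."""
    problems = []
    for glyph in glyphs:
        for ch in glyph:
            if ord(ch) > 0xFFFF:
                problems.append(f"{glyph!r}: U+{ord(ch):04X} is outside the BMP")
            elif ch in KERNEL_RESERVED:
                problems.append(f"{glyph!r}: uses kernel reserved character {ch!r}")
    return problems


def effective_palette(base: Dict[str, str], domain: Dict[str, str]) -> Dict[str, str]:
    """Extend the base vocabulary with domain glyphs (assumed: no redefinitions)."""
    problems = palette_problems(domain)
    clashes = sorted(base.keys() & domain.keys())
    if clashes:
        problems.append(f"redefines base glyphs: {clashes}")
    if problems:
        raise ValueError("; ".join(problems))
    return {**base, **domain}
```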

6.2 Fidelity Contract

Four-column format from v4-layer-classification.md:

Column         Meaning
-------------  ------------------------------------------------------
Gained         Character reduction from this module vs prose baseline
Preserved      Semantic elements that survive round-trip exactly
Allowed lossy  Elements that may be paraphrased or dropped
Must be exact  Elements where any alteration is a protocol error
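The Must be exact column lends itself to an automated round-trip check. This is a hypothetical sketch covering two financial element types: dollar amounts and percentages. It extracts them with an assumed regular expression and reports any that the decompressed text does not reproduce verbatim. Under the contract, a non-empty result would be a protocol error.

```python
import re
from typing import List

# Assumed patterns: "$5.4M"-style amounts and "27%"-style percentages.
EXACT_TOKEN = re.compile(r"\$\d+(?:\.\d+)?[KMB]?|\d+(?:\.\d+)?%")


def exact_violations(source: str, decoded: str) -> List[str]:
    """Return must-be-exact tokens from source that are missing in decoded."""
    found = set(EXACT_TOKEN.findall(decoded))
    return [tok for tok in EXACT_TOKEN.findall(source) if tok not in found]


if __name__ == "__main__":
    src = "Q4 revenue of $5.4M, up approximately 27% from $4.2M in Q3"
    ok = "Q4 revenue rose about 27% to $5.4M from $4.2M in Q3."
    bad = "Q4 revenue rose about 27% to $5.5M from $4.2M in Q3."
    print(exact_violations(src, ok))   # []
    print(exact_violations(src, bad))  # ['$5.4M']
```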

6.3 Decompression Rules

Production rules mapping module glyphs back to natural language. A decoder that knows the module can reconstruct faithfully. A decoder that does not know the module falls back to prose-mode (best-effort).
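Here is a toy version of this fallback. It assumes production rules can be modeled as glyph-to-phrase substitutions layered over the prose base. The rule tables and expansions are invented for illustration and are far simpler than a real grammar.

```python
from typing import Dict, Tuple

# Hypothetical production rules. Module rules extend the prose base.
RULES: Dict[str, Dict[str, str]] = {
    "prose": {"↑": " increased", "∵": "because of ", "≈": "approximately "},
    "financial": {"⟹": " leading to ", "←": " from "},
}


def decompress(body: str, module: str) -> Tuple[str, str]:
    """Expand glyphs with the module's rules; unknown modules fall back to prose.

    The second element labels the attempt: "faithful" when the module's own
    rules were available, "best-effort" when only the prose base was used.
    """
    if module in RULES:
        rules, quality = {**RULES["prose"], **RULES[module]}, "faithful"
    else:
        rules, quality = RULES["prose"], "best-effort"
    for glyph, phrase in rules.items():
        body = body.replace(glyph, phrase)
    return body, quality
```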

6.4 Cold-Read Score

Integer 0-100 representing expected semantic recovery when a lowest-tier model (no AXL training, no spec in context) attempts to read the compressed output. Higher is better. Prose is the ceiling.
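The harness for this score is still open (see Open Question 4). Purely as an illustration: if a cold reader's output were judged against a list of reference facts, the score could be the rounded percentage recovered. The fact lists and the judging step are assumptions outside this sketch.

```python
from typing import Iterable, Set


def cold_read_score(reference_facts: Iterable[str], recovered: Set[str]) -> int:
    """Integer 0-100: share of reference facts a cold reader recovered."""
    facts = list(reference_facts)
    if not facts:
        raise ValueError("need at least one reference fact")
    hits = sum(1 for fact in facts if fact in recovered)
    return round(100 * hits / len(facts))


if __name__ == "__main__":
    facts = ["Q4 revenue $5.4M", "Q3 revenue $4.2M", "growth ~27%",
             "driver: customer acquisition", "marketing budget +15%"]
    recovered = {"Q4 revenue $5.4M", "Q3 revenue $4.2M", "growth ~27%",
                 "marketing budget +15%"}
    print(cold_read_score(facts, recovered))  # 80
```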

7. Proposed Modules

7.1 Rosetta.prose

The v3 base. Minimal glyphs. Highest cold-read score.

Property         Value
---------------  --------------------------------------------------
Glyph palette    v3 base + v3.1 math operators + v3.2 ideographic composition
Preserved        Narrative structure, causal chains, entity identity, numeric values
Allowed lossy    Verbose phrasing, hedging language, stylistic flourishes
Cold-read score  85-90
Target ratio     3-4x

7.2 Rosetta.financial

Heavy use of CJK magnitude glyphs, numeric bundles, entity anchors.

Property         Value
---------------  --------------------------------------------------
Glyph palette    金(monetary), 高/大/小(magnitude), ⟹(causal chain), numeric bundles from v3.1, entity anchors, %Delta shorthand
Preserved        All numeric values, entity-value bindings, causal chains, temporal ordering, currency units
Allowed lossy    Narrative framing, hedging qualifiers beyond the approximation marker
Must be exact    Dollar amounts, percentages, entity names, direction of change
Cold-read score  60-70
Target ratio     5-6x

7.3 Rosetta.code

Lossy intermediate representation. AST-derived.

Property         Value
---------------  --------------------------------------------------
Glyph palette    fn/cl/lp/cd/rt/im/as/er/tp primitives, type shorthand (s/i/f/b/L/D/O), semicolon statements, -> returns, ^lang tags
Preserved        Identifiers, string literals, numeric constants, control flow, type annotations, operation sequence
Allowed lossy    Comments, whitespace, style, binding semantics, constructor internals
Must be exact    Nothing formally guaranteed (empirical boundary)
Cold-read score  40-50
Target ratio     1.5-2x

7.4 Rosetta.legal

Zero-lossy mode. Exact quotation preservation.

Property         Value
---------------  --------------------------------------------------
Glyph palette    Prose base + span markers << >> for exact quotes, provenance chain operator @src:, section anchors SS.N
Preserved        ALL content. Exact wording of quoted material. Provenance chains. Section references. Party identity.
Allowed lossy    Nothing. Legal mode is zero-lossy by definition.
Must be exact    Quoted text within span markers, party names, section numbers, dates, obligations
Cold-read score  80-85 (low compression = high readability)
Target ratio     1.5-2x
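For the legal module, the span-marker rule suggests a simple verification: every <<...>> span in a compressed packet should occur verbatim in the source. The sketch below is hypothetical and uses an illustrative packet fragment rather than a full PKT. It does not handle nested or escaped markers.

```python
import re
from typing import List

SPAN = re.compile(r"<<(.*?)>>", re.DOTALL)  # non-greedy: one span per marker pair


def unfaithful_spans(source: str, compressed: str) -> List[str]:
    """Return quoted spans that do not appear character-for-character in source."""
    return [span for span in SPAN.findall(compressed) if span not in source]


if __name__ == "__main__":
    source = "The Licensee shall not sublicense the Software without written consent."
    print(unfaithful_spans(source, "SS.4 <<shall not sublicense the Software>>"))  # []
    print(unfaithful_spans(source, "SS.4 <<shall not sub-license the Software>>"))
```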

7.5 Rosetta.medical

Diagnosis chains, standardized coding, confidence-critical.

Property         Value
---------------  --------------------------------------------------
Glyph palette    ICD/SNOMED code anchors [ICD:XX.X] [SNO:XXXXX], Dx(diagnosis)/Rx(prescription)/Sx(symptom)/Hx(history) primitives, confidence-mandatory ^c: on all clinical assertions
Preserved        Diagnosis codes, medication names and dosages, temporal ordering of clinical events, confidence levels
Allowed lossy    Narrative framing of clinical notes, administrative boilerplate
Must be exact    Drug names, dosages, ICD/SNOMED codes, allergy flags, patient identifiers
Cold-read score  55-65
Target ratio     3-4x

7.6 Rosetta.research

Citation preservation, hypothesis-evidence separation.

Property         Value
---------------  --------------------------------------------------
Glyph palette    Claim IDs [C:N], evidence tags [E:N], hypothesis markers H:, citation anchors @cite:KEY, support/contradict operators +ev/-ev
Preserved        All citations, claim-evidence linkages, hypothesis identity, statistical values, methodology references
Allowed lossy    Literature review narrative, verbose methodology description
Must be exact    Citation keys, statistical values (p-values, confidence intervals, sample sizes), claim IDs
Cold-read score  60-70
Target ratio     3-5x

8. Routing: How Domain Detection Works

The kernel router selects a Rosetta module through a two-stage process:

Stage 1: Explicit flag. If the operator sets ^rosetta:MODULE in META, that module is used. No detection runs. This is the recommended path for production systems.

Stage 2: Automatic detection. If no ^rosetta flag is present, the router applies lightweight heuristics to select a module:

Signal                                                         Detected module
-------------------------------------------------------------  ---------------
Currency symbols or codes ($, EUR, JPY), >30% numeric density  financial
^mode:code or ^lang:XX present                                 code
Legal citation patterns (Section X, Article Y, "pursuant to")  legal
ICD/SNOMED codes, drug names, clinical abbreviations           medical
Citation patterns (et al., DOI, arXiv), hypothesis language    research
None of the above                                              prose (default)

Heuristics are advisory. They must be conservative -- when uncertain, fall back to prose. False routing to a specialized module is worse than undercompressing with prose, because it risks misinterpretation.

The router MUST tag the selected module in the output packet's META field. Even auto-detected modules produce explicit ^rosetta:XX tags. The decompressor never guesses.
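The two-stage selection can be sketched as follows. The detection patterns are rough stand-ins for the signals in the table above; drug names and clinical abbreviations are omitted. The sketch treats more than one matching signal as uncertain, and therefore prose. That is an assumption about what "conservative" means here. The function name route and its return shape are hypothetical.

```python
import re
from typing import Callable, Dict, List, Tuple

Flags = Dict[str, str]


def numeric_density(text: str) -> float:
    """Fraction of non-space characters that are digits."""
    chars = [c for c in text if not c.isspace()]
    return sum(c.isdigit() for c in chars) / len(chars) if chars else 0.0


# (module, test) pairs approximating the detection table; patterns are assumptions.
SIGNALS: List[Tuple[str, Callable[[str, Flags], bool]]] = [
    ("financial", lambda t, f: bool(re.search(r"[$€¥]|\b(?:EUR|JPY)\b", t))
                               or numeric_density(t) > 0.30),
    ("code", lambda t, f: f.get("mode") == "code" or "lang" in f),
    ("legal", lambda t, f: bool(re.search(r"\b(?:section|article) \d+|pursuant to", t, re.I))),
    ("medical", lambda t, f: bool(re.search(r"\bICD-?(?:9|10|11)\b|\bSNOMED\b", t))),
    ("research", lambda t, f: bool(re.search(r"\bet al\.|\bdoi:|arxiv:", t, re.I))),
]


def route(text: str, flags: Flags) -> Tuple[str, str]:
    """Return (module, META tag). Explicit ^rosetta wins; detection stays conservative."""
    if "rosetta" in flags:                 # Stage 1: explicit flag, no detection
        module = flags["rosetta"]
    else:                                  # Stage 2: heuristics
        hits = [name for name, test in SIGNALS if test(text, flags)]
        module = hits[0] if len(hits) == 1 else "prose"  # none or ambiguous -> prose
    return module, f"^rosetta:{module}"    # the output packet is always tagged
```

For example, "Pursuant to Section 4, the fee is $500." trips both the legal and financial signals. This sketch therefore routes it to prose rather than guessing.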

9. Worked Example

Source text

Acme Corp reported Q4 revenue of $5.4M, up approximately 27% from $4.2M in Q3, driven by improved customer acquisition. The board approved a 15% budget increase for marketing.

Compressed with Rosetta.prose (default)

PKT[4|INF|Co.Rv|rpt↑|≈%Δ27 Q4|$4.2M→$5.4M ∵Hu.acq↑ ^rosetta:prose]
PKT[4|ACT|Mk.budget|upd↑|∴%Δ15|^ref:1 ^s:board ^rosetta:prose]

Characters: 128 (98 without the ^rosetta:prose tags), against 175 in the source text. Ratio: 1.4x on this sample, or 1.8x excluding the tags. Cold-read: high. A model with no AXL knowledge can largely parse this.

Compressed with Rosetta.financial

PKT[4|INF|Co.Rv|rpt↑|金Q4:acme[$5.4M]⟹%Δ27←Q3[$4.2M]|∵acq↑ ^rosetta:financial]
PKT[4|ACT|Mk.budget|set|金%Δ+15|⟹^ref:1 ^s:board ^rosetta:financial]

Characters: 144 (106 without the ^rosetta:financial tags). Ratio: 1.2x on this sample, or 1.7x excluding the tags. But: every numeric value is structurally anchored in a numeric bundle. On a 50-packet financial report, the structural regularity of 金-anchored bundles is expected to reach the module's 5-6x target aggregate compression while preserving 100% of numeric facts. The per-packet overhead amortizes across repetitive financial data.
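The character counts and ratios for this example can be checked by counting Unicode code points (Python's len). The sketch counts the source sentence and the packets exactly as printed, both with and without the ^rosetta META tag. Treating characters as code points rather than UTF-8 bytes is an assumption. In bytes, the multi-byte glyphs would make every packet longer.

```python
SOURCE = ("Acme Corp reported Q4 revenue of $5.4M, up approximately 27% from $4.2M "
          "in Q3, driven by improved customer acquisition. The board approved a 15% "
          "budget increase for marketing.")

PROSE = [
    "PKT[4|INF|Co.Rv|rpt↑|≈%Δ27 Q4|$4.2M→$5.4M ∵Hu.acq↑ ^rosetta:prose]",
    "PKT[4|ACT|Mk.budget|upd↑|∴%Δ15|^ref:1 ^s:board ^rosetta:prose]",
]
FINANCIAL = [
    "PKT[4|INF|Co.Rv|rpt↑|金Q4:acme[$5.4M]⟹%Δ27←Q3[$4.2M]|∵acq↑ ^rosetta:financial]",
    "PKT[4|ACT|Mk.budget|set|金%Δ+15|⟹^ref:1 ^s:board ^rosetta:financial]",
]


def sizes(packets, tag):
    """Code-point totals for the packets with and without ' <tag>' in META."""
    with_tag = sum(len(p) for p in packets)
    without_tag = sum(len(p.replace(" " + tag, "")) for p in packets)
    return with_tag, without_tag


if __name__ == "__main__":
    for name, packets in (("prose", PROSE), ("financial", FINANCIAL)):
        full, bare = sizes(packets, "^rosetta:" + name)
        print(f"{name}: {full} chars ({bare} untagged); "
              f"ratio {len(SOURCE) / full:.1f}x ({len(SOURCE) / bare:.1f}x untagged)")
```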

Key difference

Prose mode embeds $4.2M→$5.4M as a range in ARG2 -- readable but positionally fragile in long sequences. Financial mode anchors every value in a typed bundle (金Q4:acme[$5.4M]) that a financial decompressor can index, validate, and reconstruct without positional ambiguity.
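To make the indexing point concrete, here is a hypothetical parser for the bundle shape seen in the example. The assumed grammar is: an optional 金, a quarter such as Q4, an optional :entity, then a bracketed dollar amount. It is inferred from these two bundles only and is not the module's published rules.

```python
import re
from typing import Dict, Optional, Tuple

BUNDLE = re.compile(
    r"金?(?P<period>Q[1-4])(?::(?P<entity>\w+))?"
    r"\[(?P<amount>\$\d+(?:\.\d+)?[KMB]?)\]"
)


def index_bundles(field: str) -> Dict[str, Tuple[Optional[str], str]]:
    """Map each period to (entity, amount); a repeated period is rejected."""
    index: Dict[str, Tuple[Optional[str], str]] = {}
    for m in BUNDLE.finditer(field):
        period = m["period"]
        if period in index:
            raise ValueError(f"duplicate bundle for {period}")
        index[period] = (m["entity"], m["amount"])
    return index


if __name__ == "__main__":
    print(index_bundles("金Q4:acme[$5.4M]⟹%Δ27←Q3[$4.2M]"))
```

Each value is retrieved by its period key rather than by its position in the field. That is the property the prose-mode range lacks.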

10. Migration Path: v3/v4 to v4.1

Phase 1: Flag recognition (non-breaking)

Add ^rosetta to the known META flags list. Parsers that see it treat it as informational. Compression continues to use prose vocabulary. All existing v3/v4 packets are valid. No behavioral change.

Phase 2: Module registration

Define and publish Rosetta.prose (already the v3/v4 base) and Rosetta.financial as the first two modules. Encoders may begin emitting ^rosetta:financial on financial content.

Phase 3: Router activation

Enable automatic detection. Encoders emit ^rosetta:XX on all packets. Decoders use the tag to select module-specific decompression rules.

Phase 4: Module expansion

Publish Rosetta.code (absorbs current ^mode:code), Rosetta.legal, Rosetta.medical, Rosetta.research. Each module ships independently when its fidelity contract and cold-read benchmarks are validated.

Backward compatibility guarantee

At every phase:

  • Packets without ^rosetta parse and decompress as v4 prose (no change).
  • Unknown ^rosetta:XX values are ignored by older parsers (standard META flag behavior).
  • The kernel packet format does not change. Ever.

11. Open Questions

  1. Module versioning. Should modules carry independent version numbers (e.g., ^rosetta:financial.2) or rely on the packet VER field?

  2. Module composition. Can a packet reference two modules (^rosetta:financial+legal)? Current design says no -- one module per packet. Mixed content uses multiple packets.

  3. Custom module registry. How are ^rosetta:org.acme.billing style custom modules discovered and distributed?

  4. Cold-read benchmarks. What is the standard test harness for measuring cold-read scores across modules? Needs a reference corpus per domain.

  5. Router confidence. Should the auto-detection stage emit a confidence score (^rc:0.85) so downstream systems can decide whether to trust the routing decision?
