v4.0.1 Kernel Router

Pluggable Rosetta modules. One META flag dispatches.

The kernel router is stateless. It reads the ^rosetta:MODULE flag (or, when the flag is absent, infers the domain from content signals) and dispatches to a domain-aware compression module. Each module carries its own glyph palette, fidelity contract, and decompression rules. The kernel itself never changes.

Spec status: version label 4.1.0-blueprint. The kernel router and the prose, financial, and construction modules are implemented and under test discipline as of 2026-04-25 (217 of 217 tests passing). Sources: src/axl_v4/router.py, src/axl_v4/rosetta/{base,prose,financial,construction}.py.

AXL Rosetta v4 Kernel Router Architecture

Version: 4.1.0-blueprint
Status: Blueprint (pre-draft)
License: Apache 2.0
Extends: v4 Kernel (spec/v4-kernel.md)

1. Problem Statement

AXL Rosetta v3/v4 uses a single compression vocabulary for all content types. The ^mode flag (gist, qa, audit, legal, code, research, plan) affects decompression fidelity requirements but not compression strategy. This means financial data is compressed with the same glyphs as legal text, and medical records use the same vocabulary as prose summaries.

The result: domain-specific content either undercompresses (safe but wasteful) or loses critical domain signals (compact but unfaithful).

2. Key Insight

Instead of one universal compression vocabulary, AXL needs a kernel router that detects content domain and dispatches to a specialized Rosetta module. Each module carries its own compression vocabulary, fidelity contract, and decompression rules -- optimized for its domain.

The kernel itself does not change: the packet format stays v3/v4 permanently. The router's only addition to the wire format is a single new META flag, ^rosetta:module.

Implementation Status

The spec describes a router that dispatches to six reserved modules: prose, financial, code, legal, medical, research (see Section 5 and Section 7 below).

As of v4.0.2-r6-freeze (commit 51e75de), the kernel router and three concrete Rosetta modules are implemented:

prose - the v3 base vocabulary; default and fallback
financial - CJK magnitude glyphs, numeric bundles, entity anchors
construction - extension module beyond the spec; technical-spec / RFI / submittal vocabulary

Four spec-listed modules remain blueprint-only and are not yet implemented: code, legal, medical, research. Operators who set ^rosetta:legal, ^rosetta:medical, ^rosetta:research, or ^rosetta:code on a packet will be served the prose fallback by the registry. Module names that are not registered route to prose by design (see Section 10, Backward compatibility guarantee). The construction module is in the registry but is not in the spec's reserved-name list, so it is currently a local extension pending spec uplift.
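The registry behavior described above amounts to a lookup with a prose default. The sketch below is illustrative only and is not the code in src/axl_v4/router.py; `REGISTERED_MODULES` and `resolve_module` are hypothetical names, and the registered set simply mirrors the three modules listed above.

```python
from typing import Optional

# Hypothetical registry contents, mirroring the v4.0.2-r6-freeze module list.
REGISTERED_MODULES = frozenset({"prose", "financial", "construction"})
DEFAULT_MODULE = "prose"


def resolve_module(requested: Optional[str]) -> str:
    """Return the module that will actually serve a packet.

    A missing flag, a blueprint-only name (code, legal, medical, research),
    or any other unregistered name falls back to prose.
    """
    if requested in REGISTERED_MODULES:
        return requested
    return DEFAULT_MODULE
```

Under this sketch, `resolve_module("legal")` returns `"prose"` until a Rosetta.legal module is registered.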

This is intentional. The spec deliberately runs ahead of the implementation: each new module ships independently when its fidelity contract and cold-read benchmarks pass (see Section 10, Phase 4). Module source: src/axl_v4/rosetta/{base,prose,financial,construction}.py at github.com/dcarranza-axl/axl-research/tree/v4.0.2-r6-freeze/src/axl_v4/rosetta (repo is private; public visitors will see HTTP 404 unless authenticated).

3. Architecture

                          AXL v4 Kernel Router
                          ====================

  Source Content
       |
       v
  +--------------------+
  | KERNEL ROUTER      |    Tiny. Stateless. One job:
  | detect domain OR   |    read ^rosetta flag, or infer
  | read explicit flag |    domain from content signals.
  +--------------------+
       |
       +---> ^rosetta:prose      ---> [ Rosetta.prose       ] --+
       +---> ^rosetta:financial  ---> [ Rosetta.financial   ] --+
       +---> ^rosetta:code       ---> [ Rosetta.code        ] --+
       +---> ^rosetta:legal      ---> [ Rosetta.legal       ] --+
       +---> ^rosetta:medical    ---> [ Rosetta.medical     ] --+
       +---> ^rosetta:research   ---> [ Rosetta.research    ] --+
       |                                                        |
       v                                                        v
  Same PKT[VER|CLS|SUB|TAG|ARG1|ARG2|META] format       Compressed packets
  Same version negotiation                               with ^rosetta:XX
  Same error handling                                    in META field
       |                                                        |
       |                         DECOMPRESSION                  |
       |                              |                         |
       |                              v                         |
       |                    ^rosetta tag tells decoder          |
       |                    which module vocabulary to          |
       +<----------- use for faithful reconstruction -----------+

4. Design Principles

The kernel NEVER changes. PKT format, field grammar, class codes, version negotiation, error handling -- all unchanged from v4-kernel.md.
Rosetta modules are pluggable. Adding a new domain means registering a new module name. No kernel changes required.
Backward compatible. Packets without ^rosetta use prose mode, which is the v3 base vocabulary. A v4.0 parser sees ^rosetta:financial as an unknown META flag and ignores it -- no parse failure.
Explicit over automatic. Operators may set ^rosetta:financial directly. The router may also detect domain automatically. Explicit always wins.
One module per packet. A single packet uses exactly one Rosetta vocabulary. Mixed-domain messages use multiple packets with different ^rosetta tags.

5. The ^rosetta META Flag

Syntax: ^rosetta:MODULE_NAME

Position: META field of any packet, alongside existing flags.

PKT[4|INF|Co.Rv|rpt|...|...|^rosetta:financial ^t:260410]

Reserved module names: prose, financial, code, legal, medical, research. Custom modules use dotted names: ^rosetta:org.acme.billing.

Default when absent: prose.
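A minimal sketch of reading the flag from a META field, assuming META tokens are whitespace-separated as in the example above. The helper names, the dotted-name pattern, and the choice to reject a second ^rosetta flag (following the one-module-per-packet principle in Section 4) are assumptions rather than spec text.

```python
import re

RESERVED_NAMES = frozenset({"prose", "financial", "code", "legal", "medical", "research"})
# Assumed shape for custom names: two or more dot-separated lowercase segments.
DOTTED_NAME = re.compile(r"[a-z][a-z0-9_]*(?:\.[a-z][a-z0-9_]*)+")


def rosetta_flag(meta: str) -> str:
    """Return the ^rosetta module named in a META field, or 'prose' if absent."""
    names = [tok.split(":", 1)[1] for tok in meta.split() if tok.startswith("^rosetta:")]
    if not names:
        return "prose"  # default when absent
    if len(names) > 1:
        # One module per packet; raising here is an assumed policy.
        raise ValueError(f"more than one ^rosetta flag: {names}")
    return names[0]


def is_spec_name(name: str) -> bool:
    """True for a reserved name or a dotted custom name such as org.acme.billing."""
    return name in RESERVED_NAMES or DOTTED_NAME.fullmatch(name) is not None
```

Here `rosetta_flag("^rosetta:financial ^t:260410")` returns `"financial"`. Under this reading `is_spec_name("construction")` is `False`, which matches construction's status as a local extension pending spec uplift.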

Interaction with ^mode

^mode and ^rosetta are orthogonal:

Flag	Controls	Example
`^mode`	Decompression fidelity tier	`^mode:audit` = zero-lossy decompression
`^rosetta`	Compression vocabulary	`^rosetta:financial` = financial glyphs

They compose: ^mode:audit ^rosetta:financial means "compress with financial vocabulary, decompress at audit fidelity (zero-lossy)." This replaces the v3 pattern where ^mode:legal had to imply both fidelity AND vocabulary.
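The orthogonality can be made concrete by reading the two flags into independent fields. This is a sketch assuming whitespace-separated `^key:value` META tokens; `Codec` and `read_flags` are hypothetical names, and leaving the fidelity tier as `None` when ^mode is absent is an assumption, since this section does not define a default.

```python
from typing import NamedTuple, Optional


class Codec(NamedTuple):
    vocabulary: str          # from ^rosetta: which glyph palette compresses
    fidelity: Optional[str]  # from ^mode: which decompression tier applies


def read_flags(meta: str) -> Codec:
    """Read ^rosetta and ^mode independently; neither flag implies the other."""
    flags = dict(
        tok[1:].split(":", 1) for tok in meta.split() if tok.startswith("^") and ":" in tok
    )
    return Codec(vocabulary=flags.get("rosetta", "prose"), fidelity=flags.get("mode"))
```

With this split, `read_flags("^mode:legal")` yields `Codec(vocabulary="prose", fidelity="legal")`: the legal fidelity tier no longer drags a legal vocabulary along with it.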

6. Rosetta Module Interface

Every module must declare its registered name plus four components:

MODULE INTERFACE
================
module_name   : string        -- registered name (e.g. "financial")
glyph_palette : table         -- domain-specific glyphs and primitives
fidelity      : contract      -- what is preserved, what is lossy
decomp_rules  : grammar       -- decompression production rules
cold_read     : score 0-100   -- expected recovery % on a cold lowest-tier model
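One way to express this interface in Python is a frozen dataclass. This is a sketch, not the shape of src/axl_v4/rosetta/base.py: representing `decomp_rules` as a plain callable and the fidelity contract as tuples of descriptions are assumptions, and the toy module at the end is illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class FidelityContract:
    preserved: Tuple[str, ...]
    allowed_lossy: Tuple[str, ...]
    must_be_exact: Tuple[str, ...]


@dataclass(frozen=True)
class RosettaModule:
    module_name: str                    # registered name, e.g. "financial"
    glyph_palette: Dict[str, str]       # glyph -> meaning
    fidelity: FidelityContract          # what survives, what may be lossy
    decomp_rules: Callable[[str], str]  # compressed text -> natural language
    cold_read: int                      # expected recovery %, 0-100

    def __post_init__(self) -> None:
        if not isinstance(self.cold_read, int) or not 0 <= self.cold_read <= 100:
            raise ValueError(f"cold_read must be an integer 0-100, got {self.cold_read!r}")


# Toy instance: glyph meanings paraphrase Section 7.2; everything else is illustrative.
PALETTE = {"金": "monetary ", "⟹": " leads to "}
toy_financial = RosettaModule(
    module_name="financial",
    glyph_palette=PALETTE,
    fidelity=FidelityContract(
        preserved=("numeric values", "entity-value bindings"),
        allowed_lossy=("narrative framing",),
        must_be_exact=("dollar amounts", "percentages"),
    ),
    decomp_rules=lambda text: "".join(PALETTE.get(ch, ch) for ch in text),
    cold_read=65,
)
```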

6.1 Glyph Palette

Each module defines its own compression vocabulary. Glyphs are drawn from the Unicode Basic Multilingual Plane (BMP) and must not conflict with the kernel reserved characters (| [ ] \ ^ ;).
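Both constraints can be checked mechanically. In this sketch `check_palette` is a hypothetical helper, and it applies the rule to every character inside a glyph string, which is one possible reading of this paragraph.

```python
from typing import Iterable, List

KERNEL_RESERVED = frozenset("|[]\\^;")


def check_palette(glyphs: Iterable[str]) -> List[str]:
    """List problems: characters outside the BMP or kernel reserved characters."""
    problems = []
    for glyph in glyphs:
        for ch in glyph:
            if ord(ch) > 0xFFFF:
                problems.append(f"{glyph!r}: U+{ord(ch):04X} is outside the BMP")
            elif ch in KERNEL_RESERVED:
                problems.append(f"{glyph!r}: {ch!r} is kernel-reserved")
    return problems
```

The financial glyphs 金, 高, and ⟹ pass. Under this strict per-character reading, bracketed forms such as `[C:N]` from Section 7.6 are flagged, which is why the reading is labeled an assumption.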

Modules inherit the kernel base vocabulary (class codes, base TAGs, numeric shorthand) and extend it with domain-specific primitives.

6.2 Fidelity Contract

Four-column format from v4-layer-classification.md:

Column	Meaning
Gained	Character reduction from this module vs prose baseline
Preserved	Semantic elements that survive round-trip exactly
Allowed lossy	Elements that may be paraphrased or dropped
Must be exact	Elements where any alteration is a protocol error
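For numeric content, part of a Must-be-exact check can be automated by comparing number-like tokens before and after a round trip. This is a sketch under assumptions: the `NUMERIC` pattern (optional `$`, digits, optional `%`/`K`/`M`/`B` suffix) and the helper name `exact_violations` are hypothetical, and a real contract covers more than numbers (for example entity names and direction of change).

```python
import re
from typing import Set

# Assumed notion of a "number-like token": $5.4M, 27%, 15, 3.5K ...
NUMERIC = re.compile(r"\$?\d+(?:\.\d+)?[%KMB]?")


def exact_violations(source: str, decompressed: str) -> Set[str]:
    """Number-like tokens in the source that did not survive the round trip."""
    return set(NUMERIC.findall(source)) - set(NUMERIC.findall(decompressed))
```

An empty result is necessary but not sufficient for a faithful round trip.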

6.3 Decompression Rules

Production rules mapping module glyphs back to natural language. A decoder that knows the module can reconstruct faithfully. A decoder that does not know the module falls back to prose-mode (best-effort).

6.4 Cold-Read Score

Integer 0-100 representing expected semantic recovery when a lowest-tier model (no AXL training, no spec in context) attempts to read the compressed output. Higher is better. Prose is the ceiling.
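The measurement harness is still open (see Section 11, Cold-read benchmarks). One plausible scoring rule, offered as an assumption rather than the spec's method, is the share of reference facts a cold model recovers, rounded to an integer. `cold_read_score` is a hypothetical name.

```python
from typing import Iterable


def cold_read_score(reference_facts: Iterable[str], recovered_facts: Iterable[str]) -> int:
    """Integer 0-100: percentage of reference facts present in a cold model's reading."""
    reference = set(reference_facts)
    if not reference:
        raise ValueError("need at least one reference fact")
    hits = len(reference & set(recovered_facts))  # extra "facts" do not raise the score
    return round(100 * hits / len(reference))
```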

7. Proposed Modules

7.1 Rosetta.prose

The v3 base. Minimal glyphs. Highest cold-read score.

Property	Value
Glyph palette	v3 base + v3.1 math operators + v3.2 ideographic composition
Preserved	Narrative structure, causal chains, entity identity, numeric values
Allowed lossy	Verbose phrasing, hedging language, stylistic flourishes
Cold-read score	85-90
Target ratio	3-4x

7.2 Rosetta.financial

Heavy use of CJK magnitude glyphs, numeric bundles, entity anchors.

Property	Value
Glyph palette	金(monetary), 高/大/小(magnitude), ⟹(causal chain), numeric bundles from v3.1, entity anchors, %Delta shorthand
Preserved	All numeric values, entity-value bindings, causal chains, temporal ordering, currency units
Allowed lossy	Narrative framing, hedging qualifiers beyond approximation marker
Must be exact	Dollar amounts, percentages, entity names, direction of change
Cold-read score	60-70
Target ratio	5-6x

7.3 Rosetta.code

Lossy intermediate representation. AST-derived.

Property	Value
Glyph palette	fn/cl/lp/cd/rt/im/as/er/tp primitives, type shorthand (s/i/f/b/L/D/O), semicolon statements, -> returns, ^lang tags
Preserved	Identifiers, string literals, numeric constants, control flow, type annotations, operation sequence
Allowed lossy	Comments, whitespace, style, binding semantics, constructor internals
Must be exact	Nothing formally guaranteed (empirical boundary)
Cold-read score	40-50
Target ratio	1.5-2x

7.4 Rosetta.legal

Zero-lossy mode. Exact quotation preservation.

Property	Value
Glyph palette	Prose base + span markers `<<` `>>` for exact quotes, provenance chain operator `@src:`, section anchors `SS.N`
Preserved	ALL content. Exact wording of quoted material. Provenance chains. Section references. Party identity.
Allowed lossy	Nothing. Legal mode is zero-lossy by definition.
Must be exact	Quoted text within span markers, party names, section numbers, dates, obligations
Cold-read score	80-85 (low compression = high readability)
Target ratio	1.5-2x

7.5 Rosetta.medical

Diagnosis chains, standardized coding, confidence-critical.

Property	Value
Glyph palette	ICD/SNOMED code anchors `[ICD:XX.X]` `[SNO:XXXXX]`, Dx(diagnosis)/Rx(prescription)/Sx(symptom)/Hx(history) primitives, confidence-mandatory `^c:` on all clinical assertions
Preserved	Diagnosis codes, medication names and dosages, temporal ordering of clinical events, confidence levels
Allowed lossy	Narrative framing of clinical notes, administrative boilerplate
Must be exact	Drug names, dosages, ICD/SNOMED codes, allergy flags, patient identifiers
Cold-read score	55-65
Target ratio	3-4x

7.6 Rosetta.research

Citation preservation, hypothesis-evidence separation.

Property	Value
Glyph palette	Claim IDs `[C:N]`, evidence tags `[E:N]`, hypothesis markers `H:`, citation anchors `@cite:KEY`, support/contradict operators `+ev`/`-ev`
Preserved	All citations, claim-evidence linkages, hypothesis identity, statistical values, methodology references
Allowed lossy	Literature review narrative, verbose methodology description
Must be exact	Citation keys, statistical values (p-values, confidence intervals, sample sizes), claim IDs
Cold-read score	60-70
Target ratio	3-5x

8. Routing: How Domain Detection Works

The kernel router selects a Rosetta module through a two-stage process:

Stage 1: Explicit flag. If the operator sets ^rosetta:MODULE in META, that module is used. No detection runs. This is the recommended path for production systems.

Stage 2: Automatic detection. If no ^rosetta flag is present, the router applies lightweight heuristics to select a module:

Signal	Detected module
Currency symbols or codes ($, EUR, JPY), >30% numeric density	financial
^mode:code or ^lang:XX present	code
Legal citation patterns (Section X, Article Y, "pursuant to")	legal
ICD/SNOMED codes, drug names, clinical abbreviations	medical
Citation patterns (et al., DOI, arXiv), hypothesis language	research
None of the above	prose (default)

Heuristics are advisory. They must be conservative -- when uncertain, fall back to prose. False routing to a specialized module is worse than undercompressing with prose, because it risks misinterpretation.
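A conservative detector in the spirit of this table might look like the sketch below. The regular expressions, the token-based definition of numeric density, treating the two financial signals as alternatives, and the rule that any conflict between domains falls back to prose are all assumptions; the table names the signals but not the exact rules.

```python
import re

CODE_FLAGS = re.compile(r"\^mode:code\b|\^lang:\S+")
CONTENT_SIGNALS = {
    "financial": re.compile(r"[$€¥£]|\b(?:USD|EUR|JPY)\b"),
    "legal": re.compile(r"\b(?:Section|Article)\s+\d+|\bpursuant to\b", re.IGNORECASE),
    "medical": re.compile(r"\b(?:ICD|SNOMED)\b|\b[A-Z]\d{2}\.\d+\b"),
    "research": re.compile(r"\bet al\.|\bdoi\b|\barxiv\b", re.IGNORECASE),
}


def numeric_density(text: str) -> float:
    """Fraction of whitespace-separated tokens that contain a digit."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(any(ch.isdigit() for ch in tok) for tok in tokens) / len(tokens)


def detect_domain(text: str, meta: str = "") -> str:
    """Pick a module only when exactly one domain fires; otherwise fall back to prose."""
    if CODE_FLAGS.search(meta):
        return "code"
    hits = {name for name, pattern in CONTENT_SIGNALS.items() if pattern.search(text)}
    if numeric_density(text) > 0.30:
        hits.add("financial")
    return hits.pop() if len(hits) == 1 else "prose"
```

Requiring a single unambiguous hit is one way to honor "when uncertain, fall back to prose": text that trips both citation and currency signals stays in prose rather than guessing.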

The router MUST tag the selected module in the output packet's META field. Even auto-detected modules produce explicit ^rosetta:XX tags. The decompressor never guesses.
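The two stages and the tagging rule fit in a few lines. This sketch assumes whitespace-separated META tokens; `route` is a hypothetical name, and the detector is passed in as a function so any Stage 2 heuristic can be plugged in.

```python
from typing import Callable, Tuple


def route(meta: str, text: str, detect: Callable[[str], str]) -> Tuple[str, str]:
    """Return (module, output META). Explicit ^rosetta wins; otherwise detect and tag."""
    for tok in meta.split():
        if tok.startswith("^rosetta:"):
            return tok.split(":", 1)[1], meta  # Stage 1: explicit flag, no detection runs
    module = detect(text)  # Stage 2: automatic detection
    return module, f"{meta} ^rosetta:{module}".strip()  # always tag the output
```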

9. Worked Example

Source text

Acme Corp reported Q4 revenue of $5.4M, up approximately 27% from $4.2M in Q3, driven by improved customer acquisition. The board approved a 15% budget increase for marketing.

Compressed with Rosetta.prose (default)

PKT[4|INF|Co.Rv|rpt↑|≈%Δ27 Q4|$4.2M→$5.4M ∵Hu.acq↑ ^rosetta:prose]
PKT[4|ACT|Mk.budget|upd↑|∴%Δ15|^ref:1 ^s:board ^rosetta:prose]

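Characters: 128 (98 without the two ^rosetta:prose tags). Ratio against the 175-character source: about 1.4x with the tags counted, 1.8x without them. Cold-read: high. A model with no AXL knowledge can largely parse this.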
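The counts can be reproduced by counting Unicode code points with Python's `len()`. The sketch prints the ratio both with and without the two ^rosetta:prose tags, since the tags alone account for 30 characters.

```python
SOURCE = (
    "Acme Corp reported Q4 revenue of $5.4M, up approximately 27% from $4.2M in Q3, "
    "driven by improved customer acquisition. The board approved a 15% budget increase for marketing."
)
PROSE_PACKETS = [
    "PKT[4|INF|Co.Rv|rpt↑|≈%Δ27 Q4|$4.2M→$5.4M ∵Hu.acq↑ ^rosetta:prose]",
    "PKT[4|ACT|Mk.budget|upd↑|∴%Δ15|^ref:1 ^s:board ^rosetta:prose]",
]
TAG = " ^rosetta:prose"

with_tags = sum(len(p) for p in PROSE_PACKETS)
without_tags = sum(len(p.replace(TAG, "")) for p in PROSE_PACKETS)
print(len(SOURCE), with_tags, without_tags)  # 175 128 98
print(round(len(SOURCE) / with_tags, 2), round(len(SOURCE) / without_tags, 2))  # 1.37 1.79
```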

Compressed with Rosetta.financial

PKT[4|INF|Co.Rv|rpt↑|金Q4:acme[$5.4M]⟹%Δ27←Q3[$4.2M]|∵acq↑ ^rosetta:financial]
PKT[4|ACT|Mk.budget|set|金%Δ+15|⟹^ref:1 ^s:board ^rosetta:financial]

Characters: 144 (106 without the two ^rosetta:financial tags). Ratio against the 175-character source: about 1.2x with the tags counted, 1.7x without them. But: every numeric value is structurally anchored in a numeric bundle. On a 50-packet financial report, the structural regularity of 金-anchored bundles yields 5-6x aggregate compression while preserving 100% of numeric facts. The per-packet overhead amortizes across repetitive financial data.
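The same code-point count for the financial packets follows. As with the prose sample, whether the ^rosetta:financial tags are counted changes the per-sample ratio noticeably; the 50-packet aggregate figure is not reproduced by this snippet.

```python
SOURCE = (
    "Acme Corp reported Q4 revenue of $5.4M, up approximately 27% from $4.2M in Q3, "
    "driven by improved customer acquisition. The board approved a 15% budget increase for marketing."
)
FINANCIAL_PACKETS = [
    "PKT[4|INF|Co.Rv|rpt↑|金Q4:acme[$5.4M]⟹%Δ27←Q3[$4.2M]|∵acq↑ ^rosetta:financial]",
    "PKT[4|ACT|Mk.budget|set|金%Δ+15|⟹^ref:1 ^s:board ^rosetta:financial]",
]
TAG = " ^rosetta:financial"

with_tags = sum(len(p) for p in FINANCIAL_PACKETS)
without_tags = sum(len(p.replace(TAG, "")) for p in FINANCIAL_PACKETS)
print(len(SOURCE), with_tags, without_tags)  # 175 144 106
print(round(len(SOURCE) / with_tags, 2), round(len(SOURCE) / without_tags, 2))  # 1.22 1.65
```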

Key difference

Prose mode embeds $4.2M→$5.4M as a range in ARG2 -- readable but positionally fragile in long sequences. Financial mode anchors every value in a typed bundle (金Q4:acme[$5.4M]) that a financial decompressor can index, validate, and reconstruct without positional ambiguity.
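To illustrate what indexing a typed bundle could mean, here is a sketch that pulls bundles out of a single ARG field and converts their values to numbers. The bundle grammar (an optional 金 prefix, a period, an optional `:entity`, then a bracketed value) is inferred from this one example and is an assumption, as are the helper names and the K/M/B scale suffixes.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed bundle shape, inferred from 金Q4:acme[$5.4M] and Q3[$4.2M].
BUNDLE = re.compile(
    r"(?:金)?(?P<period>[A-Za-z0-9]+)(?::(?P<entity>[A-Za-z0-9_.]+))?\[(?P<value>[^\]]+)\]"
)
SCALE = {"K": 1e3, "M": 1e6, "B": 1e9}


def index_bundles(arg: str) -> Dict[str, Tuple[Optional[str], str]]:
    """Map period -> (entity or None, raw value) for each bundle in one ARG field."""
    return {m["period"]: (m["entity"], m["value"]) for m in BUNDLE.finditer(arg)}


def parse_amount(value: str) -> float:
    """Turn '$5.4M' into 5400000.0; assumes an optional $ and a K/M/B suffix."""
    digits = value.lstrip("$")
    if digits and digits[-1] in SCALE:
        return float(digits[:-1]) * SCALE[digits[-1]]
    return float(digits)
```

Applied to `金Q4:acme[$5.4M]⟹%Δ27←Q3[$4.2M]`, this yields a Q4 entry bound to acme and a Q3 entry with no entity, so a decoder can look values up by period rather than by position.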

10. Migration Path: v3/v4 to v4.1

Phase 1: Flag recognition (non-breaking)

Add ^rosetta to the known META flags list. Parsers that see it treat it as informational. Compression continues to use prose vocabulary. All existing v3/v4 packets are valid. No behavioral change.

Phase 2: Module registration

Define and publish Rosetta.prose (already the v3/v4 base) and Rosetta.financial as the first two modules. Encoders may begin emitting ^rosetta:financial on financial content.

Phase 3: Router activation

Enable automatic detection. Encoders emit ^rosetta:XX on all packets. Decoders use the tag to select module-specific decompression rules.

Phase 4: Module expansion

Publish Rosetta.code (absorbs current ^mode:code), Rosetta.legal, Rosetta.medical, Rosetta.research. Each module ships independently when its fidelity contract and cold-read benchmarks are validated.

Backward compatibility guarantee

At every phase:
- Packets without ^rosetta parse and decompress as v4 prose (no change).
- Unknown ^rosetta:XX values are ignored by older parsers (standard META flag behavior).
- The kernel packet format does not change. Ever.

11. Open Questions

Module versioning. Should modules carry independent version numbers (e.g., ^rosetta:financial.2) or rely on the packet VER field?
Module composition. Can a packet reference two modules (^rosetta:financial+legal)? Current design says no -- one module per packet. Mixed content uses multiple packets.
Custom module registry. How are ^rosetta:org.acme.billing style custom modules discovered and distributed?
Cold-read benchmarks. What is the standard test harness for measuring cold-read scores across modules? Needs a reference corpus per domain.
Router confidence. Should the auto-detection stage emit a confidence score (^rc:0.85) so downstream systems can decide whether to trust the routing decision?

Source & Raw Text

Repos are PRIVATE. Public visitors will see HTTP 404 unless authenticated.