v4.0.1 Kernel Router
Pluggable Rosetta modules. One META flag dispatches.
The kernel router is stateless. It reads the ^rosetta:MODULE flag and dispatches to a domain-aware compression module. Each module carries its own glyph palette, fidelity contract, and decompression rules. The kernel itself never changes.
AXL Rosetta v4 Kernel Router Architecture
Version: 4.1.0-blueprint
Status: Blueprint (pre-draft)
License: Apache 2.0
Extends: v4 Kernel (spec/v4-kernel.md)
1. Problem Statement
AXL Rosetta v3/v4 uses a single compression vocabulary for all content types.
The ^mode flag (gist, qa, audit, legal, code, research, plan) affects
decompression fidelity requirements but not compression strategy. This means
financial data is compressed with the same glyphs as legal text, and medical
records use the same vocabulary as prose summaries.
The result: domain-specific content either undercompresses (safe but wasteful) or loses critical domain signals (compact but unfaithful).
2. Key Insight
Instead of one universal compression vocabulary, AXL needs a kernel router that detects content domain and dispatches to a specialized Rosetta module. Each module carries its own compression vocabulary, fidelity contract, and decompression rules -- optimized for its domain.
The kernel itself does not change. The packet format is v3/v4 forever. The
router is a single new META flag: ^rosetta:module.
The spec describes a router that dispatches to six reserved modules: prose, financial, code, legal, medical, research (see Section 5 and Section 7 below).
As of v4.0.2-r6-freeze (commit 51e75de), the kernel router and three concrete Rosetta modules are implemented:
- prose: the v3 base vocabulary; default and fallback
- financial: CJK magnitude glyphs, numeric bundles, entity anchors
- construction: extension module beyond the spec; technical-spec / RFI / submittal vocabulary
Four spec-listed modules remain blueprint-only and are not yet implemented: code, legal, medical, research. Operators who set ^rosetta:legal, ^rosetta:medical, ^rosetta:research, or ^rosetta:code on a packet will be served the prose fallback by the registry. Module names that are not registered route to prose by design (see Section 10, Backward compatibility guarantee). The construction module is in the registry but is not in the spec's reserved-name list, so it is currently a local extension pending spec uplift.
This is intentional. Spec ahead of implementation is the working order: each new module ships independently when its fidelity contract and cold-read benchmarks pass (see Section 10, Phase 4). Module source: src/axl_v4/rosetta/{base,prose,financial,construction}.py at github.com/dcarranza-axl/axl-research/tree/v4.0.2-r6-freeze/src/axl_v4/rosetta (repo is private; public visitors will see HTTP 404 unless authenticated).
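The fallback behavior described above can be sketched as a lookup against the set of registered module names. This is a minimal illustration, not the contents of src/axl_v4/rosetta/; the names REGISTERED_MODULES, FALLBACK_MODULE, and resolve_module are hypothetical.

```python
from typing import Optional

# Hypothetical registry contents, mirroring the three modules listed as
# implemented at v4.0.2-r6-freeze. The real registry may be structured differently.
REGISTERED_MODULES = {"prose", "financial", "construction"}
FALLBACK_MODULE = "prose"


def resolve_module(requested: Optional[str]) -> str:
    """Return the module that will actually serve a packet.

    A missing flag or an unregistered name routes to the prose fallback.
    """
    if requested is None:
        return FALLBACK_MODULE
    if requested in REGISTERED_MODULES:
        return requested
    return FALLBACK_MODULE


print(resolve_module("financial"))
print(resolve_module("legal"))  # blueprint-only name, served by the fallback
```

Under these assumptions, a packet tagged ^rosetta:legal resolves to prose until a legal module is registered, at which point the same packet would resolve to legal with no change to the caller.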
3. Architecture
    AXL v4 Kernel Router
    ====================

    Source Content
          |
          v
    +--------------------+
    |   KERNEL ROUTER    |   Tiny. Stateless. One job:
    |  detect domain OR  |   read ^rosetta flag, or infer
    | read explicit flag |   domain from content signals.
    +--------------------+
          |
          +---> ^rosetta:prose     ---> [ Rosetta.prose     ] --+
          +---> ^rosetta:financial ---> [ Rosetta.financial ] --+
          +---> ^rosetta:code      ---> [ Rosetta.code      ] --+
          +---> ^rosetta:legal     ---> [ Rosetta.legal     ] --+
          +---> ^rosetta:medical   ---> [ Rosetta.medical   ] --+
          +---> ^rosetta:research  ---> [ Rosetta.research  ] --+
          |                                                     |
          v                                                     v
    Same PKT[VER|CLS|SUB|TAG|ARG1|ARG2|META] format     Compressed packets
    Same version negotiation                            with ^rosetta:XX
    Same error handling                                 in META field
          |                                                     |
          |                    DECOMPRESSION                    |
          |                          |                          |
          |                          v                          |
          |             ^rosetta tag tells decoder              |
          |             which module vocabulary to              |
          +<--------- use for faithful reconstruction ----------+
4. Design Principles
- The kernel NEVER changes. PKT format, field grammar, class codes, version negotiation, error handling -- all unchanged from v4-kernel.md.
- Rosetta modules are pluggable. Adding a new domain means registering a new module name. No kernel changes required.
- Backward compatible. Packets without ^rosetta use prose mode, which is the v3 base vocabulary. A v4.0 parser sees ^rosetta:financial as an unknown META flag and ignores it -- no parse failure.
- Explicit over automatic. Operators may set ^rosetta:financial directly. The router may also detect domain automatically. Explicit always wins.
- One module per packet. A single packet uses exactly one Rosetta vocabulary. Mixed-domain messages use multiple packets with different ^rosetta tags.
5. The ^rosetta META Flag
Syntax: ^rosetta:MODULE_NAME
Position: META field of any packet, alongside existing flags.
PKT[4|INF|Co.Rv|rpt|...|...|^rosetta:financial ^t:260410]
Reserved module names: prose, financial, code, legal, medical,
research. Custom modules use dotted names: ^rosetta:org.acme.billing.
Default when absent: prose.
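The naming rules above (reserved names, dotted custom names, prose when absent) can be sketched as a small classifier. The function name, the "kind" labels, and the exact character set allowed in a dotted name are assumptions made for illustration; the spec only gives ^rosetta:org.acme.billing as an example of the dotted form.

```python
import re
from typing import Optional, Tuple

RESERVED_MODULES = {"prose", "financial", "code", "legal", "medical", "research"}
DEFAULT_MODULE = "prose"

# Assumption: a custom name is two or more dot-separated segments of
# letters, digits, underscores, or hyphens (e.g. "org.acme.billing").
_DOTTED_NAME = re.compile(r"[A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+)+")


def classify_module_name(name: Optional[str]) -> Tuple[str, str]:
    """Return (effective_name, kind); name=None means the flag was absent."""
    if name is None:
        return DEFAULT_MODULE, "default"
    if name in RESERVED_MODULES:
        return name, "reserved"
    if _DOTTED_NAME.fullmatch(name):
        return name, "custom"
    return name, "unrecognized"


print(classify_module_name(None))
print(classify_module_name("financial"))
print(classify_module_name("org.acme.billing"))
```

Note that an undotted, unreserved name such as construction comes back as "unrecognized" under this reading, which matches its description as a local extension pending spec uplift.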
Interaction with ^mode
^mode and ^rosetta are orthogonal:
| Flag | Controls | Example |
|---|---|---|
| ^mode | Decompression fidelity tier | ^mode:audit = zero-lossy decompression |
| ^rosetta | Compression vocabulary | ^rosetta:financial = financial glyphs |
They compose: ^mode:audit ^rosetta:financial means "compress with financial
vocabulary, decompress at audit fidelity (zero-lossy)." This replaces the
v3 pattern where ^mode:legal had to imply both fidelity AND vocabulary.
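A short sketch of how a decoder might read the two flags independently. It assumes META flags are whitespace-separated ^key:value tokens, as in the example packet in this section; PacketPolicy, parse_meta, and policy_from_meta are hypothetical names, and the spec excerpt does not say what ^mode defaults to, so an absent ^mode is reported as None.

```python
from typing import Dict, NamedTuple, Optional


class PacketPolicy(NamedTuple):
    vocabulary: str          # from ^rosetta; "prose" when the flag is absent
    fidelity: Optional[str]  # from ^mode; None when the flag is absent


def parse_meta(meta: str) -> Dict[str, str]:
    """Split a META field into {flag: value}, skipping tokens that are not ^key:value."""
    flags: Dict[str, str] = {}
    for token in meta.split():
        if token.startswith("^") and ":" in token:
            key, _, value = token[1:].partition(":")
            flags[key] = value
    return flags


def policy_from_meta(meta: str) -> PacketPolicy:
    """Read vocabulary and fidelity tier as two independent lookups."""
    flags = parse_meta(meta)
    return PacketPolicy(flags.get("rosetta", "prose"), flags.get("mode"))


print(policy_from_meta("^mode:audit ^rosetta:financial"))
print(policy_from_meta("^t:260410"))
```

Because each flag is looked up on its own, flag order in META does not matter and neither flag implies a value for the other.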
6. Rosetta Module Interface
Every module has a registered name and must specify four components:
MODULE INTERFACE
================
module_name : string -- registered name (e.g. "financial")
glyph_palette : table -- domain-specific glyphs and primitives
fidelity : contract -- what is preserved, what is lossy
decomp_rules : grammar -- decompression production rules
cold_read : score 0-100 -- expected recovery % on a cold lowest-tier model
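One way to express this interface in code is a pair of frozen dataclasses. This is a sketch under assumptions, not the layout of the actual module source: the class names are hypothetical, decomp_rules is reduced to a plain mapping as a stand-in for a grammar, and cold_read is stored as a (low, high) range because the module tables in Section 7 give ranges. The sample values are copied from the Rosetta.financial table; using the target ratio for the "gained" field is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class FidelityContract:
    gained: str
    preserved: Tuple[str, ...]
    allowed_lossy: Tuple[str, ...]
    must_be_exact: Tuple[str, ...]


@dataclass(frozen=True)
class RosettaModule:
    module_name: str
    glyph_palette: Dict[str, str]  # glyph -> meaning
    fidelity: FidelityContract
    decomp_rules: Dict[str, str]   # stand-in for a real production grammar
    cold_read: Tuple[int, int]     # (low, high) expected recovery %

    def __post_init__(self) -> None:
        low, high = self.cold_read
        if not (0 <= low <= high <= 100):
            raise ValueError(f"cold_read must lie within 0-100, got {self.cold_read}")


FINANCIAL = RosettaModule(
    module_name="financial",
    glyph_palette={"金": "monetary", "⟹": "causal chain"},
    fidelity=FidelityContract(
        gained="target ratio 5-6x",
        preserved=("numeric values", "entity-value bindings", "causal chains"),
        allowed_lossy=("narrative framing",),
        must_be_exact=("dollar amounts", "percentages", "entity names", "direction of change"),
    ),
    decomp_rules={},  # not specified in the blueprint; left empty here
    cold_read=(60, 70),
)

print(FINANCIAL.module_name, FINANCIAL.cold_read)
```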
6.1 Glyph Palette
Each module defines its own compression vocabulary. Glyphs are drawn from
Unicode BMP and must not conflict with kernel reserved characters
(| [ ] \ ^ ;).
Modules inherit the kernel base vocabulary (class codes, base TAGs, numeric shorthand) and extend it with domain-specific primitives.
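The two palette constraints (BMP only, no kernel reserved characters) are mechanical enough to check in a few lines. The sketch below takes the strictest reading, treating any occurrence of a reserved character inside a glyph as a conflict; whether the spec intends that reading or only forbids redefining those characters is not stated here, so treat the rule and the function name palette_problems as assumptions.

```python
from typing import Iterable, List

# Kernel reserved characters as listed in this section: | [ ] \ ^ ;
KERNEL_RESERVED = set("|[]\\^;")
BMP_MAX = 0xFFFF  # last code point of the Unicode Basic Multilingual Plane


def palette_problems(glyphs: Iterable[str]) -> List[str]:
    """Return one human-readable message per offending character."""
    problems: List[str] = []
    for glyph in glyphs:
        for ch in glyph:
            if ch in KERNEL_RESERVED:
                problems.append(f"{glyph!r}: uses kernel reserved character {ch!r}")
            elif ord(ch) > BMP_MAX:
                problems.append(f"{glyph!r}: U+{ord(ch):04X} is outside the BMP")
    return problems


print(palette_problems(["金", "高", "⟹"]))
print(palette_problems(["a|b", "\U0001F600"]))
```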
6.2 Fidelity Contract
Four-column format from v4-layer-classification.md:
| Column | Meaning |
|---|---|
| Gained | Character reduction from this module vs prose baseline |
| Preserved | Semantic elements that survive round-trip exactly |
| Allowed lossy | Elements that may be paraphrased or dropped |
| Must be exact | Elements where any alteration is a protocol error |
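As an illustration of the "Must be exact" column, the sketch below checks that a list of exact elements survives a compress/decompress round trip verbatim. It is deliberately crude: it uses substring matching, and the function name and the idea of passing exact elements as literal strings are assumptions, not part of the contract format.

```python
from typing import Iterable, List


def exactness_violations(must_be_exact: Iterable[str], round_tripped: str) -> List[str]:
    """Return the must-be-exact elements that do not appear verbatim after round-trip."""
    return [item for item in must_be_exact if item not in round_tripped]


exact = ["Acme Corp", "$5.4M", "$4.2M", "27%"]
faithful = "Acme Corp reported Q4 revenue of $5.4M, up about 27% from $4.2M in Q3."
altered = "Acme Corp reported Q4 revenue of about $5M, up from Q3."

print(exactness_violations(exact, faithful))
print(exactness_violations(exact, altered))
```

Under the contract, a non-empty result would be treated as a protocol error, while differences in wording outside the listed elements fall under "Allowed lossy".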
6.3 Decompression Rules
Production rules mapping module glyphs back to natural language. A decoder that knows the module can reconstruct faithfully. A decoder that does not know the module falls back to prose-mode (best-effort).
6.4 Cold-Read Score
Integer 0-100 representing expected semantic recovery when a lowest-tier model (no AXL training, no spec in context) attempts to read the compressed output. Higher is better. Prose is the ceiling.
7. Proposed Modules
7.1 Rosetta.prose
The v3 base. Minimal glyphs. Highest cold-read score.
| Property | Value |
|---|---|
| Glyph palette | v3 base + v3.1 math operators + v3.2 ideographic composition |
| Preserved | Narrative structure, causal chains, entity identity, numeric values |
| Allowed lossy | Verbose phrasing, hedging language, stylistic flourishes |
| Cold-read score | 85-90 |
| Target ratio | 3-4x |
7.2 Rosetta.financial
Heavy use of CJK magnitude glyphs, numeric bundles, entity anchors.
| Property | Value |
|---|---|
| Glyph palette | 金(monetary), 高/大/小(magnitude), ⟹(causal chain), numeric bundles from v3.1, entity anchors, %Delta shorthand |
| Preserved | All numeric values, entity-value bindings, causal chains, temporal ordering, currency units |
| Allowed lossy | Narrative framing, hedging qualifiers beyond approximation marker |
| Must be exact | Dollar amounts, percentages, entity names, direction of change |
| Cold-read score | 60-70 |
| Target ratio | 5-6x |
7.3 Rosetta.code
Lossy intermediate representation. AST-derived.
| Property | Value |
|---|---|
| Glyph palette | fn/cl/lp/cd/rt/im/as/er/tp primitives, type shorthand (s/i/f/b/L/D/O), semicolon statements, -> returns, ^lang tags |
| Preserved | Identifiers, string literals, numeric constants, control flow, type annotations, operation sequence |
| Allowed lossy | Comments, whitespace, style, binding semantics, constructor internals |
| Must be exact | Nothing formally guaranteed (empirical boundary) |
| Cold-read score | 40-50 |
| Target ratio | 1.5-2x |
7.4 Rosetta.legal
Zero-lossy mode. Exact quotation preservation.
| Property | Value |
|---|---|
| Glyph palette | Prose base + span markers << >> for exact quotes, provenance chain operator @src:, section anchors SS.N |
| Preserved | ALL content. Exact wording of quoted material. Provenance chains. Section references. Party identity. |
| Allowed lossy | Nothing. Legal mode is zero-lossy by definition. |
| Must be exact | Quoted text within span markers, party names, section numbers, dates, obligations |
| Cold-read score | 80-85 (low compression = high readability) |
| Target ratio | 1.5-2x |
7.5 Rosetta.medical
Diagnosis chains, standardized coding, confidence-critical.
| Property | Value |
|---|---|
| Glyph palette | ICD/SNOMED code anchors [ICD:XX.X] [SNO:XXXXX], Dx(diagnosis)/Rx(prescription)/Sx(symptom)/Hx(history) primitives, confidence-mandatory ^c: on all clinical assertions |
| Preserved | Diagnosis codes, medication names and dosages, temporal ordering of clinical events, confidence levels |
| Allowed lossy | Narrative framing of clinical notes, administrative boilerplate |
| Must be exact | Drug names, dosages, ICD/SNOMED codes, allergy flags, patient identifiers |
| Cold-read score | 55-65 |
| Target ratio | 3-4x |
7.6 Rosetta.research
Citation preservation, hypothesis-evidence separation.
| Property | Value |
|---|---|
| Glyph palette | Claim IDs [C:N], evidence tags [E:N], hypothesis markers H:, citation anchors @cite:KEY, support/contradict operators +ev/-ev |
| Preserved | All citations, claim-evidence linkages, hypothesis identity, statistical values, methodology references |
| Allowed lossy | Literature review narrative, verbose methodology description |
| Must be exact | Citation keys, statistical values (p-values, confidence intervals, sample sizes), claim IDs |
| Cold-read score | 60-70 |
| Target ratio | 3-5x |
8. Routing: How Domain Detection Works
The kernel router selects a Rosetta module through a two-stage process:
Stage 1: Explicit flag. If the operator sets ^rosetta:MODULE in META,
that module is used. No detection runs. This is the recommended path for
production systems.
Stage 2: Automatic detection. If no ^rosetta flag is present, the
router applies lightweight heuristics to select a module:
| Signal | Detected module |
|---|---|
| Currency symbols ($, EUR, JPY), >30% numeric density | financial |
| ^mode:code or ^lang:XX present | code |
| Legal citation patterns (Section X, Article Y, "pursuant to") | legal |
| ICD/SNOMED codes, drug names, clinical abbreviations | medical |
| Citation patterns (et al., DOI, arXiv), hypothesis language | research |
| None of the above | prose (default) |
Heuristics are advisory. They must be conservative -- when uncertain, fall back to prose. False routing to a specialized module is worse than undercompressing with prose, because it risks misinterpretation.
The router MUST tag the selected module in the output packet's META field.
Even auto-detected modules produce explicit ^rosetta:XX tags. The
decompressor never guesses.
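The two-stage selection above can be sketched as follows. Several details are assumptions rather than spec: numeric density is computed as the share of whitespace-separated tokens containing a digit, the currency and density signals are treated as independent triggers, conflicting signals fall back to prose, and medical detection is omitted because it would need code and drug dictionaries. All function and pattern names are hypothetical.

```python
import re

_EXPLICIT = re.compile(r"\^rosetta:(\S+)")
_CURRENCY = re.compile(r"\$|\b(?:EUR|JPY)\b")
_LEGAL = re.compile(r"\b(?:Section|Article)\s+\w+|pursuant to", re.IGNORECASE)
_RESEARCH = re.compile(r"\bet al\.|\bDOI\b|\barXiv\b", re.IGNORECASE)


def numeric_density(text: str) -> float:
    """Share of whitespace-separated tokens that contain at least one digit."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(any(c.isdigit() for c in t) for t in tokens) / len(tokens)


def select_module(content: str, meta: str = "") -> str:
    # Stage 1: an explicit flag always wins and no detection runs.
    explicit = _EXPLICIT.search(meta)
    if explicit:
        return explicit.group(1)
    # Stage 2: lightweight, conservative heuristics.
    flags = meta.split()
    if "^mode:code" in flags or any(f.startswith("^lang:") for f in flags):
        return "code"
    matches = []
    if _CURRENCY.search(content) or numeric_density(content) > 0.30:
        matches.append("financial")
    if _LEGAL.search(content):
        matches.append("legal")
    if _RESEARCH.search(content):
        matches.append("research")
    # Route to a specialized module only when exactly one signal fires.
    return matches[0] if len(matches) == 1 else "prose"


def tag_meta(meta: str, module: str) -> str:
    """Ensure the output META carries an explicit ^rosetta tag."""
    if _EXPLICIT.search(meta):
        return meta
    return f"{meta} ^rosetta:{module}".strip()


module = select_module("Revenue rose to $5.4M in Q4.")
print(module, "|", tag_meta("^t:260410", module))
```

Falling back to prose on conflicting signals follows the stated preference for undercompressing over misrouting; a production router might instead split the content into multiple packets.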
9. Worked Example
Source text
Acme Corp reported Q4 revenue of $5.4M, up approximately 27% from $4.2M in Q3, driven by improved customer acquisition. The board approved a 15% budget increase for marketing.
Compressed with Rosetta.prose (default)
PKT[4|INF|Co.Rv|rpt↑|≈%Δ27 Q4|$4.2M→$5.4M ∵Hu.acq↑ ^rosetta:prose]
PKT[4|ACT|Mk.budget|upd↑|∴%Δ15|^ref:1 ^s:board ^rosetta:prose]
Characters: 128. Ratio: 1.8x on this sample. Cold-read: high. A model with no AXL knowledge can largely parse this.
Compressed with Rosetta.financial
PKT[4|INF|Co.Rv|rpt↑|金Q4:acme[$5.4M]⟹%Δ27←Q3[$4.2M]|∵acq↑ ^rosetta:financial]
PKT[4|ACT|Mk.budget|set|金%Δ+15|⟹^ref:1 ^s:board ^rosetta:financial]
Characters: 140. Ratio: 1.7x on this sample, slightly worse than prose. However, every numeric value is structurally anchored in a numeric bundle. On a 50-packet financial report, the structural regularity of 金-anchored bundles yields 5-6x aggregate compression while preserving 100% of numeric facts, because the per-packet overhead amortizes across repetitive financial data.
Key difference
Prose mode embeds $4.2M→$5.4M as a range in ARG2 -- readable but
positionally fragile in long sequences. Financial mode anchors every value
in a typed bundle (金Q4:acme[$5.4M]) that a financial decompressor can
index, validate, and reconstruct without positional ambiguity.
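To make "index, validate, and reconstruct" concrete, here is a sketch that pulls typed bundles out of a packet with a regular expression. The bundle grammar is not given in this document, so the pattern is inferred from the single example 金Q4:acme[$5.4M]; the field names period, entity, and value are guesses at what the three positions mean.

```python
import re
from typing import Dict, List

# Assumed shape, inferred from one example: 金<period>:<entity>[<value>]
_BUNDLE = re.compile(
    r"金(?P<period>[^:\[\]|]+):(?P<entity>[^\[\]|]+)\[(?P<value>[^\]]+)\]"
)


def extract_bundles(packet: str) -> List[Dict[str, str]]:
    """Return every 金-anchored bundle in a packet as a dict of named parts."""
    return [m.groupdict() for m in _BUNDLE.finditer(packet)]


packet = "PKT[4|INF|Co.Rv|rpt↑|金Q4:acme[$5.4M]⟹%Δ27←Q3[$4.2M]|∵acq↑ ^rosetta:financial]"
print(extract_bundles(packet))
```

Because each value is found by its anchor rather than by its position within ARG1 or ARG2, the lookup stays the same however many packets precede it.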
10. Migration Path: v3/v4 to v4.1
Phase 1: Flag recognition (non-breaking)
Add ^rosetta to the known META flags list. Parsers that see it treat
it as informational. Compression continues to use prose vocabulary.
All existing v3/v4 packets are valid. No behavioral change.
Phase 2: Module registration
Define and publish Rosetta.prose (already the v3/v4 base) and
Rosetta.financial as the first two modules. Encoders may begin
emitting ^rosetta:financial on financial content.
Phase 3: Router activation
Enable automatic detection. Encoders emit ^rosetta:XX on all
packets. Decoders use the tag to select module-specific
decompression rules.
Phase 4: Module expansion
Publish Rosetta.code (absorbs current ^mode:code), Rosetta.legal, Rosetta.medical, Rosetta.research. Each module ships independently when its fidelity contract and cold-read benchmarks are validated.
Backward compatibility guarantee
At every phase:
- Packets without ^rosetta parse and decompress as v4 prose (no change).
- Unknown ^rosetta:XX values are ignored by older parsers (standard
META flag behavior).
- The kernel packet format does not change. Ever.
11. Open Questions
- Module versioning. Should modules carry independent version numbers (e.g., ^rosetta:financial.2) or rely on the packet VER field?
- Module composition. Can a packet reference two modules (^rosetta:financial+legal)? Current design says no -- one module per packet. Mixed content uses multiple packets.
- Custom module registry. How are ^rosetta:org.acme.billing style custom modules discovered and distributed?
- Cold-read benchmarks. What is the standard test harness for measuring cold-read scores across modules? Needs a reference corpus per domain.
- Router confidence. Should the auto-detection stage emit a confidence score (^rc:0.85) so downstream systems can decide whether to trust the routing decision?