# AXL Rosetta v4.0.1 - Public Release Kernel Router
# Frozen at: v4.0.2-r6-freeze (commit 51e75de)
# Released: 2026-04-25
# Source: github.com/dcarranza-axl/axl-research/blob/v4.0.2-r6-freeze/spec/v4-kernel-router.md
# Mirror: github.com/axlprotocol/axl-research/blob/v4.0.2-r6-freeze/spec/v4-kernel-router.md
# License: Apache 2.0

# AXL Rosetta v4 Kernel Router Architecture

Version: 4.1.0-blueprint
Status: Blueprint (pre-draft)
License: Apache 2.0
Extends: v4 Kernel (spec/v4-kernel.md)

## 1. Problem Statement

AXL Rosetta v3/v4 uses a single compression vocabulary for all content types.
The `^mode` flag (gist, qa, audit, legal, code, research, plan) affects
decompression fidelity requirements but not compression strategy. This means
financial data is compressed with the same glyphs as legal text, and medical
records use the same vocabulary as prose summaries.

The result: domain-specific content either undercompresses (safe but wasteful)
or loses critical domain signals (compact but unfaithful).

## 2. Key Insight

Instead of one universal compression vocabulary, AXL needs a **kernel router**
that detects content domain and dispatches to a specialized **Rosetta module**.
Each module carries its own compression vocabulary, fidelity contract, and
decompression rules -- optimized for its domain.

The kernel itself does not change. The packet format is v3/v4 forever. The
router is exposed through a single new META flag: `^rosetta:module`.

## 3. Architecture

```
                          AXL v4 Kernel Router
                          ====================

  Source Content
       |
       v
  +--------------------+
  | KERNEL ROUTER      |    Tiny. Stateless. One job:
  | detect domain OR   |    read ^rosetta flag, or infer
  | read explicit flag |    domain from content signals.
  +--------------------+
       |
       +---> ^rosetta:prose      ---> [ Rosetta.prose      ] --+
       +---> ^rosetta:financial  ---> [ Rosetta.financial   ] --+
       +---> ^rosetta:code       ---> [ Rosetta.code        ] --+
       +---> ^rosetta:legal      ---> [ Rosetta.legal       ] --+
       +---> ^rosetta:medical    ---> [ Rosetta.medical     ] --+
       +---> ^rosetta:research   ---> [ Rosetta.research    ] --+
       |                                                        |
       v                                                        v
  Same PKT[VER|CLS|SUB|TAG|ARG1|ARG2|META] format       Compressed packets
  Same version negotiation                               with ^rosetta:XX
  Same error handling                                    in META field
       |                                                        |
       |                         DECOMPRESSION                  |
       |                              |                         |
       |                              v                         |
       |                    ^rosetta tag tells decoder           |
       |                    which module vocabulary to           |
       +<----------- use for faithful reconstruction -----------+
```

## 4. Design Principles

1. **The kernel NEVER changes.** PKT format, field grammar, class codes,
   version negotiation, error handling -- all unchanged from v4-kernel.md.

2. **Rosetta modules are pluggable.** Adding a new domain means registering
   a new module name. No kernel changes required.

3. **Backward compatible.** Packets without `^rosetta` use the prose module,
   which is the v3 base vocabulary. A v4.0 parser sees `^rosetta:financial`
   as an unknown META flag and ignores it -- no parse failure.

4. **Explicit over automatic.** Operators may set `^rosetta:financial`
   directly. The router may also detect domain automatically. Explicit
   always wins.

5. **One module per packet.** A single packet uses exactly one Rosetta
   vocabulary. Mixed-domain messages use multiple packets with different
   `^rosetta` tags.

## 5. The ^rosetta META Flag

Syntax: `^rosetta:MODULE_NAME`

Position: META field of any packet, alongside existing flags.

```
PKT[4|INF|Co.Rv|rpt|...|...|^rosetta:financial ^t:260410]
```

Reserved module names: `prose`, `financial`, `code`, `legal`, `medical`,
`research`. Custom modules use dotted names: `^rosetta:org.acme.billing`.

Default when absent: `prose`.
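
The flag rules above (syntax, reserved names, dotted custom names, default) can be sketched as a small lookup. This is a minimal sketch, not part of the spec: the names `rosetta_module` and `is_custom` are hypothetical, and it assumes META flags are whitespace-separated `^key:value` tokens.

```python
import re

RESERVED_MODULES = {"prose", "financial", "code", "legal", "medical", "research"}
DEFAULT_MODULE = "prose"

# Assumption: a META flag is a whitespace-delimited token of the form ^key:value.
_ROSETTA_FLAG = re.compile(r"(?:^|\s)\^rosetta:(\S+)")


def rosetta_module(meta: str) -> str:
    """Return the module named by ^rosetta in a META string, or the default."""
    match = _ROSETTA_FLAG.search(meta)
    return match.group(1) if match else DEFAULT_MODULE


def is_custom(module: str) -> bool:
    """Custom modules use dotted names; reserved names never contain a dot."""
    return module not in RESERVED_MODULES and "." in module


print(rosetta_module("^rosetta:financial ^t:260410"))  # financial
print(rosetta_module("^t:260410"))                     # prose
print(is_custom("org.acme.billing"))                   # True
```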

### Interaction with ^mode

`^mode` and `^rosetta` are orthogonal:

| Flag | Controls | Example |
|------|----------|---------|
| `^mode` | Decompression fidelity tier | `^mode:audit` = zero-lossy decompression |
| `^rosetta` | Compression vocabulary | `^rosetta:financial` = financial glyphs |

They compose: `^mode:audit ^rosetta:financial` means "compress with financial
vocabulary, decompress at audit fidelity (zero-lossy)." This replaces the
v3 pattern where `^mode:legal` had to imply both fidelity AND vocabulary.

## 6. Rosetta Module Interface

Every module must specify a registered name and four components (glyph
palette, fidelity contract, decompression rules, cold-read score):

```
MODULE INTERFACE
================
module_name   : string        -- registered name (e.g. "financial")
glyph_palette : table         -- domain-specific glyphs and primitives
fidelity      : contract      -- what is preserved, what is lossy
decomp_rules  : grammar       -- decompression production rules
cold_read     : score 0-100   -- expected recovery % on a cold lowest-tier model
```
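
One way to picture the interface is as a record type plus a registry. The sketch below is illustrative only: `RosettaModule`, `REGISTRY`, and `register` are hypothetical names, the callable stands in for a real decompression grammar, and the sample values are placeholders rather than a published module.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class RosettaModule:
    module_name: str                     # registered name, e.g. "financial"
    glyph_palette: Dict[str, str]        # glyph -> meaning
    fidelity: Dict[str, str]             # contract column -> description
    decomp_rules: Callable[[str], str]   # stand-in for a grammar
    cold_read: int                       # expected recovery %, 0-100

    def __post_init__(self) -> None:
        if not 0 <= self.cold_read <= 100:
            raise ValueError("cold_read must be in 0..100")


REGISTRY: Dict[str, RosettaModule] = {}


def register(module: RosettaModule) -> None:
    """Adding a domain is a registry insert; the kernel is untouched."""
    REGISTRY[module.module_name] = module


register(RosettaModule(
    module_name="financial",
    glyph_palette={"金": "monetary", "⟹": "causal chain"},
    fidelity={"must_be_exact": "amounts, percentages, entity names"},
    decomp_rules=lambda text: text,  # placeholder rule: identity
    cold_read=65,                    # illustrative value
))
print(sorted(REGISTRY))  # ['financial']
```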

### 6.1 Glyph Palette

Each module defines its own compression vocabulary. Glyphs are drawn from
the Unicode Basic Multilingual Plane (BMP, U+0000 to U+FFFF) and must not
conflict with kernel reserved characters (`| [ ] \ ^ ;`).

Modules inherit the kernel base vocabulary (class codes, base TAGs, numeric
shorthand) and extend it with domain-specific primitives.
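
The two palette constraints (BMP only, no kernel reserved characters) are mechanical enough to check at registration time. A minimal sketch, assuming a palette is a list of glyph strings; `palette_errors` is a hypothetical helper name:

```python
KERNEL_RESERVED = set("|[]\\^;")


def palette_errors(glyphs):
    """Return a list of problems found in a candidate glyph palette."""
    errors = []
    for glyph in glyphs:
        for ch in glyph:
            if ch in KERNEL_RESERVED:
                errors.append(f"{glyph!r}: contains kernel reserved character {ch!r}")
            elif ord(ch) > 0xFFFF:
                errors.append(f"{glyph!r}: U+{ord(ch):04X} is outside the BMP")
    return errors


print(palette_errors(["金", "⟹", "高"]))  # []
print(palette_errors(["a|b", "\U0001F4B0"]))
```

The second call reports two problems: a reserved `|` and a code point above U+FFFF.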

### 6.2 Fidelity Contract

Four-column format from v4-layer-classification.md:

| Column | Meaning |
|--------|---------|
| **Gained** | Character reduction from this module vs prose baseline |
| **Preserved** | Semantic elements that survive round-trip exactly |
| **Allowed lossy** | Elements that may be paraphrased or dropped |
| **Must be exact** | Elements where any alteration is a protocol error |

### 6.3 Decompression Rules

Production rules mapping module glyphs back to natural language. A decoder
that knows the module can reconstruct faithfully. A decoder that does not
know the module falls back to the prose module's rules (best-effort).
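
The fallback behavior amounts to a table lookup with a default. A minimal sketch, assuming decoders are plain functions keyed by module name; the decoder bodies are placeholders and every name here is hypothetical:

```python
def prose_decode(payload: str) -> str:
    """Best-effort fallback: return the payload unchanged (placeholder)."""
    return payload


def financial_decode(payload: str) -> str:
    """Placeholder for module-specific rules: expand one financial glyph."""
    return payload.replace("金", "monetary:")


# Hypothetical decoder table; a real decoder would load full production rules.
DECODERS = {"prose": prose_decode, "financial": financial_decode}


def decode(payload: str, module: str) -> str:
    """Use the module's rules when known, otherwise fall back to prose."""
    decoder = DECODERS.get(module, prose_decode)
    return decoder(payload)


print(decode("金Q4", "financial"))         # monetary:Q4
print(decode("金Q4", "org.acme.billing"))  # 金Q4 (unknown module, prose fallback)
```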

### 6.4 Cold-Read Score

Integer 0-100 representing expected semantic recovery when a lowest-tier
model (no AXL training, no spec in context) attempts to read the compressed
output. Higher is better. Prose is the ceiling.

## 7. Proposed Modules

### 7.1 Rosetta.prose

The v3 base. Minimal glyphs. Highest cold-read score.

| Property | Value |
|----------|-------|
| Glyph palette | v3 base + v3.1 math operators + v3.2 ideographic composition |
| Preserved | Narrative structure, causal chains, entity identity, numeric values |
| Allowed lossy | Verbose phrasing, hedging language, stylistic flourishes |
| Cold-read score | 85-90 |
| Target ratio | 3-4x |

### 7.2 Rosetta.financial

Heavy use of CJK magnitude glyphs, numeric bundles, entity anchors.

| Property | Value |
|----------|-------|
| Glyph palette | 金(monetary), 高/大/小(magnitude), ⟹(causal chain), numeric bundles from v3.1, entity anchors, %Δ shorthand |
| Preserved | All numeric values, entity-value bindings, causal chains, temporal ordering, currency units |
| Allowed lossy | Narrative framing, hedging qualifiers beyond approximation marker |
| Must be exact | Monetary amounts (including currency unit), percentages, entity names, direction of change |
| Cold-read score | 60-70 |
| Target ratio | 5-6x |

### 7.3 Rosetta.code

Lossy intermediate representation. AST-derived.

| Property | Value |
|----------|-------|
| Glyph palette | fn/cl/lp/cd/rt/im/as/er/tp primitives, type shorthand (s/i/f/b/L/D/O), semicolon statements, -> returns, ^lang tags |
| Preserved | Identifiers, string literals, numeric constants, control flow, type annotations, operation sequence |
| Allowed lossy | Comments, whitespace, style, binding semantics, constructor internals |
| Must be exact | Nothing formally guaranteed (empirical boundary) |
| Cold-read score | 40-50 |
| Target ratio | 1.5-2x |

### 7.4 Rosetta.legal

Zero-lossy. Exact quotation preservation.

| Property | Value |
|----------|-------|
| Glyph palette | Prose base + span markers `<<` `>>` for exact quotes, provenance chain operator `@src:`, section anchors `SS.N` |
| Preserved | ALL content. Exact wording of quoted material. Provenance chains. Section references. Party identity. |
| Allowed lossy | Nothing. The legal module is zero-lossy by definition. |
| Must be exact | Quoted text within span markers, party names, section numbers, dates, obligations |
| Cold-read score | 80-85 (low compression = high readability) |
| Target ratio | 1.5-2x |

### 7.5 Rosetta.medical

Diagnosis chains, standardized coding, confidence-critical.

| Property | Value |
|----------|-------|
| Glyph palette | ICD/SNOMED code anchors `[ICD:XX.X]` `[SNO:XXXXX]`, Dx(diagnosis)/Rx(prescription)/Sx(symptom)/Hx(history) primitives, confidence-mandatory `^c:` on all clinical assertions |
| Preserved | Diagnosis codes, medication names and dosages, temporal ordering of clinical events, confidence levels |
| Allowed lossy | Narrative framing of clinical notes, administrative boilerplate |
| Must be exact | Drug names, dosages, ICD/SNOMED codes, allergy flags, patient identifiers |
| Cold-read score | 55-65 |
| Target ratio | 3-4x |

### 7.6 Rosetta.research

Citation preservation, hypothesis-evidence separation.

| Property | Value |
|----------|-------|
| Glyph palette | Claim IDs `[C:N]`, evidence tags `[E:N]`, hypothesis markers `H:`, citation anchors `@cite:KEY`, support/contradict operators `+ev`/`-ev` |
| Preserved | All citations, claim-evidence linkages, hypothesis identity, statistical values, methodology references |
| Allowed lossy | Literature review narrative, verbose methodology description |
| Must be exact | Citation keys, statistical values (p-values, confidence intervals, sample sizes), claim IDs |
| Cold-read score | 60-70 |
| Target ratio | 3-5x |

## 8. Routing: How Domain Detection Works

The kernel router selects a Rosetta module through a two-stage process:

**Stage 1: Explicit flag.** If the operator sets `^rosetta:MODULE` in META,
that module is used. No detection runs. This is the recommended path for
production systems.

**Stage 2: Automatic detection.** If no `^rosetta` flag is present, the
router applies lightweight heuristics to select a module:

| Signal | Detected module |
|--------|----------------|
| Currency symbols or codes ($, EUR, JPY), >30% numeric density | financial |
| ^mode:code or ^lang:XX present | code |
| Legal citation patterns (Section X, Article Y, "pursuant to") | legal |
| ICD/SNOMED codes, drug names, clinical abbreviations | medical |
| Citation patterns (et al., DOI, arXiv), hypothesis language | research |
| None of the above | prose (default) |

Heuristics are advisory. They must be conservative -- when uncertain, fall
back to prose. False routing to a specialized module is worse than
undercompressing with prose, because it risks misinterpretation.

**The router MUST tag the selected module in the output packet's META field.**
Even auto-detected modules produce explicit `^rosetta:XX` tags. The
decompressor never guesses.
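
The two stages can be sketched as a single function that returns a META string carrying exactly one `^rosetta` tag. Everything specific here is an assumption made for illustration: the regular expressions, the token-based definition of numeric density, reading the financial row as "currency signal AND density above 30%", and the rule that conflicting signals fall back to prose. The code and medical signals are omitted for brevity.

```python
import re

# Illustrative patterns only; the spec lists signals but not exact rules.
CURRENCY = re.compile(r"[$€¥]|\b(?:USD|EUR|JPY)\b")
LEGAL = re.compile(r"\b(?:Section|Article)\s+\d+|(?i:pursuant to)")
RESEARCH = re.compile(r"\bet al\.|\bDOI\b|\barXiv\b", re.IGNORECASE)
EXPLICIT = re.compile(r"(?:^|\s)\^rosetta:(\S+)")


def numeric_density(text: str) -> float:
    """Fraction of whitespace-separated tokens that contain a digit."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(any(c.isdigit() for c in t) for t in tokens) / len(tokens)


def route(text: str, meta: str = "") -> str:
    """Return the META string with exactly one ^rosetta tag."""
    if EXPLICIT.search(meta):         # Stage 1: explicit always wins
        return meta
    candidates = []                   # Stage 2: conservative heuristics
    if CURRENCY.search(text) and numeric_density(text) > 0.30:
        candidates.append("financial")
    if LEGAL.search(text):
        candidates.append("legal")
    if RESEARCH.search(text):
        candidates.append("research")
    # Ambiguous or no signal: fall back to prose rather than guess.
    module = candidates[0] if len(candidates) == 1 else "prose"
    return f"{meta} ^rosetta:{module}".strip()


print(route("Revenue $5.4M Q4 up 27% from $4.2M Q3"))   # ^rosetta:financial
print(route("The weather was pleasant today."))         # ^rosetta:prose
print(route("anything", "^rosetta:legal ^t:260410"))    # ^rosetta:legal ^t:260410
```

Note that even the prose fallback is written into META, so a decoder reading the output never has to infer the module.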

## 9. Worked Example

### Source text

> Acme Corp reported Q4 revenue of $5.4M, up approximately 27% from $4.2M in
> Q3, driven by improved customer acquisition. The board approved a 15% budget
> increase for marketing.

### Compressed with Rosetta.prose (default)

```
PKT[4|INF|Co.Rv|rpt↑|≈%Δ27 Q4|$4.2M→$5.4M ∵Hu.acq↑ ^rosetta:prose]
PKT[4|ACT|Mk.budget|upd↑|∴%Δ15|^ref:1 ^s:board ^rosetta:prose]
```

Characters: 128. Ratio: 1.8x on this sample.
Cold-read: high. A model with no AXL knowledge can largely parse this.

### Compressed with Rosetta.financial

```
PKT[4|INF|Co.Rv|rpt↑|金Q4:acme[$5.4M]⟹%Δ27←Q3[$4.2M]|∵acq↑ ^rosetta:financial]
PKT[4|ACT|Mk.budget|set|金%Δ+15|⟹^ref:1 ^s:board ^rosetta:financial]
```

Characters: 140. Ratio: 1.7x on this sample.
However, every numeric value is structurally anchored in a numeric bundle.
On a longer document, such as a 50-packet financial report, the structural
regularity of 金-anchored bundles is expected to reach the 5-6x target ratio
while preserving 100% of numeric facts, because the per-packet overhead
amortizes across repetitive financial data.

### Key difference

The prose module embeds `$4.2M→$5.4M` as a range in ARG2 -- readable but
positionally fragile in long sequences. The financial module anchors every
value in a typed bundle (`金Q4:acme[$5.4M]`) that a financial decompressor
can index, validate, and reconstruct without positional ambiguity.
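
To make "index without positional ambiguity" concrete, the sketch below pulls named fields out of a bundle with a regular expression. The bundle shape `金<period>:<entity>[<value>]` is an assumption inferred from the single example in this section, not a published grammar, and `Bundle` and `extract_bundles` are hypothetical names.

```python
import re
from typing import List, NamedTuple


class Bundle(NamedTuple):
    period: str
    entity: str
    value: str


# Assumption: a typed bundle looks like 金<period>:<entity>[<value>].
BUNDLE = re.compile(r"金(?P<period>\w+):(?P<entity>\w+)\[(?P<value>[^\]]+)\]")


def extract_bundles(field: str) -> List[Bundle]:
    """Index every 金-anchored bundle in a packet field by name, not position."""
    return [Bundle(**m.groupdict()) for m in BUNDLE.finditer(field)]


arg1 = "金Q4:acme[$5.4M]⟹%Δ27←Q3[$4.2M]"
for b in extract_bundles(arg1):
    print(b)
# Bundle(period='Q4', entity='acme', value='$5.4M')
```

The pattern only matches bundles that start with 金; `Q3[$4.2M]` in the same field would need its own rule.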

## 10. Migration Path: v3/v4 to v4.1

### Phase 1: Flag recognition (non-breaking)

Add `^rosetta` to the known META flags list. Parsers that see it treat
it as informational. Compression continues to use prose vocabulary.
All existing v3/v4 packets are valid. No behavioral change.

### Phase 2: Module registration

Define and publish Rosetta.prose (already the v3/v4 base) and
Rosetta.financial as the first two modules. Encoders may begin
emitting `^rosetta:financial` on financial content.

### Phase 3: Router activation

Enable automatic detection. Encoders emit `^rosetta:XX` on all
packets. Decoders use the tag to select module-specific
decompression rules.

### Phase 4: Module expansion

Publish Rosetta.code (absorbs current ^mode:code), Rosetta.legal,
Rosetta.medical, Rosetta.research. Each module ships independently
when its fidelity contract and cold-read benchmarks are validated.

### Backward compatibility guarantee

At every phase:
- Packets without `^rosetta` parse and decompress as v4 prose (no change).
- Unknown `^rosetta:XX` values are ignored by older parsers (standard
  META flag behavior).
- The kernel packet format does not change. Ever.

## 11. Open Questions

1. **Module versioning.** Should modules carry independent version numbers
   (e.g., `^rosetta:financial.2`) or rely on the packet VER field?

2. **Module composition.** Can a packet reference two modules
   (`^rosetta:financial+legal`)? Current design says no -- one module per
   packet. Mixed content uses multiple packets.

3. **Custom module registry.** How are `^rosetta:org.acme.billing` style
   custom modules discovered and distributed?

4. **Cold-read benchmarks.** What is the standard test harness for measuring
   cold-read scores across modules? Needs a reference corpus per domain.

5. **Router confidence.** Should the auto-detection stage emit a confidence
   score (`^rc:0.85`) so downstream systems can decide whether to trust
   the routing decision?
