v4.0.1 Code Layer

Lossy structural IR. Intent over byte-identity.

The v4 code layer compresses programming constructs and source-code fragments into the same PKT structure used for natural language. Code packets carry ^mode:code and ^lang:XX. Decompressed output preserves intent, structure, and identifiers but is not guaranteed to execute identically.

AXL Rosetta v4: Code Compression Layer

Version: 4.0.0-draft
Status: Draft
License: Apache 2.0
Extends: v3.2 Ideographic Composition (spec/v3.2-ideographic.md)

1. Overview

The v4 code layer extends AXL Rosetta to compress programming constructs, source code fragments, and executable commands into the same PKT structure used for natural language. Code packets carry a ^mode:code meta flag and a ^lang: tag to identify the source language. Code compression is a lossy structural intermediate representation. Decompressed output preserves intent, structure, and identifiers but is not guaranteed to execute identically to the original. Practical round-trip fidelity varies by construct complexity and language. This is compression for communication, not archival.

2. Code Primitives

Primitive  Meaning              Maps To
fn         Function definition  def, function, func
cl         Class definition     class, struct
lp         Loop construct       for, while, loop
cd         Conditional          if, elif, else, switch, match
rt         Return statement     return, yield
im         Import/include       import, require, include, use
as         Assignment           =, :=, let, const, var
er         Error handling       try, catch, except, finally
tp         Type annotation      int, str, list, dict, custom
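The table above can be read as a lookup from source keywords to primitives. The following is a minimal sketch of that reading, not part of the spec: the names PRIMITIVES, KEYWORD_TO_PRIMITIVE, and primitive_for are hypothetical, and tp is left out because it maps to type names (including custom ones) rather than to a fixed keyword set.

```python
from typing import Optional

# Primitive -> source keywords, copied from the table above.
# "tp" is omitted: it covers type names, not a closed keyword list.
PRIMITIVES = {
    "fn": ("def", "function", "func"),
    "cl": ("class", "struct"),
    "lp": ("for", "while", "loop"),
    "cd": ("if", "elif", "else", "switch", "match"),
    "rt": ("return", "yield"),
    "im": ("import", "require", "include", "use"),
    "as": ("=", ":=", "let", "const", "var"),
    "er": ("try", "catch", "except", "finally"),
}

# Invert the table so a keyword resolves to its primitive.
KEYWORD_TO_PRIMITIVE = {
    keyword: primitive
    for primitive, keywords in PRIMITIVES.items()
    for keyword in keywords
}


def primitive_for(keyword: str) -> Optional[str]:
    """Return the code primitive for a source keyword, or None if unmapped."""
    return KEYWORD_TO_PRIMITIVE.get(keyword)
```

For instance, primitive_for("while") yields "lp", and an unmapped keyword such as "goto" yields None.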

3. Packet Format for Code

Code packets use the standard PKT structure with mandatory meta flags:

PKT[4|CLS|SUB|TAG|ARG1|ARG2|^mode:code ^lang:XX]
  • ^mode:code is required on every code packet.
  • ^lang:XX identifies the language using short tags (see Section 4).
  • CLS is typically ACT for definitions and INF for descriptions of code.
  • SUB identifies the module, class, or function being described.
  • TAG uses code primitives from Section 2.
  • ARG1 carries the name/signature.
  • ARG2 carries the body or logic in compressed notation.
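As an illustration of the field layout above, here is a minimal encoder/decoder sketch. It is not a reference implementation: the CodePacket class and decode function are hypothetical names, the leading 4 is assumed to be a version field, and only the two mandatory meta flags are carried through a round trip.

```python
from dataclasses import dataclass


@dataclass
class CodePacket:
    cls: str
    sub: str
    tag: str
    arg1: str
    arg2: str
    lang: str
    version: str = "4"  # assumption: the first field is the format version

    def encode(self) -> str:
        fields = [self.version, self.cls, self.sub, self.tag, self.arg1, self.arg2]
        for value in fields:
            # Pipe is the field delimiter, so it cannot appear inside a field.
            if "|" in value:
                raise ValueError(f"pipe not allowed inside a field: {value!r}")
        # Only the two mandatory flags are written; other flags are not modelled.
        meta = f"^mode:code ^lang:{self.lang}"
        return "PKT[" + "|".join(fields + [meta]) + "]"


def decode(packet: str) -> CodePacket:
    if not (packet.startswith("PKT[") and packet.endswith("]")):
        raise ValueError("not a PKT envelope")
    parts = packet[len("PKT["):-1].split("|")
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}")
    version, cls, sub, tag, arg1, arg2, meta = parts
    # Meta flags are space-separated ^key:value pairs.
    flags = {}
    for flag in meta.split():
        key, _, value = flag.lstrip("^").partition(":")
        flags[key] = value
    if flags.get("mode") != "code":
        raise ValueError("missing required ^mode:code flag")
    if not flags.get("lang"):
        raise ValueError("missing required ^lang:XX flag")
    return CodePacket(cls, sub, tag, arg1, arg2, flags["lang"], version)
```

Decoding a packet and re-encoding it reproduces the input only when the meta field holds exactly the two mandatory flags; any additional flags would need their own field in the data class.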

Code Body Compression Rules

  1. Semicolon separation: Multiple statements compress to semicolon-delimited sequences.
  2. Arrow notation for returns: -> replaces explicit return for single-expression functions.
  3. Brace elision: Indentation-based languages (Python) omit braces. Brace languages compress to {...} with semicolon-separated interior.
  4. Type shorthand: Common types abbreviate: s=str, i=int, f=float, b=bool, L=list, D=dict, O=optional.
  5. Implicit self/this: First parameter self or this is omitted in compressed form.
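Rules 4 and 5 can be sketched as a small signature compressor. This is an illustrative assumption about how the rules combine, using the type:name ordering seen in the examples of Section 5; the function name compress_signature and the (name, type, default) tuple layout are hypothetical.

```python
# Type shorthand from rule 4.
TYPE_SHORTHAND = {
    "str": "s", "int": "i", "float": "f", "bool": "b",
    "list": "L", "dict": "D", "optional": "O",
}


def compress_signature(name, params, returns=None):
    """Build a compressed signature string.

    params is a list of (param_name, type_name_or_None, default_or_None).
    """
    parts = []
    for index, (pname, ptype, default) in enumerate(params):
        # Rule 5: a leading self/this parameter is implicit.
        if index == 0 and pname in ("self", "this"):
            continue
        # Rule 4: abbreviate common types, pass unknown types through unchanged.
        short = TYPE_SHORTHAND.get(ptype, ptype) if ptype else ""
        text = f"{short}:{pname}" if short else pname
        if default is not None:
            text += f"={default}"
        parts.append(text)
    signature = f"{name}({','.join(parts)})"
    if returns:
        signature += "->" + TYPE_SHORTHAND.get(returns, returns)
    return signature
```

Called with the parameters of a function like calculate_discount(price: float, rate: float = 0.1) -> float, this produces calculate_discount(f:price,f:rate=0.1)->f.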

4. Language Tags

Tag   Language
py    Python
js    JavaScript
ts    TypeScript
sql   SQL
sh    Shell/Bash
go    Go
rs    Rust
rb    Ruby
java  Java
api   REST/HTTP API

5. Code Compression Examples

Example 1: Python Function

Original (147 chars):

def calculate_discount(price: float, rate: float = 0.1) -> float:
    if rate > 0.5:
        raise ValueError("Rate too high")
    return price * (1 - rate)

Compressed (82 chars):

PKT[4|ACT|calc|fn|calculate_discount(f:price,f:rate=0.1)->f|cd rate>0.5:er "Rate too high";rt price*(1-rate)|^mode:code ^lang:py]

Savings: 65 chars (44.2%)
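The savings figure is the character difference divided by the original length. A short sketch of that arithmetic, using the counts quoted for this example (the function name savings is hypothetical):

```python
from typing import Tuple


def savings(before: int, after: int) -> Tuple[int, float]:
    """Return (characters saved, percentage saved rounded to one decimal)."""
    saved = before - after
    return saved, round(100 * saved / before, 1)


# Counts quoted above for Example 1: 147 chars original, 82 compressed.
print(savings(147, 82))  # (65, 44.2)
```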

Example 2: JavaScript Class

Original (198 chars):

class UserCache {
  constructor(maxSize = 100) {
    this.cache = new Map();
    this.maxSize = maxSize;
  }
  get(key) {
    return this.cache.get(key);
  }
  set(key, value) {
    this.cache.set(key, value);
  }
}

Compressed (112 chars):

PKT[4|ACT|UserCache|cl|constructor(i:maxSize=100)|as cache=Map();as maxSize=maxSize|^mode:code ^lang:js]
PKT[4|ACT|UserCache.get|fn|key|->cache.get(key)|^mode:code ^lang:js]
PKT[4|ACT|UserCache.set|fn|key,value|cache.set(key,value)|^mode:code ^lang:js]

Savings: 86 chars (43.4%)

Note: The class packet is lossy IR: constructor binding and this context are compressed away. Individual method packets may be marked as lossless where the round-trip is exact.

Example 3: SQL Query

Original (186 chars):

SELECT u.name, u.email, COUNT(o.id) AS order_count
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.created_at >= '2025-01-01'
GROUP BY u.name, u.email
HAVING COUNT(o.id) > 5
ORDER BY order_count DESC;

Compressed (108 chars):

PKT[4|QRY|users+orders|qry|u.name,u.email,COUNT(o.id):order_count|JOIN o ON u.id=o.user_id;WHERE o.created_at>=250101;GROUP u.name,u.email;HAVING COUNT(o.id)>5;ORDER order_count DESC|^mode:code ^lang:sql]

Savings: 78 chars (41.9%)

Example 4: Shell Script

Original (142 chars):

#!/bin/bash
for file in /var/log/*.log; do
    if [ -f "$file" ] && [ $(wc -l < "$file") -gt 1000 ]; then
        gzip "$file"
    fi
done

Compressed (86 chars):

PKT[4|ACT|log.compress|lp|file:/var/log/*.log|cd -f file AND wc -l file>1000:gzip file|^mode:code ^lang:sh]

Savings: 56 chars (39.4%)

Note: This is lossy IR. Shell quoting, command substitution syntax, and redirection operators are not preserved. The compressed form captures the structural intent (loop, filter, compress) but decompressed output requires shell-specific reconstruction. Round-trip execution fidelity is not guaranteed for shell.

Example 5: REST API Endpoint

Original (163 chars):

POST /api/v2/users
Content-Type: application/json
Authorization: Bearer {token}

{"name": "string", "email": "string", "role": "admin|user"}

Response: 201 Created

Compressed (92 chars):

PKT[4|ACT|api.users|crt|POST /v2/users|body:D{s:name,s:email,s:role=admin+user};auth:bearer;rsp:201|^mode:code ^lang:api]

Savings: 71 chars (43.6%)

Note: The literal pipe in admin|user is replaced with + (union notation) because pipe is the field delimiter and is prohibited in free-text fields. This is a lossy transformation: the original enum syntax is not preserved.
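The pipe substitution can be sketched as a one-way escape applied to free-text fields before they are placed in a packet. The function name escape_free_text is hypothetical, and the sketch assumes + is the only substitute in use.

```python
def escape_free_text(value: str) -> str:
    """Replace literal pipes with '+' so the value cannot split a PKT field."""
    return value.replace("|", "+")


print(escape_free_text("admin|user"))  # admin+user

# The mapping is not reversible: a '+' that was already present becomes
# indistinguishable from one that replaced a pipe.
print(escape_free_text("a+b|c"))  # a+b+c
```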

Summary

Example     Before (chars)  After (chars)  Savings
Python fn   147             82             44.2%
JS class    198             112            43.4%
SQL query   186             108            41.9%
Shell loop  142             86             39.4%
REST API    163             92             43.6%
Mean        167.2           96.0           42.5%

6. Fidelity Contract

Code compression is a lossy structural intermediate representation. The contract:

  1. Preserved: identifier names, string literals, numeric constants, control flow structure, operation sequence, data shape, type annotations where present.
  2. May be lost: comments, whitespace, coding style, import ordering, binding semantics (this/self dispatch), constructor internals, language-specific syntax (shell quoting, command substitution, method resolution order).
  3. Not guaranteed: identical execution behavior. Decompressed output may require manual adjustment to compile or run in the target language.

The boundary between "survives round-trip" and "requires adjustment" is empirical, not formally defined. Simple constructs (single-expression functions, basic assignments) tend to round-trip cleanly. Complex constructs (classes with inheritance, shell pipelines, cross-language API specs) do not. The spec does not promise where this boundary lies because the grammar cannot enforce it.

This is compression for communication: an agent reading the compressed form should understand what the code does. It is not compression for archival: you cannot reconstruct the original source from the compressed form alone.
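Because the boundary is empirical, one way to probe it is to check a compressed packet against its source. The sketch below covers only one clause of the contract (declared function, class, and parameter names in Python source) and uses a crude token match against the packet text. The function names are hypothetical and this check is not defined by the spec.

```python
import ast
import re


def declared_names(source: str) -> set:
    """Collect function, class, and parameter names declared in Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
    # Implicit self is omitted in compressed form (body rule 5).
    names.discard("self")
    return names


def missing_identifiers(source: str, packet: str) -> set:
    """Return declared names that do not appear as tokens in the packet."""
    tokens = set(re.findall(r"[A-Za-z_]\w*", packet))
    return declared_names(source) - tokens
```

An empty result means every declared name survived compression; it says nothing about control flow, literals, or execution behavior, which would need separate checks.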

Comment Handling

Comments and docstrings are not preserved by default. A ^keep:docs meta flag requests comment preservation, but the request is advisory and preservation is not guaranteed.

7. Backward Compatibility

  • Code packets are valid v4 packets. A v3.x parser will parse the pipe-delimited fields correctly but will not understand the code body notation in ARG2.
  • The ^mode:code flag allows parsers to skip code packets they cannot process.
  • Without ^mode:code, packets are treated as natural language under v3.x rules.
  • Language-unaware parsers treat ^lang:XX as an opaque meta flag.
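The compatibility rules above amount to a dispatch on the meta field. A minimal sketch follows, assuming the meta flags sit in the last pipe-delimited field; the function name route_packet and its return labels ("code", "skip", "natural-language") are hypothetical.

```python
def route_packet(packet: str, supports_code: bool) -> str:
    """Decide how a parser should treat a packet based on its meta flags."""
    body = packet[packet.index("[") + 1 : packet.rindex("]")]
    meta = body.split("|")[-1]  # assumption: meta flags are the last field

    flags = {}
    for flag in meta.split():
        if flag.startswith("^"):
            key, _, value = flag[1:].partition(":")
            flags[key] = value

    # Without ^mode:code the packet is handled as natural language.
    if flags.get("mode") != "code":
        return "natural-language"
    # With ^mode:code, a parser lacking code support can skip the packet.
    # ^lang:XX is not inspected here, so it stays an opaque flag.
    return "code" if supports_code else "skip"
```

A parser without code support would call route_packet(packet, supports_code=False) and drop anything labelled "skip" without touching ARG2.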
