v4.0.1 Code Layer
Lossy structural IR. Intent over byte-identity.
The v4 code layer compresses programming constructs and source-code fragments into the same PKT structure used for natural language. Code packets carry ^mode:code and ^lang:XX. Decompressed output preserves intent, structure, and identifiers but is not guaranteed to execute identically.
AXL Rosetta v4: Code Compression Layer
Version: 4.0.0-draft
Status: Draft
License: Apache 2.0
Extends: v3.2 Ideographic Composition (spec/v3.2-ideographic.md)
1. Overview
The v4 code layer extends AXL Rosetta to compress programming constructs, source code
fragments, and executable commands into the same PKT structure used for natural language.
Code packets carry a ^mode:code meta flag and a ^lang: tag to identify the source
language. Code compression is a lossy structural intermediate representation.
Decompressed output preserves intent, structure, and identifiers but is not
guaranteed to execute identically to the original. Practical round-trip fidelity
varies by construct complexity and language. This is compression for communication,
not archival.
2. Code Primitives
| Primitive | Meaning | Maps To |
|---|---|---|
| fn | Function definition | def, function, func |
| cl | Class definition | class, struct |
| lp | Loop construct | for, while, loop |
| cd | Conditional | if, elif, else, switch, match |
| rt | Return statement | return, yield |
| im | Import/include | import, require, include, use |
| as | Assignment | =, :=, let, const, var |
| er | Error handling | try, catch, except, finally |
| tp | Type annotation | int, str, list, dict, custom |
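The table above can be read as a lookup from source keywords to primitives. The following is a minimal sketch of that lookup, not a reference implementation: the names `PRIMITIVES`, `KEYWORD_TO_PRIMITIVE`, and `primitive_for` are hypothetical, and `tp` is left out because it maps to type names rather than keywords.

```python
from typing import Optional

# Keyword lists copied from the "Maps To" column of the table above.
# `tp` is omitted: it maps to type names (int, str, ...), not keywords.
PRIMITIVES = {
    "fn": ("def", "function", "func"),
    "cl": ("class", "struct"),
    "lp": ("for", "while", "loop"),
    "cd": ("if", "elif", "else", "switch", "match"),
    "rt": ("return", "yield"),
    "im": ("import", "require", "include", "use"),
    "as": ("=", ":=", "let", "const", "var"),
    "er": ("try", "catch", "except", "finally"),
}

# Reverse index: source keyword -> primitive.
KEYWORD_TO_PRIMITIVE = {
    keyword: primitive
    for primitive, keywords in PRIMITIVES.items()
    for keyword in keywords
}


def primitive_for(keyword: str) -> Optional[str]:
    """Return the primitive for a source keyword, or None if it is unmapped."""
    return KEYWORD_TO_PRIMITIVE.get(keyword)


print(primitive_for("def"))    # fn
print(primitive_for("while"))  # lp
print(primitive_for("goto"))   # None
```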
3. Packet Format for Code
Code packets use the standard PKT structure with mandatory meta flags:
PKT[4|CLS|SUB|TAG|ARG1|ARG2|^mode:code ^lang:XX]
- `^mode:code` is required on every code packet.
- `^lang:XX` identifies the language using short tags (see Section 4).
- CLS is typically `ACT` for definitions and `INF` for descriptions of code.
- SUB identifies the module, class, or function being described.
- TAG uses code primitives from Section 2.
- ARG1 carries the name/signature.
- ARG2 carries the body or logic in compressed notation.
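As an illustration of the field layout above, here is a minimal parsing sketch. It assumes that no field contains a literal pipe, that meta flags are space-separated `^key:value` pairs, and that the leading `4` is a version field (the spec does not name it). `CodePacket` and `parse_code_packet` are hypothetical names, not part of AXL Rosetta.

```python
from dataclasses import dataclass


@dataclass
class CodePacket:
    version: str  # assumed meaning of the leading "4"
    cls: str
    sub: str
    tag: str
    arg1: str
    arg2: str
    meta: dict


def parse_code_packet(text: str) -> CodePacket:
    """Split a PKT[...] string into its seven fields and check the meta flags."""
    if not (text.startswith("PKT[") and text.endswith("]")):
        raise ValueError("not a PKT packet")
    fields = text[4:-1].split("|")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    *head, meta_field = fields

    # Meta flags look like "^mode:code ^lang:py".
    meta = {}
    for flag in meta_field.split():
        if not flag.startswith("^") or ":" not in flag:
            raise ValueError(f"malformed meta flag: {flag!r}")
        key, _, value = flag[1:].partition(":")
        meta[key] = value

    # Both flags are mandatory on code packets.
    if meta.get("mode") != "code":
        raise ValueError("missing ^mode:code")
    if "lang" not in meta:
        raise ValueError("missing ^lang:XX")
    return CodePacket(*head, meta=meta)


packet = parse_code_packet(
    "PKT[4|ACT|UserCache.get|fn|key|->cache.get(key)|^mode:code ^lang:js]"
)
print(packet.tag, packet.arg1, packet.meta["lang"])  # fn key js
```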
Code Body Compression Rules
- Semicolon separation: Multiple statements compress to semicolon-delimited sequences.
- Arrow notation for returns: `->` replaces explicit return for single-expression functions.
- Brace elision: Indentation-based languages (Python) omit braces. Brace languages compress to `{...}` with semicolon-separated interior.
- Type shorthand: Common types abbreviate: `s=str`, `i=int`, `f=float`, `b=bool`, `L=list`, `D=dict`, `O=optional`.
- Implicit self/this: First parameter `self` or `this` is omitted in compressed form.
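Two of the rules above, type shorthand and implicit self/this, can be sketched for a Python-style parameter list. This is a rough illustration under stated assumptions: `compress_param` and `compress_params` are hypothetical helpers, parameters are split on commas (so nested generics or defaults containing commas are not handled), and `O=optional` is left out because the rules do not say how it composes with an inner type.

```python
from typing import Optional

# Shorthand table from the "Type shorthand" rule. `O=optional` is omitted
# because its composition with an inner type is not specified.
TYPE_SHORTHAND = {
    "str": "s", "int": "i", "float": "f",
    "bool": "b", "list": "L", "dict": "D",
}


def compress_param(param: str) -> Optional[str]:
    """Compress one 'name: type = default' parameter; None means 'drop it'."""
    name_part, _, default = param.strip().partition("=")
    name, _, annotation = name_part.partition(":")
    name, annotation, default = name.strip(), annotation.strip(), default.strip()

    # Implicit self/this: the receiver is omitted in compressed form.
    if name in ("self", "this"):
        return None

    out = name
    if annotation:
        # Unknown (custom) types are kept verbatim.
        out = f"{TYPE_SHORTHAND.get(annotation, annotation)}:{name}"
    if default:
        out += f"={default}"
    return out


def compress_params(params: str) -> str:
    """Compress a comma-separated parameter list (naive split on commas)."""
    parts = (compress_param(p) for p in params.split(","))
    return ",".join(p for p in parts if p is not None)


print(compress_params("price: float, rate: float = 0.1"))  # f:price,f:rate=0.1
print(compress_params("self, key, value"))                 # key,value
```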
4. Language Tags
| Tag | Language |
|---|---|
| py | Python |
| js | JavaScript |
| ts | TypeScript |
| sql | SQL |
| sh | Shell/Bash |
| go | Go |
| rs | Rust |
| rb | Ruby |
| java | Java |
| api | REST/HTTP API |
5. Code Compression Examples
Example 1: Python Function
Original (147 chars):
def calculate_discount(price: float, rate: float = 0.1) -> float:
    if rate > 0.5:
        raise ValueError("Rate too high")
    return price * (1 - rate)
Compressed (82 chars):
PKT[4|ACT|calc|fn|calculate_discount(f:price,f:rate=0.1)->f|cd rate>0.5:er "Rate too high";rt price*(1-rate)|^mode:code ^lang:py]
Savings: 65 chars (44.2%)
Example 2: JavaScript Class
Original (198 chars):
class UserCache {
  constructor(maxSize = 100) {
    this.cache = new Map();
    this.maxSize = maxSize;
  }
  get(key) {
    return this.cache.get(key);
  }
  set(key, value) {
    this.cache.set(key, value);
  }
}
Compressed (112 chars):
PKT[4|ACT|UserCache|cl|constructor(i:maxSize=100)|as cache=Map();as maxSize=maxSize|^mode:code ^lang:js]
PKT[4|ACT|UserCache.get|fn|key|->cache.get(key)|^mode:code ^lang:js]
PKT[4|ACT|UserCache.set|fn|key,value|cache.set(key,value)|^mode:code ^lang:js]
Savings: 86 chars (43.4%)
Note: The class packet is lossy IR: constructor binding and the `this` context are
compressed away. Individual method packets can be marked as lossless where the round-trip is exact.
Example 3: SQL Query
Original (186 chars):
SELECT u.name, u.email, COUNT(o.id) AS order_count
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.created_at >= '2025-01-01'
GROUP BY u.name, u.email
HAVING COUNT(o.id) > 5
ORDER BY order_count DESC;
Compressed (108 chars):
PKT[4|QRY|users+orders|qry|u.name,u.email,COUNT(o.id):order_count|JOIN o ON u.id=o.user_id;WHERE o.created_at>=250101;GROUP u.name,u.email;HAVING COUNT(o.id)>5;ORDER order_count DESC|^mode:code ^lang:sql]
Savings: 78 chars (41.9%)
Example 4: Shell Script
Original (142 chars):
#!/bin/bash
for file in /var/log/*.log; do
  if [ -f "$file" ] && [ $(wc -l < "$file") -gt 1000 ]; then
    gzip "$file"
  fi
done
Compressed (86 chars):
PKT[4|ACT|log.compress|lp|file:/var/log/*.log|cd -f file AND wc -l file>1000:gzip file|^mode:code ^lang:sh]
Savings: 56 chars (39.4%)
Note: This is lossy IR. Shell quoting, command substitution syntax, and redirection operators are not preserved. The compressed form captures the structural intent (loop, filter, compress) but decompressed output requires shell-specific reconstruction. Round-trip execution fidelity is not guaranteed for shell.
Example 5: REST API Endpoint
Original (163 chars):
POST /api/v2/users
Content-Type: application/json
Authorization: Bearer {token}
{"name": "string", "email": "string", "role": "admin|user"}
Response: 201 Created
Compressed (92 chars):
PKT[4|ACT|api.users|crt|POST /v2/users|body:D{s:name,s:email,s:role=admin+user};auth:bearer;rsp:201|^mode:code ^lang:api]
Savings: 71 chars (43.6%)
Note: The literal pipe in `admin|user` is replaced with `+` (union notation)
because pipe is the field delimiter and is prohibited in free-text fields. This is
a lossy transformation: the original enum syntax is not preserved.
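The pipe replacement described in the note is small enough to sketch directly. `escape_pipes` is a hypothetical helper name. The second call shows why the transformation is lossy: a value that already contains `+` becomes indistinguishable from one that used pipes throughout.

```python
def escape_pipes(value: str) -> str:
    """Replace literal pipes with '+' so the value fits in a PKT free-text field."""
    return value.replace("|", "+")


print(escape_pipes("admin|user"))  # admin+user

# Lossy: two different inputs collapse to the same compressed text,
# so the original delimiter cannot be recovered from the output alone.
print(escape_pipes("a+b|c") == escape_pipes("a|b|c"))  # True
```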
Summary
| Example | Before (chars) | After (chars) | Savings |
|---|---|---|---|
| Python fn | 147 | 82 | 44.2% |
| JS class | 198 | 112 | 43.4% |
| SQL query | 186 | 108 | 41.9% |
| Shell loop | 142 | 86 | 39.4% |
| REST API | 163 | 92 | 43.6% |
| Mean | 167.2 | 96.0 | 42.5% |
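The savings figures follow from the before and after character counts as `(before - after) / before`. A minimal sketch of that arithmetic, using the counts stated for the Python and shell examples (`savings` is a hypothetical helper name):

```python
from typing import Tuple


def savings(before: int, after: int) -> Tuple[int, float]:
    """Return (characters saved, percent saved rounded to one decimal place)."""
    saved = before - after
    return saved, round(100 * saved / before, 1)


print(savings(147, 82))  # (65, 44.2)  Python fn
print(savings(142, 86))  # (56, 39.4)  Shell loop
```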
6. Fidelity Contract
Code compression is a lossy structural intermediate representation. The contract:
- Preserved: identifier names, string literals, numeric constants, control flow structure, operation sequence, data shape, type annotations where present.
- Allowed lossy: comments, whitespace, coding style, import ordering, binding semantics (this/self dispatch), constructor internals, language-specific syntax (shell quoting, command substitution, method resolution order).
- Not guaranteed: identical execution behavior. Decompressed output may require manual adjustment to compile or run in the target language.
The boundary between "survives round-trip" and "requires adjustment" is empirical, not formally defined. Simple constructs (single-expression functions, basic assignments) tend to round-trip cleanly. Complex constructs (classes with inheritance, shell pipelines, cross-language API specs) do not. The spec does not promise where this boundary lies because the grammar cannot enforce it.
This is compression for communication: an agent reading the compressed form should understand what the code does. It is not compression for archival: you cannot reconstruct the original source from the compressed form alone.
Comment Handling
Comments and docstrings are lossy by default. A ^keep:docs meta flag requests
comment preservation, but this is advisory, not guaranteed.
7. Backward Compatibility
- Code packets are valid v4 packets. A v3.x parser will parse the pipe-delimited fields correctly but will not understand the code body notation in ARG2.
- The `^mode:code` flag allows parsers to skip code packets they cannot process.
- Without `^mode:code`, packets are treated as natural language under v3.x rules.
- Language-unaware parsers treat `^lang:XX` as an opaque meta flag.
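The compatibility rules above amount to a routing decision on the meta field. The sketch below is one way to express it, assuming the meta flags are the last pipe-delimited field and are separated by spaces. `meta_flags` and `route` are hypothetical names, and the natural-language packet in the usage lines is a made-up placeholder, not a packet from the v3.x spec.

```python
from typing import List, Optional


def meta_flags(packet: str) -> List[str]:
    """Return the space-separated flags from the packet's last field."""
    body = packet[:-1] if packet.endswith("]") else packet
    return body.rsplit("|", 1)[-1].split()


def route(packet: str, supports_code: bool) -> Optional[str]:
    """Decide how a parser should treat a packet.

    Returns "code" for code packets the parser can process, None for code
    packets it should skip, and "natural" for everything else.
    """
    if "^mode:code" in meta_flags(packet):
        return "code" if supports_code else None
    # No ^mode:code flag: handle as natural language under v3.x rules.
    return "natural"


code_pkt = "PKT[4|ACT|UserCache.get|fn|key|->cache.get(key)|^mode:code ^lang:js]"
plain_pkt = "PKT[4|INF|build|status|ok|none|]"  # hypothetical non-code packet

print(route(code_pkt, supports_code=True))    # code
print(route(code_pkt, supports_code=False))   # None
print(route(plain_pkt, supports_code=False))  # natural
```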