# AXL Rosetta v4.0.1 - Public Release Code Compression Layer
# Frozen at: v4.0.2-r6-freeze (commit 51e75de)
# Released: 2026-04-25
# Source: github.com/dcarranza-axl/axl-research/blob/v4.0.2-r6-freeze/spec/v4-code-layer.md
# Mirror: github.com/axlprotocol/axl-research/blob/v4.0.2-r6-freeze/spec/v4-code-layer.md
# License: Apache 2.0

# AXL Rosetta v4: Code Compression Layer

Version: 4.0.0-draft
Status: Draft
License: Apache 2.0
Extends: v3.2 Ideographic Composition (spec/v3.2-ideographic.md)

## 1. Overview

The v4 code layer extends AXL Rosetta to compress programming constructs, source code
fragments, and executable commands into the same PKT structure used for natural language.
Code packets carry a `^mode:code` meta flag and a `^lang:` tag to identify the source
language. Code compression is a lossy structural intermediate representation.
Decompressed output preserves intent, structure, and identifiers but is not
guaranteed to execute identically to the original. Practical round-trip fidelity
varies by construct complexity and language. This is compression for communication,
not archival.

## 2. Code Primitives

| Primitive | Meaning                    | Maps To                        |
|-----------|----------------------------|--------------------------------|
| fn        | Function definition        | def, function, func            |
| cl        | Class definition           | class, struct                  |
| lp        | Loop construct             | for, while, loop               |
| cd        | Conditional                | if, elif, else, switch, match  |
| rt        | Return statement           | return, yield                  |
| im        | Import/include             | import, require, include, use  |
| as        | Assignment                 | =, :=, let, const, var         |
| er        | Error handling             | try, catch, except, finally    |
| tp        | Type annotation            | int, str, list, dict, custom   |

## 3. Packet Format for Code

Code packets use the standard PKT structure with mandatory meta flags:

    PKT[4|CLS|SUB|TAG|ARG1|ARG2|^mode:code ^lang:XX]

- `^mode:code` is required on every code packet.
- `^lang:XX` identifies the language using short tags (see Section 4).
- CLS is typically `ACT` for definitions and `INF` for descriptions of code.
- SUB identifies the module, class, or function being described.
- TAG uses code primitives from Section 2.
- ARG1 carries the name/signature.
- ARG2 carries the body or logic in compressed notation.

### Code Body Compression Rules

1. **Semicolon separation**: Multiple statements compress to semicolon-delimited sequences.
2. **Arrow notation for returns**: `->` replaces explicit return for single-expression functions.
3. **Brace elision**: Indentation-based languages (Python) omit braces. Brace languages
   compress to `{...}` with semicolon-separated interior.
4. **Type shorthand**: Common types abbreviate: `s`=str, `i`=int, `f`=float, `b`=bool,
   `L`=list, `D`=dict, `O`=optional.
5. **Implicit self/this**: First parameter `self` or `this` is omitted in compressed form.

## 4. Language Tags

| Tag   | Language     |
|-------|-------------|
| py    | Python       |
| js    | JavaScript   |
| ts    | TypeScript   |
| sql   | SQL          |
| sh    | Shell/Bash   |
| go    | Go           |
| rs    | Rust         |
| rb    | Ruby         |
| java  | Java         |
| api   | REST/HTTP API|

## 5. Code Compression Examples

### Example 1: Python Function

**Original** (147 chars):
```python
def calculate_discount(price: float, rate: float = 0.1) -> float:
    if rate > 0.5:
        raise ValueError("Rate too high")
    return price * (1 - rate)
```

**Compressed** (82 chars):
```
PKT[4|ACT|calc|fn|calculate_discount(f:price,f:rate=0.1)->f|cd rate>0.5:er "Rate too high";rt price*(1-rate)|^mode:code ^lang:py]
```

**Savings**: 65 chars (44.2%)

### Example 2: JavaScript Class

**Original** (198 chars):
```javascript
class UserCache {
  constructor(maxSize = 100) {
    this.cache = new Map();
    this.maxSize = maxSize;
  }
  get(key) {
    return this.cache.get(key);
  }
  set(key, value) {
    this.cache.set(key, value);
  }
}
```

**Compressed** (112 chars):
```
PKT[4|ACT|UserCache|cl|constructor(i:maxSize=100)|as cache=Map();as maxSize=maxSize|^mode:code ^lang:js]
PKT[4|ACT|UserCache.get|fn|key|->cache.get(key)|^mode:code ^lang:js]
PKT[4|ACT|UserCache.set|fn|key,value|cache.set(key,value)|^mode:code ^lang:js]
```

**Savings**: 86 chars (43.4%)
**Note**: The class packet is lossy IR: constructor binding and `this` context are
compressed away. Individual methods mark as lossless where the round-trip is exact.

### Example 3: SQL Query

**Original** (186 chars):
```sql
SELECT u.name, u.email, COUNT(o.id) AS order_count
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.created_at >= '2025-01-01'
GROUP BY u.name, u.email
HAVING COUNT(o.id) > 5
ORDER BY order_count DESC;
```

**Compressed** (108 chars):
```
PKT[4|QRY|users+orders|qry|u.name,u.email,COUNT(o.id):order_count|JOIN o ON u.id=o.user_id;WHERE o.created_at>=250101;GROUP u.name,u.email;HAVING COUNT(o.id)>5;ORDER order_count DESC|^mode:code ^lang:sql]
```

**Savings**: 78 chars (41.9%)

### Example 4: Shell Script

**Original** (142 chars):
```bash
#!/bin/bash
for file in /var/log/*.log; do
    if [ -f "$file" ] && [ $(wc -l < "$file") -gt 1000 ]; then
        gzip "$file"
    fi
done
```

**Compressed** (86 chars):
```
PKT[4|ACT|log.compress|lp|file:/var/log/*.log|cd -f file AND wc -l file>1000:gzip file|^mode:code ^lang:sh]
```

**Savings**: 56 chars (39.4%)
**Note**: This is lossy IR. Shell quoting, command substitution syntax, and
redirection operators are not preserved. The compressed form captures the structural
intent (loop, filter, compress) but decompressed output requires shell-specific
reconstruction. Round-trip execution fidelity is not guaranteed for shell.

### Example 5: REST API Endpoint

**Original** (163 chars):
```
POST /api/v2/users
Content-Type: application/json
Authorization: Bearer {token}

{"name": "string", "email": "string", "role": "admin|user"}

Response: 201 Created
```

**Compressed** (92 chars):
```
PKT[4|ACT|api.users|crt|POST /v2/users|body:D{s:name,s:email,s:role=admin+user};auth:bearer;rsp:201|^mode:code ^lang:api]
```

**Savings**: 71 chars (43.6%)
**Note**: The literal pipe in `admin|user` is replaced with `+` (union notation)
because pipe is the field delimiter and is prohibited in free-text fields. This is
a lossy transformation, the original enum syntax is not preserved.

### Summary

| Example    | Before (chars) | After (chars) | Savings |
|------------|---------------|---------------|---------|
| Python fn  | 147           | 82            | 44.2%   |
| JS class   | 198           | 112           | 43.4%   |
| SQL query  | 186           | 108           | 41.9%   |
| Shell loop | 142           | 86            | 39.4%   |
| REST API   | 163           | 89            | 45.4%   |
| **Mean**   | **167.2**     | **95.4**      | **42.9%** |

## 6. Fidelity Contract

Code compression is a lossy structural intermediate representation. The contract:

1. **Preserved**: identifier names, string literals, numeric constants, control flow
   structure, operation sequence, data shape, type annotations where present.
2. **Allowed lossy**: comments, whitespace, coding style, import ordering, binding
   semantics (this/self dispatch), constructor internals, language-specific syntax
   (shell quoting, command substitution, method resolution order).
3. **Not guaranteed**: identical execution behavior. Decompressed output may require
   manual adjustment to compile or run in the target language.

The boundary between "survives round-trip" and "requires adjustment" is empirical,
not formally defined. Simple constructs (single-expression functions, basic
assignments) tend to round-trip cleanly. Complex constructs (classes with
inheritance, shell pipelines, cross-language API specs) do not. The spec does not
promise where this boundary lies because the grammar cannot enforce it.

This is compression for communication: an agent reading the compressed form
should understand what the code does. It is not compression for archival: you
cannot reconstruct the original source from the compressed form alone.

### Comment Handling

Comments and docstrings are lossy by default. A `^keep:docs` meta flag requests
comment preservation, but this is advisory, not guaranteed.

## 7. Backward Compatibility

- Code packets are valid v4 packets. A v3.x parser will parse the pipe-delimited
  fields correctly but will not understand the code body notation in ARG2.
- The `^mode:code` flag allows parsers to skip code packets they cannot process.
- Without `^mode:code`, packets are treated as natural language under v3.x rules.
- Language-unaware parsers treat `^lang:XX` as an opaque meta flag.
