AXL / POSTS / 2026-04-22

Why we are changing how we report compression

Date: 2026-04-22. Affected surface: compress.axlprotocol.org/api/v1/compress-text and the web form at compress.axlprotocol.org. Status: Corrected in production.

TL;DR

The metrics.tokens_saved_pct field on the public compress API was computed from a char/4 heuristic and clamped negative values to zero. On typical short inputs this overstated real token savings by approximately 2.3x - and hid the fact that below roughly 20,000 input characters, AXL actually expands token count instead of reducing it. We corrected the API to report real tiktoken counts, added a break-even advisory, and repositioned AXL Rosetta v3.1 as a corpus-scale .md-file relay protocol. The CloudKitchen 41K / Construction 58K numbers in the evidence brief were already measured with tiktoken directly and remain authoritative.

What was wrong

Until today, the production API returned these metrics on every compression:

{
  "metrics": {
    "input_tokens_est": 407,         // char/4 estimate
    "output_tokens_est": 110,        // char/4 estimate
    "tokens_saved": 297,             // clamped to zero on expansion
    "tokens_saved_pct": 73.0         // derived from the two estimates
  }
}

The problem was not that the numbers were rounded - it was that they were not measurements. They were derived from the assumption that one token corresponds to four characters, which is a rough heuristic for prose but systematically wrong for AXL output. AXL packets pack multiple semantic labels into short ASCII sigil sequences; tiktoken(cl100k_base) tokenizes them very differently than it tokenizes English prose of the same character length.
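The legacy computation can be reproduced in a few lines. This is a sketch, not the production code; the character counts in the example call are hypothetical, chosen only to round-trip the response shown above:

```python
def legacy_metrics(chars_in: int, chars_out: int) -> dict:
    """Reproduce the old char/4 estimate plus the max(0, ...) clamp."""
    est_in = chars_in // 4            # char/4 heuristic, not a measurement
    est_out = chars_out // 4
    saved = max(0, est_in - est_out)  # the clamp: expansion silently reads as 0
    pct = round(100 * saved / est_in, 1) if est_in else 0.0
    return {
        "input_tokens_est": est_in,
        "output_tokens_est": est_out,
        "tokens_saved": saved,
        "tokens_saved_pct": pct,
    }

# Hypothetical character counts that reproduce the response above
# (407 / 110 / 297 / 73.0):
print(legacy_metrics(1628, 440))

# Any input that expands under compression reports zero savings
# instead of a negative delta:
print(legacy_metrics(1628, 2000))
```

Note that nothing in this path ever touches a tokenizer: both "token" fields are pure character arithmetic.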

A simple check: paste a 1,857-character investment memo into the API, and measure the real token delta with tiktoken independently. What we found on 2026-04-22:

Metric            API claimed      Real (tiktoken cl100k)
Input tokens      465 (char/4)     431
Output tokens     331 (char/4)     611
Tokens saved      134              -180
Tokens saved %    +28.8%           -41.8%

The output used 42% more tokens than the input - yet the API reported a 29% saving. On a single short prompt, the absolute overstatement is roughly 70 percentage points; across the distribution of typical short inputs we see in production traffic, the multiplicative factor is close to 2.3x.
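The independent check is mechanical. A minimal sketch (the helper name is ours, and the tokenizer is injected as a callable so the same function works with tiktoken's encode or any other tokenizer):

```python
from typing import Callable, Sequence

def real_savings(tokenize: Callable[[str], Sequence],
                 src: str, out: str) -> dict:
    """Measure the token delta with an actual tokenizer. No clamping:
    a negative tokens_saved means the output expanded."""
    tin, tout = len(tokenize(src)), len(tokenize(out))
    saved = tin - tout
    pct = round(100 * saved / tin, 1) if tin else 0.0
    return {"tokens_in": tin, "tokens_out": tout,
            "tokens_saved": saved, "tokens_saved_pct": pct}

# With tiktoken (not bundled here):
#   enc = tiktoken.get_encoding("cl100k_base")
#   real_savings(enc.encode, memo_text, axl_output)
# On the memo above this yields 431 in, 611 out, -180 saved, -41.8%.
```

The same function run with an o200k_base encoding gives the second column of real numbers reported later in this post.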

Why the bug existed

Two decisions compounded:

  1. We used char/4 for speed - at the time, tiktoken was a heavier dependency we had not yet pulled in, and the char-ratio approximation was a placeholder that stayed longer than it should have.
  2. We clamped the result to max(0, saved) so the field would never display as negative. The intent was cosmetic - "compression shouldn't be negative" - but the practical effect was to hide expansion cases instead of surfacing them. Any input where AXL's fixed-overhead header (manifest + schema version + meta-packets, ~200 chars / ~60 tokens) exceeded the savings from entity aliasing on the short body silently reported zero savings when the honest answer was a negative delta.

The honesty brief and llms.txt already carried a "do not use metrics.input_tokens_est, metrics.output_tokens_est, or metrics.tokens_saved_pct" warning. That warning was correct, but a warning in a brief is not the same as a corrected API response. Anyone who called the endpoint and trusted the fields got the wrong number.

What is fixed

As of 2026-04-22, the API response carries real tokenizer counts as first-class fields:

{
  "metrics": {
    // Authoritative real-tokenizer counts
    "tokens_in_cl100k": 431,
    "tokens_out_cl100k": 611,
    "tokens_saved_cl100k": -180,
    "tokens_saved_pct_cl100k": -41.8,
    "tokens_in_o200k": 438,
    "tokens_out_o200k": 598,
    "tokens_saved_o200k": -160,
    "tokens_saved_pct_o200k": -36.5,

    // Legacy char/4 estimates - DEPRECATED, removed in v0.11.0
    "input_tokens_est": 465,
    "output_tokens_est": 331,
    "tokens_saved": 134,
    "tokens_saved_pct": 28.8,

    "deprecated": {
      "tokens_saved_pct": "Derived from char estimate, overstates by ~2.3x on short inputs. Use tokens_saved_pct_cl100k. Removed in v0.11.0.",
      ...
    }
  },
  "warning": {
    "will_expand_tokens": true,
    "below_break_even": true,
    "break_even_chars": 20000,
    "message": "Output uses more tokens than input (cl100k: -41.8%, o200k: -36.5%). AXL's fixed-overhead header dominates short inputs. Input is 1,857 chars, below the 20,000-char break-even threshold..."
  }
}

Three things changed: real tokenizer counts (cl100k and o200k) are now first-class fields, a warning object flags any input that will expand or that sits below the break-even threshold, and the legacy char/4 fields are formally deprecated.

The legacy fields are still present for one more release so anyone who pinned against them does not break on deploy. They carry a deprecated: true marker and a removal version. Scheduled removal: axl-compress v0.11.0.

Why the evidence brief numbers are still valid

The v3.1 evidence brief reports 2.90x character compression and 1.40x real token compression against the CloudKitchen 41K corpus, plus 92% precision on cold-read comprehension across four non-Anthropic LLMs. Those numbers were not computed by the buggy API field. They were measured by:

  1. Compressing the 41K corpus with axl-core v0.9.0.
  2. Running tiktoken(cl100k_base).encode() on both input and output independently.
  3. Committing the raw RESULTS files to the research repository, traceable by SHA from the research log at /research-log/.

In other words, the brief was already using the methodology we are now making the API default. The brief remains authoritative. What changes is the alignment between the brief and what the API returns on a live call.
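That methodology reduces to a small measurement step. A sketch (function name ours; the encode callable stands in for tiktoken's `encode`, and the dummy strings in any test are illustrative, not corpus data):

```python
def compression_ratios(src: str, out: str,
                       encode) -> dict:
    """Character ratio and real-token ratio, measured independently.
    The brief's headline numbers (2.90x char, 1.40x token) come from
    exactly this computation run over the full corpus files."""
    return {
        "char_ratio": round(len(src) / len(out), 2),
        "token_ratio": round(len(encode(src)) / len(encode(out)), 2),
    }

# With tiktoken:
#   enc = tiktoken.get_encoding("cl100k_base")
#   compression_ratios(corpus_text, compressed_text, enc.encode)
```

The gap between the two ratios is the whole story of the bug: character compression always looks better than token compression on AXL output, and char/4 collapsed the two into one number.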

The deeper shift: short prompts never worked

Fixing the measurement exposed the harder problem. It is not that AXL compression is weaker than claimed on short inputs. It is that AXL compression was never supposed to work on short inputs, and the individual-user web-demo framing quietly pretended otherwise.

The math: AXL's fixed-overhead header (manifest, schema version, meta-packets) costs roughly 60 tokens regardless of input length, and entity aliasing only pays for itself once the same entities recur often enough to amortize their alias definitions. On our production traffic, those two effects cross over near 20,000 input characters; a 500-word prompt (roughly 3,000 characters) sits nearly an order of magnitude below that line.

Selling a web form for users to paste 500-word prompts into and watch tokens shrink was never going to produce real savings, because the regime was wrong. The evidence brief proved AXL works at corpus scale. The web form was an easier-to-demo artifact that happened to land in a regime where the same engine loses.

What AXL is now

AXL Protocol remains the same grammar. What shifts is the product surface and the claim: AXL Rosetta v3.1 is now positioned as a corpus-scale .md-file relay protocol, not a paste-a-prompt web demo. The engine is unchanged; the claim is narrowed to the regime where it measurably wins.

If you built something against the old API

Your integration will continue to work until axl-compress v0.11.0 (scheduled for week 2-3 of this pivot). The legacy fields are still present with a deprecated: true flag. Migration is mechanical: replace metrics.tokens_saved_pct with metrics.tokens_saved_pct_cl100k (or the o200k variant), and be prepared for the number to be smaller or negative on short inputs. If your integration routes AXL output to a downstream LLM, the new number is the one that matches your actual token bill.
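For most callers the migration really is a one-line field swap. A defensive sketch (helper name ours) that prefers the real count and falls back to the deprecated field only when a pre-fix response lacks the new one:

```python
def saved_pct(metrics: dict, tokenizer: str = "cl100k") -> float:
    """Prefer the real-tokenizer savings figure; tolerate old responses."""
    key = f"tokens_saved_pct_{tokenizer}"
    if key in metrics:
        return metrics[key]  # may legitimately be negative on short inputs
    # Pre-v0.11.0 fallback: the deprecated char/4 field, known to overstate.
    return metrics["tokens_saved_pct"]

new_resp = {"tokens_saved_pct_cl100k": -41.8, "tokens_saved_pct": 28.8}
old_resp = {"tokens_saved_pct": 28.8}
```

Whatever wrapper you use, make sure the downstream logic accepts a negative value; code written against the clamped field has often never seen one.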

What we are committing to

  1. Every number we publish from here forward is measurable against tiktoken (or a named and reproducible tokenizer) by anyone with 10 minutes and the Python package.
  2. If we add a measurement field, we add the tokenizer name, the version, and the encode method call next to it.
  3. If a number turns out to be wrong, we correct it in public within 48 hours. This post is the current instance of that commitment.
  4. If Anthropic publishes a Claude-native tokenizer, we re-run every Claude-specific claim against it and publish a diff.

Short version: if you are calling compress.axlprotocol.org/api/v1/compress-text today, check the warning field and read tokens_saved_pct_cl100k. If you are looking to compress documentation corpora or multi-agent context pools at scale, wait two weeks - axl-corpus (the CLI, the relay.axlprotocol.org streaming API, and the self-host Docker image) ships in the week 2 tranche of the AXL Relay pivot.


References: v3.1 evidence brief | research log | v3.1 vs v4 decision gate | axl-core on GitHub