
By AI Maestro · May 17, 2026 · 2 min read

Investigating a Transformer’s Forward Activations Through a Lossy Dual E8 (E16) Lattice Bottleneck

I explored whether it is feasible to route a transformer's forward activations through a lossy Dual E8 lattice and inject them back into the residual stream. The goal was to find where the boundary of generative stability lies.

The Mechanism

A standard large language model (LLM) represents its hidden states as high-dimensional floating-point vectors. Instead of applying a typical scalar quantization scheme like INT4, I mapped these activations onto a conceptual torus via a sinusoidal mapping and projected them onto Dual E8 lattice hemispheres.
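The post doesn't include code, but the rough shape of the intervention can be sketched. The snippet below is a minimal illustration under my own assumptions, not the author's implementation: it uses the standard E8 nearest-point decoder (E8 = D8 ∪ (D8 + ½); decode both cosets and keep the closer candidate), a simple sine wrap as a stand-in for the "conceptual torus", and a blend ratio β for re-injection. The exact "Dual E8 hemisphere" construction and the `scale` knob are not specified in the post.

```python
import torch

def nearest_e8(x: torch.Tensor) -> torch.Tensor:
    """Nearest E8 lattice point for each trailing 8-dim block.

    Uses the decomposition E8 = D8 ∪ (D8 + 1/2): decode both cosets and
    keep whichever candidate is closer. x has shape (..., 8).
    """
    def nearest_d8(y):
        r = torch.round(y)
        # D8 requires an even coordinate sum; if parity is odd, re-round the
        # coordinate that was farthest from an integer the other way.
        odd = (r.sum(-1).long() % 2 != 0)
        err = y - r
        idx = err.abs().argmax(-1, keepdim=True)
        sgn = torch.gather(err, -1, idx)
        step = torch.where(sgn >= 0, torch.ones_like(sgn), -torch.ones_like(sgn))
        return r.scatter_add(-1, idx, step * odd.unsqueeze(-1).to(y.dtype))

    a = nearest_d8(x)                    # D8 coset
    b = nearest_d8(x - 0.5) + 0.5        # D8 + (1/2, ..., 1/2) coset
    closer_b = ((x - b) ** 2).sum(-1) < ((x - a) ** 2).sum(-1)
    return torch.where(closer_b.unsqueeze(-1), b, a)


def lattice_bottleneck(h: torch.Tensor, beta: float = 0.20, scale: float = 4.0) -> torch.Tensor:
    """Lossy round-trip of a residual-stream activation through an E8 bottleneck.

    h: (batch, seq, d_model). `scale` is a hypothetical knob controlling how
    coarse the lattice is relative to the activation magnitude.
    """
    d = h.shape[-1] - h.shape[-1] % 8                 # largest multiple of 8
    x = h[..., :d].reshape(*h.shape[:-1], -1, 8).float()
    wrapped = torch.sin(x / scale)                    # crude stand-in for the torus mapping
    quant = nearest_e8(wrapped * scale) / scale       # project onto the lattice, then rescale
    recon = torch.arcsin(quant.clamp(-1 + 1e-6, 1 - 1e-6)) * scale
    recon = recon.reshape(*h.shape[:-1], d).to(h.dtype)
    out = h.clone()
    out[..., :d] = (1.0 - beta) * h[..., :d] + beta * recon
    return out
```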

The β = 0.20 Sweep (Qwen2.5-0.5B)

Sweeping the blend ratio β from 0.10 to 0.50 across layers 8–13 of `Qwen2.5-0.5B` revealed a sharp phase transition (a sweep-harness sketch follows the list):

  • **β ≥ 0.25**: Generation succumbs to heavy repetition pressure and semantic drift. The geometry acts as an attractor, trapping the decoding process ("loop-lock").
  • **β = 0.20**: The highest injection ratio of lossy geometric signal that maintains both numerical activation fidelity (Avg Cosine > 0.99) and open-ended generation quality (low repeated n-grams).
  • **β ≤ 0.10**: The perturbation is largely absorbed by the transformer's layer normalizations, making the intervention invisible.
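A sweep like this can be wired up with forward hooks on the decoder layers. The harness below is a sketch that reuses the hypothetical `lattice_bottleneck` helper from the previous snippet and standard Hugging Face `transformers` calls; the β values and layer range 8–13 come from the post, while the prompt, sampling settings, and the repeated-3-gram rate (my guess at what Rep-3g measures) are illustrative.

```python
from collections import Counter
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B", torch_dtype=torch.float32)

def make_hook(beta):
    # Forward hook: replace the layer's hidden states with the blended version.
    def hook(module, args, output):
        hidden = output[0] if isinstance(output, tuple) else output
        patched = lattice_bottleneck(hidden, beta=beta)  # from the sketch above
        return (patched,) + tuple(output[1:]) if isinstance(output, tuple) else patched
    return hook

def rep_3g(text: str) -> float:
    """Fraction of 3-grams that are repeats (rough stand-in for Rep-3g)."""
    toks = text.split()
    grams = [tuple(toks[i:i + 3]) for i in range(len(toks) - 2)]
    if not grams:
        return 0.0
    return 1.0 - len(Counter(grams)) / len(grams)

prompt = "Explain why the sky is blue."
inputs = tok(prompt, return_tensors="pt")

for beta in (0.10, 0.20, 0.25, 0.30, 0.50):
    handles = [model.model.layers[i].register_forward_hook(make_hook(beta))
               for i in range(8, 14)]          # layers 8-13
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
    for handle in handles:
        handle.remove()
    text = tok.decode(out[0], skip_special_tokens=True)
    print(f"beta={beta:.2f}  rep3g={rep_3g(text):.3f}")
```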

Here are the data from a 300-iteration sweep:

| β    | Min Cosine | Avg Cosine | Max MSE | Rep-3g (repetition rate) |
|------|------------|------------|---------|--------------------------|
| 0.10 | 0.9972     | 0.9979     | 0.0024  | 0.134                    |
| 0.20 | 0.9907     | 0.9916     | 0.0106  | 0.093                    |
| 0.25 | 0.9839     | 0.9865     | 0.0171  | 0.084                    |
| 0.30 | 0.9648     | 0.9771     | 0.0255  | 0.190                    |
| 0.50 | 0.9171     | 0.9288     | 0.0850  | 0.412                    |

Semantic scoring (evaluating prompt relevance and similarity to the unmodified baseline; a scoring sketch follows the table):

| β    | Avg Cosine | Rep-3g | Relevance | Patched-to-Baseline Sim |
|------|------------|--------|-----------|-------------------------|
| 0.10 | 0.9980     | 0.223  | 0.781     | 0.889                   |
| 0.20 | 0.9918     | 0.075  | 0.752     | 0.854                   |
| 0.25 | 0.9871     | 0.232  | 0.717     | 0.801                   |
| 0.30 | 0.9760     | 0.392  | 0.725     | 0.764                   |
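The post doesn't specify how relevance and patched-to-baseline similarity were computed. A plausible recipe is embedding-cosine scoring with a sentence encoder; the encoder name below is my own choice, not the author's.

```python
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # assumed scorer, not from the post

def semantic_scores(prompt: str, patched_text: str, baseline_text: str) -> dict:
    """Embedding-cosine proxies for 'Relevance' and 'Patched-to-Baseline Sim'."""
    emb = embedder.encode([prompt, patched_text, baseline_text], convert_to_tensor=True)
    relevance = F.cosine_similarity(emb[0], emb[1], dim=0).item()     # prompt vs patched output
    to_baseline = F.cosine_similarity(emb[1], emb[2], dim=0).item()   # patched vs unmodified output
    return {"relevance": relevance, "patched_to_baseline": to_baseline}
```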

Generalization Across Larger Models

The β = 0.20 boundary generalizes to larger models (`Qwen2.5-1.5B` and `Qwen2.5-3B`) along the activation-cosine axis:

| Model      | β    | Min Cosine | Avg Cosine | Max MSE | Rep-3g |
|------------|------|------------|------------|---------|--------|
| 1.5B       | 0.10 | 0.9988     | 0.9989     | 0.0027  | 0.267  |
| 1.5B       | 0.20 | 0.9862     | 0.9939     | 0.0105  | 0.128  |
| 1.5B       | 0.25 | 0.9904     | 0.9919     | 0.0166  | 0.398  |
| 1.5B       | 0.30 | 0.9733     | 0.9815     | 0.0235  | 0.307  |
| 1.5B       | 0.40 | 0.9368     | 0.9551     | 0.0487  | 0.191  |
| 3B (4-bit) | 0.10 | 0.9964     | 0.9976     | 0.0122  | 0.033  |
| 3B (4-bit) | 0.20 | 0.9861     | 0.9904     | 0.0455  | 0.115  |
| 3B (4-bit) | 0.25 | 0.9604     | 0.9799     | 0.0654  | 0.043  |
| 3B (4-bit) | 0.30 | 0.9702     | 0.9778     | 0.0987  | 0.050  |
| 3B (4-bit) | 0.40 | 0.9158     | 0.9390     | 0.1728  | 0.025  |

Note: In the 3B model, repetition pressure remained low across all sweeps, but the validation cosine degraded identically at β ≥ 0.25.

Storage Compression Prototypes

Using the Dual E8/E16 lattice as a computational substrate also yields high theoretical storage efficiency in early prototypes:

  • KV Cache (8×): FP16 KV cache compressed to INT8 coordinates, reducing the footprint from 0.21 MB to 0.02 MB (a packing sketch follows this list).
  • Weights (112×): Projected a dense $[4864, 896]$ MLP weight matrix down to a 0.07 MB E16 footprint. (Cosine similarity of the uncalibrated weight matrix multiplication was limited to $\sim$0.078, indicating that Quantization-Aware Training is mandatory for parameter viability.) A pre-projected decompression bypass was designed to run matrix multiplications directly against lattice coordinates without upcasting, avoiding memory bandwidth bottlenecks.
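As a rough illustration of the KV-cache prototype, the sketch below projects an FP16 cache tensor onto block-wise E8 points and stores the coordinates as INT8 (E8 coordinates are integers or half-integers, so doubling them gives exact small integers). It reuses the `nearest_e8` helper from the mechanism sketch; note that naive coordinate packing alone only halves FP16, so the reported 8× presumably relies on additional index or entropy coding not shown here, and the scale constant is illustrative.

```python
import torch

def compress_kv_e8(kv: torch.Tensor, scale: float = 2.0):
    """Pack an FP16 KV tensor (..., d) into INT8 E8 coordinates plus a scale."""
    d = kv.shape[-1] - kv.shape[-1] % 8
    blocks = kv[..., :d].float().reshape(*kv.shape[:-1], -1, 8)
    points = nearest_e8(blocks * scale)                 # project onto the lattice
    codes = (points * 2).round().clamp(-128, 127).to(torch.int8)  # half-integers -> ints
    return codes, scale

def decompress_kv_e8(codes: torch.Tensor, scale: float) -> torch.Tensor:
    """Recover an approximate FP16 KV tensor from INT8 lattice coordinates."""
    points = codes.float() / 2.0
    return (points / scale).reshape(*codes.shape[:-2], -1).to(torch.float16)

# Example: a (layers, heads, seq, head_dim) FP16 cache slice
kv = torch.randn(1, 2, 64, 64, dtype=torch.float16)
codes, s = compress_kv_e8(kv)
approx = decompress_kv_e8(codes, s)
print(codes.numel() * codes.element_size(), "bytes packed vs",
      kv.numel() * kv.element_size(), "bytes original")
```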

Originally published at reddit.com. Curated by AI Maestro.
