This is a research write-up about a patent I filed (not self-promotion).
I am dyslexic so I used AI to help with the writing.
I have been working on Spectral Compact Training (SCT). It stores every weight matrix in factored form, W = U · diag(s) · Vᵀ, and trains directly through the small spectral factors. The dense matrix is never built; gradients are exact via standard backprop, and a QR retraction keeps U and V orthonormal after each optimizer step.
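To make the mechanism concrete, here is a minimal NumPy sketch of the two core ideas: applying W = U · diag(s) · Vᵀ without ever materializing the dense matrix, and re-orthonormalizing a factor with a QR retraction after a (simulated) optimizer step. The dimensions, rank, and noise scale are illustrative assumptions of mine, not values from the SCT repo.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 64, 48, 8  # hypothetical layer dims and rank, not SCT's settings

# Spectral factors: U (m x r), s (r), V (n x r), so W = U diag(s) V^T
U, _ = np.linalg.qr(rng.standard_normal((m, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
s = rng.standard_normal(r)

def apply_factored(x):
    # y = U diag(s) V^T x, computed right-to-left; the m x n dense W never exists
    return U @ (s * (V.T @ x))

x = rng.standard_normal(n)
y_factored = apply_factored(x)
y_dense = (U @ np.diag(s) @ V.T) @ x  # dense reference, for checking only
assert np.allclose(y_factored, y_dense)

# A gradient step perturbs U off the orthonormal (Stiefel) manifold...
U_noisy = U + 1e-3 * rng.standard_normal(U.shape)

# ...and a QR retraction pulls it back: Q is orthonormal by construction.
U_retracted, R = np.linalg.qr(U_noisy)
U_retracted *= np.sign(np.diag(R))  # sign fix keeps column orientation stable

ortho_err = np.linalg.norm(U_retracted.T @ U_retracted - np.eye(r))
print(f"orthonormality error after retraction: {ortho_err:.2e}")
```

The sign fix after QR is a common convention so columns do not flip orientation between steps; the orthonormality error printed here is the same diagnostic the results below report.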
Results on a 70B-class architecture (80 layers, hidden=8192, FFN=28672, LLaMA-3 style):

- Dense + Adam: 1,245 GB
- SCT + Adam: 7.24 GB
- Compression: 172x
- Full training step on Steam Deck: 6.28 seconds
- Orthonormality error: 1.30 × 10⁻⁶
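For a sanity check on the dense-side number, here is rough back-of-envelope math for a LLaMA-3-style 70B config. The GQA KV width, vocab size, and the 16-bytes-per-parameter Adam accounting (fp16 weights + fp16 grads + fp32 master + fp32 m and v) are my assumptions, not figures from the post, so this lands in the same ballpark rather than exactly on 1,245 GB.

```python
# Assumed LLaMA-3-style 70B config: 80 layers, hidden=8192, FFN=28672
hidden, ffn, layers, vocab = 8192, 28672, 80, 128256
kv_dim = 1024  # assumed: 8 KV heads x 128 (grouped-query attention)

attn = hidden * hidden * 2 + hidden * kv_dim * 2  # q, o projections + k, v
mlp = 3 * hidden * ffn                            # gate, up, down projections
embed = 2 * vocab * hidden                        # input + output embeddings
params = layers * (attn + mlp) + embed

# Assumed mixed-precision Adam: 2 (fp16 W) + 2 (fp16 grad) + 4 (fp32 master)
# + 4 (fp32 m) + 4 (fp32 v) = 16 bytes per parameter
dense_gb = params * 16 / 1e9
print(f"params ~ {params / 1e9:.1f}B, dense Adam footprint ~ {dense_gb:.0f} GB")
```

Under these assumptions you get roughly 70B parameters and a footprint north of 1,100 GB, consistent with the order of magnitude of the dense figure above.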
Video proof (full run on Steam Deck CPU, 16 GB RAM): git -> proof
To be clear: this is architectural validation, not a finished trained model. SCT solves the memory wall. Compute time stays the same.
An MLP proof-of-concept shows SCT matching dense training quality (100 percent accuracy on XOR, near-identical loss on sine regression). Compression scales with model size: it is modest below 1.7B parameters but substantial at 7B+.
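The size scaling follows directly from parameter counting: a dense m×n matrix holds m·n values, while the factors U (m×r), s (r), V (n×r) hold r·(m+n+1), so at a fixed rank the ratio grows with matrix width. The rank r=64 below is purely illustrative, not the repo's actual setting.

```python
def compression(m, n, r):
    # dense param count over factored param count for W = U diag(s) V^T
    return (m * n) / (r * (m + n + 1))

# Roughly 1.7B-, 7B-, and 70B-class hidden widths (illustrative)
for hidden in (2048, 4096, 8192):
    print(f"hidden={hidden}: {compression(hidden, hidden, 64):.1f}x")
```

For square matrices the ratio is about n / (2r), which is why the same rank that barely breaks even on a small model gives large savings at 70B scale.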
Code (Apache 2.0): https://github.com/EctoSpace/SCT
Patent pending. Happy to answer questions about the math or limitations. Looking for arXiv cs.LG endorsers. DM me.



