AI Navigate

[P] Tridiagonal eigenvalue models in PyTorch: cheaper training/inference than dense spectral models

Reddit r/MachineLearning / 3/18/2026

💬 Opinion · Tools & Practical Usage · Models & Research

Key Points

  • The post investigates a cheaper spectral-model variant by constraining learned matrices to be symmetric tridiagonal rather than dense, aiming for a middle ground between interpretability and expressivity.
  • The model uses f(x) = λₖ(A₀ + ∑ᵢ xᵢAᵢ) but replaces the dense eigensolver with a cheaper tridiagonal eigensolver, wiring scipy.linalg.eigh_tridiagonal into PyTorch autograd.
  • In experiments, the tridiagonal approach yielded about 5x–6x speedups on 100×100 batches, enabling larger-scale experiments at a lower cost.
  • The writeup frames this as an engineering note on structured spectral models and the trade-offs between linear interpretability and larger neural networks, with a link to the full writeup.

This post is part of a series I'm working on with a broader goal: to understand what one nonlinear "neuron" can do when the nonlinearity is a matrix eigenvalue, and whether that gives a useful middle ground between linear models that are easy to explain and larger neural networks that are more expressive but much less transparent. Something unusual in this "attention is all you need" world :)

In this installment, I look at a cheaper variant of the model family by constraining each learned matrix to be symmetric tridiagonal instead of dense.

The model family is still f(x) = λₖ(A₀ + ∑ᵢ xᵢAᵢ), but the eigensolve becomes much cheaper. The motivation is that purely diagonal structure collapses the model to something close to piecewise linear, while tridiagonal structure still captures interactions between adjacent latent variables.
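To make the model family concrete, here is a minimal sketch of the forward pass, using a dense eigensolver as a reference implementation. The parameterization of each Aᵢ by its diagonal and off-diagonal, and the names `tridiag` and `f`, are my assumptions for illustration, not the post's actual code:

```python
import torch

def tridiag(d, e):
    # Assemble a symmetric tridiagonal matrix from its main diagonal d
    # (length n) and off-diagonal e (length n - 1).
    return torch.diag(d) + torch.diag(e, 1) + torch.diag(e, -1)

# Hypothetical parameterization: each A_i is stored as a (diagonal,
# off-diagonal) pair, so A_0 + sum_i x_i A_i stays tridiagonal.
n, p = 5, 3                    # matrix size, number of input features
torch.manual_seed(0)
d = torch.randn(p + 1, n)      # diagonals of A_0 .. A_p
e = torch.randn(p + 1, n - 1)  # off-diagonals of A_0 .. A_p

def f(x, k=0):
    # f(x) = k-th eigenvalue of A_0 + sum_i x_i A_i.
    dd = d[0] + x @ d[1:]      # combined diagonal
    ee = e[0] + x @ e[1:]      # combined off-diagonal
    A = tridiag(dd, ee)
    # torch.linalg.eigvalsh returns eigenvalues in ascending order.
    return torch.linalg.eigvalsh(A)[k]

x = torch.randn(p)
y = f(x)
```

Because sums of tridiagonal matrices are tridiagonal, only the two bands ever need to be formed, which is what makes a specialized eigensolver applicable.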

The post walks through why this structural restriction is interesting, how I wired scipy.linalg.eigh_tridiagonal into PyTorch autograd, and what happens on a few toy and tabular experiments. In my runs, the tridiagonal eigensolver was about 5x–6x faster than the dense one on batches of 100×100 matrices, which was enough to make larger experiments much cheaper to run.
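One way such a wrapper can look is a custom torch.autograd.Function whose forward calls SciPy and whose backward uses the standard simple-eigenvalue derivative dλ/dA = vvᵀ, which restricted to the tridiagonal bands gives dλ/dd_j = v_j² and dλ/de_j = 2·v_j·v_{j+1}. This is my sketch of the idea, not the post's actual implementation, and it assumes the selected eigenvalue is simple:

```python
import torch
from scipy.linalg import eigh_tridiagonal

class TridiagEigK(torch.autograd.Function):
    # k-th smallest eigenvalue of the symmetric tridiagonal matrix with
    # main diagonal d (length n) and off-diagonal e (length n - 1).

    @staticmethod
    def forward(ctx, d, e, k):
        # select='i' with select_range=(k, k) asks SciPy for just the
        # k-th eigenpair instead of the full spectrum.
        w, v = eigh_tridiagonal(
            d.detach().numpy(), e.detach().numpy(),
            select='i', select_range=(k, k))
        vk = torch.from_numpy(v[:, 0]).to(d.dtype)
        ctx.save_for_backward(vk)
        return torch.tensor(w[0], dtype=d.dtype)

    @staticmethod
    def backward(ctx, grad_out):
        (vk,) = ctx.saved_tensors
        # dλ/dA = v vᵀ for a simple eigenvalue; keep only the bands.
        grad_d = grad_out * vk ** 2
        grad_e = grad_out * 2.0 * vk[:-1] * vk[1:]
        return grad_d, grad_e, None   # no gradient w.r.t. the index k

d = torch.tensor([2.0, 1.0, 3.0], requires_grad=True)
e = torch.tensor([0.5, -0.2], requires_grad=True)
lam = TridiagEigK.apply(d, e, 0)  # smallest eigenvalue
lam.backward()
```

Since the eigenvector is unit-norm, the diagonal gradient entries sum to 1 when the upstream gradient is 1, which makes for a cheap sanity check.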

If you're interested in structured spectral models, custom autograd around numerical linear algebra routines, or model families that try to sit between linear interpretability and fully opaque neural nets, the full writeup is here:

https://alexshtf.github.io/2026/03/15/Spectrum-Banded.html

This is an engineering writeup rather than a paper, so I'd read it in that spirit.

submitted by /u/alexsht1