My calculator is a transformer

Reddit r/LocalLLaMA / 4/30/2026

💬 Opinion · Developer Stack & Infrastructure · Models & Research

Key Points

  • The post describes an experiment to “compile” an RPN calculator interpreter into transformer weights rather than training a model end-to-end.
  • It frames the transformer’s residual stream as a set of registers and uses a compiler to generate attention weights that execute the RPN logic.
  • The author distills the non-linear aspects of the calculator logic into the MLP components via training, while the attention portion is computed directly by the compiler.
  • The author notes the approach is largely impractical due to the calculator’s large size, but suggests it could offer insights into how transformers and attention work.
  • They speculate that compiling the MLP weights directly may be possible with additional structure (e.g., an AST), but it would require further development.

I got interested in seeing whether I could "compile" a program into transformer weights instead of training one. I've been working on it for a couple of months now, but I finally decided to stop and write it up. It's a bit of a long post, but maybe some of you will find it interesting.

Basically, I define the residual stream as a set of "registers" and generate the attention weights and MLP functions that execute an RPN interpreter (e.g. "2 3 + 2 *" should produce 10).
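For reference, this is the target semantics in plain Python, i.e. what the compiled transformer is supposed to reproduce. The operator set and float coercion here are my assumptions for illustration, not the author's actual token vocabulary:

```python
# Reference semantics for the RPN interpreter the transformer emulates.
def eval_rpn(tokens):
    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b,
           "*": lambda a, b: a * b,
           "/": lambda a, b: a / b}
    stack = []
    for tok in tokens:
        if tok in ops:
            b, a = stack.pop(), stack.pop()  # b was on top of the stack
            stack.append(ops[tok](a, b))
        else:
            stack.append(float(tok))
    return stack[-1]

print(eval_rpn("2 3 + 2 *".split()))  # -> 10.0
```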

For now I settled on distilling the non-linear logic into the MLPs by training, while the attention weights are fully calculated by the compiler. I think it could eventually be possible to calculate the MLP weights too, but that probably needs more of an AST behind it.
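To make the "compiled attention" idea concrete, here's a toy construction of my own (not the author's compiler): weights written by hand so a single head copies the value in one register slice at a fixed source position into another register slice at every position. The residual layout [position one-hot | reg A | reg B] and the source position are assumptions:

```python
import torch
import torch.nn.functional as F

T = 4                          # sequence length
D = T + 2                      # residual width: T position dims + 2 registers
A, B = T, T + 1                # register slice indices (assumed layout)
SRC = 1                        # source position to copy from (assumed)

x = torch.zeros(T, D)
x[:, :T] = torch.eye(T)                        # position one-hots
x[:, A] = torch.tensor([5., 7., 11., 13.])     # register A contents

W_Q = torch.zeros(D, T); W_Q[:T, SRC] = 10.    # every query points at SRC
W_K = torch.zeros(D, T); W_K[:T, :T] = torch.eye(T)  # keys expose position
W_V = torch.zeros(D, 1); W_V[A, 0] = 1.        # value reads register A
W_O = torch.zeros(1, D); W_O[0, B] = 1.        # output writes register B

scores = (x @ W_Q) @ (x @ W_K).T               # 10 on the SRC column, else 0
attn = F.softmax(scores, dim=-1)               # ~one-hot on SRC
x = x + attn @ (x @ W_V) @ W_O                 # residual update
print(x[:, B])                                 # ≈ 7.0 everywhere (reg A at SRC)
```

And a minimal version of the distillation step for the non-linear part: training a tiny MLP to approximate one op (multiplication here). The architecture, input range, and training loop are stand-ins, not the post's actual recipe:

```python
import torch.nn as nn

mlp = nn.Sequential(nn.Linear(2, 64), nn.GELU(), nn.Linear(64, 1))
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
for _ in range(2000):
    ab = torch.rand(256, 2) * 8 - 4            # operands sampled from [-4, 4]
    loss = F.mse_loss(mlp(ab), (ab[:, 0] * ab[:, 1]).unsqueeze(1))
    opt.zero_grad(); loss.backward(); opt.step()
print(mlp(torch.tensor([[2., 3.]])))           # ≈ 6.0
```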

In a way it's a sort of useless exercise (who really needs an RPN interpreter that clocks in at 1.1 GB?), but see the last bit for some thoughts about how this might have some application. I did learn to think of transformers and attention a bit differently after working on this, so I hope it's interesting to some people out there.

submitted by /u/radarsat1