
A German research team lets Transformer models decide for themselves how many reasoning steps to spend on a problem. Combined with an additional memory component, the approach outperforms larger models on math problems.
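The core idea described here, a model that loops over its own intermediate state and halts once its answer stabilizes, can be sketched as a simple adaptive-computation loop. This is only an illustrative control-flow sketch under assumed names (`adaptive_refine`, `step`, `confidence` are hypothetical), not the team's actual architecture; a toy Newton iteration stands in for the Transformer block.

```python
def adaptive_refine(state, step_fn, halt_fn, max_steps=16, threshold=0.9):
    """Repeatedly refine `state` with step_fn until halt_fn reports
    enough confidence (or max_steps is reached). Returns the final
    state and the number of steps actually used."""
    for n in range(1, max_steps + 1):
        state = step_fn(state)
        if halt_fn(state) >= threshold:
            break
    return state, n

# Toy stand-in for the model's refinement step: a Newton update
# toward sqrt(2). "Confidence" rises as x*x approaches 2, mimicking
# a model that stops thinking once its answer stops changing.
def step(x):
    return 0.5 * (x + 2.0 / x)

def confidence(x):
    return 1.0 - abs(x * x - 2.0)

x, n = adaptive_refine(4.0, step, confidence)
# halts after 3 refinement steps with x ≈ 1.4219 (close to sqrt(2))
```

An easy problem crosses the confidence threshold early and uses few steps; a hard one consumes the full budget, which is the compute-allocation behavior the article attributes to the architecture.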
The article "Math needs thinking time, everyday knowledge needs memory, and a new Transformer architecture aims to deliver both" appeared first on The Decoder.