AI Navigate

[P] Yet another garage model - Prisma: Interpretability-Inspired Architecture

Reddit r/MachineLearning / 3/12/2026


Key Points

  • Prisma is proposed as a garage model with interpretability-inspired design choices to improve data efficiency.
  • It uses attention and output weight sharing to reduce parameters and adds an extra FFN weight set that acts as a nested gate, increasing capacity in a controlled way.
  • It introduces Word-Relative Rotary Position Embedding to handle positional information.
  • The author reports about 25% higher data efficiency than the standard transformer on benchmarks (arc-e, arc-c, piqa, boolq, hellaswag) and trained on 30B tokens on OpenWebText and fineweb-edu using a single H100.
  • The post invites feedback and provides a link to the HuggingFace repository.

Hey y'all! I think some of you might be interested in this creature.

Don't roast me that much, as I really wanted to collect your feedback and ideas about this crap prototype.

At least it is not GPT/Llama/Mistral/Qwen architecture based, I based it on some ideas that I had while studying other models. The basic differences are:

  • Attention and output weight sharing (reduces parameters);
  • Additional weight set in the FFN (increases parameters, yay!);
  • Introduces Word-Relative Rotary Position Embedding;

The added weight set is, I think, the most interesting part of the architecture, and I'd like many pinches of salt on that. It is used as a nested gate, turning the usual W2 @ (W1 @ x * silu(W3 @ x)) into W2 @ (W1 @ x * silu(W3 @ x * silu(W4 @ x)))... I'll leave it at that and wait for the stones to come.
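For clarity, here's a minimal NumPy sketch of that nested gate next to the usual gated FFN. The dimensions, initialization, and function names are illustrative assumptions on my part, not taken from the actual repo:

```python
import numpy as np

def silu(x):
    # SiLU activation: x * sigmoid(x)
    return x * (1.0 / (1.0 + np.exp(-x)))

# Illustrative sizes, not the model's real dimensions
d_model, d_ff = 8, 32
rng = np.random.default_rng(0)
W1 = rng.standard_normal((d_ff, d_model)) * 0.02
W2 = rng.standard_normal((d_model, d_ff)) * 0.02
W3 = rng.standard_normal((d_ff, d_model)) * 0.02
W4 = rng.standard_normal((d_ff, d_model)) * 0.02  # extra nested-gate weights

def ffn_gated(x):
    # baseline gated FFN: W2 @ (W1 @ x * silu(W3 @ x))
    return W2 @ (W1 @ x * silu(W3 @ x))

def ffn_nested_gate(x):
    # nested gate: the W4 branch gates the W3 gate itself
    return W2 @ (W1 @ x * silu(W3 @ x * silu(W4 @ x)))

x = rng.standard_normal(d_model)
print(ffn_nested_gate(x).shape)
```

Note the extra cost is only one more d_model -> d_ff projection per FFN, so the parameter increase is modest relative to the capacity it adds.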

Yes, it is a garage model, but it works. It is about 25% more data efficient than the "standard transformer architecture" in training, and gets pretty decent results on basic benchmarks (arc-e, arc-c, piqa, boolq, hellaswag...). Trained on a single H100 with 30B tokens (openwebtext and fineweb-edu).

Anyhow. If you're interested: hf:y3i12/Prisma.

Looking forward to your thoughts and comments 😁

submitted by /u/y3i12