AI Navigate

[P] Yet another garage model - Prisma: Interpretability-Inspired Architecture

Reddit r/MachineLearning / 3/12/2026


Key Points

  • Prisma is proposed as a garage model with interpretability-inspired design choices to improve data efficiency.
  • It uses attention and output weight sharing to reduce parameters and adds an extra FFN weight set that acts as a nested gate, increasing capacity in a controlled way.
  • It introduces Word-Relative Rotary Position Embedding to handle positional information.
  • The author reports about 25% higher data efficiency than the standard transformer on benchmarks (arc-e, arc-c, piqa, boolq, hellaswag) and trained on 30B tokens on OpenWebText and fineweb-edu using a single H100.
  • The post invites feedback and provides a link to the HuggingFace repository.

Hey y'all! I think some of you might be interested in this creature.

Don't roast me that much, as I really wanted to collect your feedback and ideas about this crap prototype.

At least it is not GPT/Llama/Mistral/Qwen architecture based, I based it on some ideas that I had while studying other models. The basic differences are:

  • Attention and output weight sharing (reduces parameters);
  • Additional weight set in the FFN (increases parameters, yay!);
  • Introduces Word-Relative Rotary Position Embedding;

The added weight set is, I think, the most interesting part of the architecture, and I'd like many pinches of salt on that. It is used as a nested gate, turning the usual W2 @ (W1 @ x * silu(W3 @ x)) into W2 @ (W1 @ x * silu(W3 @ x * silu(W4 @ x)))... I'll leave it at that and wait for the stones to come.
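For clarity, here's a minimal NumPy sketch of that nested gate next to the usual gated FFN. The dimensions, initialization, and function names are illustrative assumptions on my part, not taken from the actual repo:

```python
import numpy as np

def silu(x):
    # SiLU activation: x * sigmoid(x)
    return x * (1.0 / (1.0 + np.exp(-x)))

# Illustrative sizes, not the model's real dimensions
d_model, d_ff = 8, 32
rng = np.random.default_rng(0)
W1 = rng.standard_normal((d_ff, d_model)) * 0.02
W2 = rng.standard_normal((d_model, d_ff)) * 0.02
W3 = rng.standard_normal((d_ff, d_model)) * 0.02
W4 = rng.standard_normal((d_ff, d_model)) * 0.02  # extra nested-gate weights

def ffn_gated(x):
    # baseline gated FFN: W2 @ (W1 @ x * silu(W3 @ x))
    return W2 @ (W1 @ x * silu(W3 @ x))

def ffn_nested_gate(x):
    # nested gate: the W4 branch gates the W3 gate itself
    return W2 @ (W1 @ x * silu(W3 @ x * silu(W4 @ x)))

x = rng.standard_normal(d_model)
print(ffn_nested_gate(x).shape)
```

Note the extra cost is only one more d_model -> d_ff projection per FFN, so the parameter increase is modest relative to the capacity it adds.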

Yes, it is a garage model, but it works. It is about 25% more data efficient than the "standard transformer architecture" in training, and gets pretty decent results on basic benchmarks (arc-e, arc-c, piqa, boolq, hellaswag...). Trained on a single H100 with 30B tokens (openwebtext and fineweb-edu).

Anyhow. If you're interested: hf:y3i12/Prisma.

Looking forward to your thoughts and comments 😁

submitted by /u/y3i12