New "major breakthrough?" architecture SubQ

Reddit r/LocalLLaMA / 5/6/2026


Key Points

  • The post claims SubQ has achieved a major model-architecture breakthrough with a 12M-token context window, purportedly outperforming models like Opus and Gemini at far lower cost.
  • It further alleges extremely fast inference performance, reportedly processing tokens 52× faster than FlashAttention at only around 5% of the cost.
  • The original author expresses skepticism because there is no accompanying code, paper, API, or other artifacts to verify or reproduce the results.
  • The discussion centers on whether the claims could be true versus being exaggerated, with the author leaning strongly toward calling it BS and asking for community judgment.

while reading through papers and news today i came across this post/blog claiming a major architectural breakthrough: a 12M-token context window, better than Opus, Gemini and other models, a whopping less than 5% of the cost, and token processing 52x faster than FlashAttention. yep, you read that number right, fifty-two times. at this point i instantly called BS and was ready to move on tbh. there is zero code, paper, API or anything to either test it out or reproduce it.

so i was thinking maybe there is a slight chance i am a complete idiot and somehow this is the next "attention is all you need" thing. what do you guys think? i am calling BS tbh

submitted by /u/Daemontatox