RYS II - Repeated layers with Qwen3.5 27B and some hints at a 'Universal Language'
Reddit r/LocalLLaMA / 3/24/2026
💬 Opinion · Signals & Early Trends · Ideas & Deep Analysis · Models & Research

So, I've had my H100s grind for you all, and have some interesting new results AND fresh models! So, what did I find? Well, because my blog articles are too damn long (I know some of you are not reading the whole thing...), here is a TL;DR:

If you still didn't read the blog, well, I guess you can just try the models?
https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-S
https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-M
https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-L
https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-XL

Wen GGUF? When someone GGUFs them, I guess?

When you repeat layers, you benefit a lot from fine-tuning. I expect the first team to fine-tune RYS-Qwen3.5-27B-FP8-XL will have a new SOTA for that size range.

Lastly, I've been chatting with TurboDerp; hopefully we can get this into a new format where you can keep the duplicated layers as copies and not use more VRAM (except for the KV cache). Stay tuned!
Key Points
- The author reports findings from H100-based experiments suggesting that LLMs may form "universal language" latent representations in their mid-layer transformer blocks: Chinese and English representations of the same content become more similar to each other than representations of different content within a single language (a minimal probing sketch follows this list).
- They conclude that repeating blocks in the middle portion of the transformer stack performs best among the approaches they tried (see the layer-duplication sketch below).
- The post links several newly released model variants based on Qwen3.5 27B (FP8, sizes S through XL) on Hugging Face for others to test.
- The author expects that fine-tuning the largest repeated-layer variant (FP8-XL) could achieve new state-of-the-art results in its model-size range.
- They are also discussing a future format that keeps the duplicated layers as logical copies without materializing extra weights, so the only added VRAM cost is the KV cache (see the weight-sharing sketch below).
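To make the cross-lingual claim concrete, here is a minimal probing sketch, not the author's actual analysis: the model id (a small Qwen2.5 stand-in), the halfway-layer choice, and the mean-pooling are all illustrative assumptions. It compares mid-layer hidden states for an English sentence against its Chinese translation and against an unrelated English sentence:

```python
# Hypothetical probe for the "universal language" observation; assumptions:
# model choice, mid-layer index, and mean-pooling are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-0.5B"  # small stand-in; any causal LM works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

def mid_layer_embedding(text: str, layer_frac: float = 0.5) -> torch.Tensor:
    """Mean-pool hidden states from a block halfway up the stack."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # hidden_states[0] is the embedding output; pick a middle block.
    layer = int(len(out.hidden_states) * layer_frac)
    return out.hidden_states[layer].mean(dim=1).squeeze(0)

en = mid_layer_embedding("The cat sleeps on the warm windowsill.")
zh = mid_layer_embedding("猫睡在温暖的窗台上。")  # same meaning, in Chinese
en_other = mid_layer_embedding("Stock markets fell sharply on Monday.")

cos = torch.nn.functional.cosine_similarity
print("same content, EN vs ZH:   ", cos(en, zh, dim=0).item())
print("different content, EN vs EN:", cos(en, en_other, dim=0).item())
# If mid layers really are language-agnostic, the first similarity
# should exceed the second.
```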
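For the repeated-middle-layers result, a bare-bones version of the trick in plain `transformers` might look like the sketch below. This is only an illustration of the general technique: the small checkpoint is a stand-in, the "middle third, repeated once" slice is an arbitrary choice, and the actual RYS models were presumably built with proper merge tooling.

```python
# Hedged sketch of layer repetition (a "self-merge"); slice boundaries and
# checkpoint are illustrative assumptions, not the RYS recipe.
import copy
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B", torch_dtype=torch.bfloat16
)
blocks = model.model.layers          # nn.ModuleList of decoder blocks
n = len(blocks)
start, end = n // 3, 2 * n // 3      # repeat the middle third once

expanded = (
    list(blocks[:end])
    + [copy.deepcopy(b) for b in blocks[start:end]]  # materialized copies
    + list(blocks[end:])
)
model.model.layers = torch.nn.ModuleList(expanded)
model.config.num_hidden_layers = len(expanded)

# Each block writes its KV cache under its layer index, so re-number the
# attention modules after expansion to keep caching consistent.
for i, block in enumerate(model.model.layers):
    block.self_attn.layer_idx = i
```

Because the copies are real tensors here, the resulting model is immediately fine-tunable, which is the point of the post's SOTA prediction: the repeated blocks start as exact duplicates but can diverge during training.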
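And the VRAM point from the last bullet: if the repeated span references the same module objects instead of deep copies, the weights live in memory once however often the block is traversed, and only the KV cache grows with the extra depth. A sketch of the idea, with the caveat that motivates a new runtime format in the first place:

```python
# Sketch of weight sharing across repeated blocks; same illustrative
# checkpoint and slice boundaries as above.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B", torch_dtype=torch.bfloat16
)
blocks = model.model.layers
n = len(blocks)
start, end = n // 3, 2 * n // 3

# Re-use the SAME module objects for the repeated span: no deepcopy, so the
# parameters exist once in VRAM no matter how often the block appears.
model.model.layers = torch.nn.ModuleList(
    list(blocks[:end]) + list(blocks[start:end]) + list(blocks[end:])
)
model.config.num_hidden_layers = len(model.model.layers)

unique = sum(p.numel() for _, p in model.named_parameters())
total = sum(p.numel() for _, p in model.named_parameters(remove_duplicate=False))
print(f"unique params: {unique:,} vs. referenced: {total:,}")

# Caveat: HF's KV cache keys on each block's single layer_idx, so shared
# blocks clash when use_cache=True. A format that tracks repeats explicitly,
# giving each pass its own cache slot, is the kind of change the post says
# is being discussed with TurboDerp (the ExLlama author).
```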
Related Articles
- The Moonwell Oracle Exploit: How AI-Assisted 'Vibe Coding' Turned cbETH Into a $1.12 Token and Cost $1.78M (Dev.to)
- How CVE-2026-25253 exposed every OpenClaw user to RCE — and how to fix it in one command (Dev.to)
- Day 10: An AI Agent's Revenue Report — $29, 25 Products, 160 Tweets (Dev.to)
- Does Synthetic Data Generation of LLMs Help Clinical Text Mining? (Dev.to)
- What CVE-2026-25253 Taught Me About Building Safe AI Assistants (Dev.to)