RYS II - Repeated layers with Qwen3.5 27B and some hints at a 'Universal Language'

Reddit r/LocalLLaMA / 3/24/2026


Key Points

  • The author reports findings from running H100-based experiments suggesting that LLMs may form “universal language” latent representations in mid-layer transformer blocks, with Chinese and English representations for the same content becoming more similar than representations across different content within a language.
  • They conclude that repeating blocks in the middle portion of the transformer stack performs best compared with other approaches they tried.
  • The post shares multiple new released model variants based on Qwen3.5 27B (FP8 tiers) on Hugging Face for others to test.
  • The author expects that fine-tuning the largest repeated-layer variant (FP8-XL) could achieve new state-of-the-art results in its model-size range.
  • They are also discussing a future packaging/format in which duplicated layers are stored once and referenced as copies, so the only additional VRAM cost is the KV cache.

So, I've had my H100s grind for you all, and have some interesting new results AND fresh models!

So, what did I find? Well, because my blog articles are too damn long (I know some of you aren't reading the whole thing...), here's a TL;DR:

  1. I found that LLMs seem to think in a universal language. In the middle layers, the model's latent representations for the same content in Chinese and English are more similar than its representations for different content in the same language.
  2. I tried a bunch of different stuff, but in the end, repeating blocks in the middle of the transformer stack works the best.
  3. You should still read the blog: https://dnhkng.github.io/posts/rys-ii/
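The measurement behind point 1 can be sketched roughly like this. In a real run you would mean-pool per-layer hidden states from the model (e.g. via `output_hidden_states=True` in `transformers`); here synthetic vectors with a shared "meaning" component stand in, so the function names and data are illustrative, not the author's actual code:

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two plain-Python vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def universal_language_gap(h_en, h_zh, h_en_other):
    """Positive when same-content/cross-language similarity beats
    different-content/same-language similarity at a given layer."""
    same_content_cross_lang = cosine(h_en, h_zh)
    diff_content_same_lang = cosine(h_en, h_en_other)
    return same_content_cross_lang - diff_content_same_lang

# Synthetic demo: English and Chinese vectors for the *same* sentence share
# a dominant "meaning" direction; the third vector is unrelated content.
random.seed(0)
meaning = [random.gauss(0, 1) for _ in range(64)]
h_en = [m + 0.1 * random.gauss(0, 1) for m in meaning]
h_zh = [m + 0.1 * random.gauss(0, 1) for m in meaning]
h_en_other = [random.gauss(0, 1) for _ in range(64)]

gap = universal_language_gap(h_en, h_zh, h_en_other)  # > 0 for a "universal" layer
```

A positive gap at the middle layers (and not at the embedding/output layers) is the kind of signal the post describes.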

If you still didn't read the blog, well, I guess you can just try the models?

https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-S

https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-M

https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-L

https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-XL
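For the curious, the layer-repetition idea from point 2 above boils down to something like this: take the decoder layers, deep-copy a middle slice, and splice the repeats back in. This is a minimal sketch of the general technique, not the actual merge script behind these checkpoints:

```python
import copy

def repeat_middle(layers, start, end, times=2):
    """Return a new layer ordering where layers[start:end] appears `times`
    times in a row. Deep copies give each repeat independent weights,
    which is what makes later fine-tuning of the repeats worthwhile."""
    middle = layers[start:end]
    repeated = []
    for _ in range(times):
        repeated.extend(copy.deepcopy(layer) for layer in middle)
    return layers[:start] + repeated + layers[end:]

# Toy demo with layer indices standing in for transformer blocks:
stack = list(range(10))                      # a 10-layer "model"
grown = repeat_middle(stack, start=3, end=6) # repeat layers 3..5 once more
# grown is [0, 1, 2, 3, 4, 5, 3, 4, 5, 6, 7, 8, 9]
```

With a real model you would apply the same slicing to `model.model.layers` (an `nn.ModuleList` in Qwen-style architectures) and update the config's layer count; the differently sized FP8-S through FP8-XL variants presumably correspond to different repeated-slice choices.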

Wen GGUF? When someone GGUF's them I guess?

When you repeat layers, you benefit a lot from fine-tuning. I expect the first team to fine-tune RYS-Qwen3.5-27B-FP8-XL will have a new SOTA for that size range. Lastly, I've been chatting with TurboDerp; hopefully we can get this into a new format where the duplicated layers are kept as copies, so they don't use more VRAM (beyond the KV cache). Stay tuned!
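The VRAM trick hinted at here is just weight sharing: the execution order references the same layer object twice, so the parameters live in memory once while the block runs twice (each pass still needs its own KV cache entries). A minimal sketch in plain Python, standing in for a `torch.nn.ModuleList` (names are illustrative; the actual format is still being discussed):

```python
class Block:
    """Stand-in for one transformer decoder layer."""
    def __init__(self, name):
        self.name = name
        self.weights = [0.0] * 4  # placeholder for real parameters

mid = Block("mid")

# The stack references `mid` twice: one copy of the weights in memory,
# two forward passes at runtime. Contrast with copy.deepcopy(mid), which
# would double the parameter memory.
stack = [Block("early"), mid, mid, Block("late")]

shares_weights = stack[1] is stack[2]  # True: same object, same parameters
```

Note this only saves memory at inference; for fine-tuning the repeats separately (as suggested above for FP8-XL), the copies need to stay independent.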

submitted by /u/Reddactor