It solved an issue with a script that pulls real-time data from NVIDIA SMI; Gemini 3.1 actually failed to fix it even in a fresh session, lol. It's kind of mind-blowing that in 2026 we already have stable local models with 200k+ context! I tested it by feeding it as many Reddit posts, random documentation files, and raw files from the llama.cpp repo as possible to push the context usage up and see how it affects my VRAM. Even during this testing, Gemma kept its mind intact: at 245,283 / 262,144 (94%) context, if I ask it what a specific user said, it matches the quote perfectly and answers within 2–5 seconds. From previous tests, I found I had to decrease the temperature and bump the repeat penalty to 1.17/1.18 so it doesn't fall into a loop of self-questioning. Above 100k context it used to start looping through its own thoughts and arguing with itself; instead of providing a final answer, it would just go on forever. These settings helped a lot! I'm using the latest llama.cpp (which gets updates almost every hour) and the latest Unsloth GGUF from 2–6 hours ago, so make sure to redownload! Model: gemma-4-26B-A4B-it-UD-IQ4_NL.gguf, Unsloth (unsloth bis). What else can I test? Honestly, I ran out of ideas to crash it! It just gulps down whatever I throw at it.
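For reference, the long-context settings the poster describes map onto a llama.cpp server invocation roughly like the one below. This is a sketch under assumptions: the post gives the repeat penalty (1.17) and model filename but never states the exact lowered temperature, so the 0.6 here is a placeholder, not the poster's value.

```shell
# Hypothetical reconstruction of the reported setup (not the poster's exact command).
# --repeat-penalty 1.17 and the 262,144-token context come from the post;
# the temperature value is an assumed placeholder.
llama-server \
  -m gemma-4-26B-A4B-it-UD-IQ4_NL.gguf \
  --ctx-size 262144 \
  --temp 0.6 \
  --repeat-penalty 1.17
```

Lowering temperature and raising the repeat penalty both discourage the model from re-sampling its own recent phrasing, which is why the poster found it stopped the self-questioning loops past 100k context.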
Gemma 4 26B A4B is still fully capable at 245,283/262,144 (94%) context!
Reddit r/LocalLLaMA / 4/11/2026
💬 Opinion · Signals & Early Trends · Tools & Practical Usage · Models & Research
Key Points
- A Reddit user reports that Gemma 4 26B A4B (GGUF) kept matching and answering questions about specific users' statements accurately even with roughly 94% of its 262,144-token context filled (245,283/262,144).
- To suppress self-referential loops at very long context (endless self-questioning and internal debate), lowering the temperature and adjusting the repeat penalty to around 1.17/1.18 reportedly helped.
- As a concrete practical win, the model resolved a problem with a script that pulls real-time data from NVIDIA SMI, which Gemini 3.1 had failed to fix.
- The experiments used the latest llama.cpp build (which updates very frequently) and the latest Unsloth GGUF, so the poster advises re-downloading both the model and the build.
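The NVIDIA SMI script mentioned above is not shown in the post, so as a hedged illustration, here is a minimal sketch of the kind of real-time VRAM polling such a script would do. The `nvidia-smi --query-gpu` CSV interface is real; the function names are hypothetical.

```python
import subprocess

def parse_vram_mib(csv_output: str) -> list[int]:
    """Parse nvidia-smi CSV output (noheader,nounits): one integer MiB value per GPU line."""
    return [int(line) for line in csv_output.splitlines() if line.strip()]

def query_vram_mib() -> list[int]:
    """Poll nvidia-smi for current per-GPU VRAM usage in MiB."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_vram_mib(out)
```

Calling `query_vram_mib()` in a loop is a simple way to watch VRAM climb as the context fills, which matches the poster's stated goal of seeing how context usage affects memory.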
Related Articles
- Black Hat USA (AI Business)
- Black Hat Asia (AI Business)
- Why Cursor Keeps Generating Wildcard CORS -- And How to Fix It (Dev.to)
- Model Context Protocol (MCP): The USB-C Standard for AI Agents — Opportunities for Decentralized AI (Dev.to)
- What if browsers were designed for AI, not humans? (My first open source project — feedback welcome) (Dev.to)