I am trying to query and extract information from a large, semi-structured org-mode file (with hierarchical entries and cross-links) that is roughly 800,000 tokens long (depending on the tokenizer; the file itself is about 2.5 MB). It is basically a notes file spanning about 10 years of practical information of various kinds, and definitely far too long to remember everything that's in it. The file also cross-references entries in a maildir directory of roughly 100,000 mails.
I tried feeding that org-mode file directly into self-hosted LLMs by passing `--ctx-size 0` (i.e. the model's native context window, here 1,048,576 tokens), and that works with:
- Qwen3-Coder-30B-A3B-Instruct-1M-GGUF BF16
- nvidia_Llama-3.1-8B-UltraLong-4M-Instruct-GGUF BF16
- Meta/Llama-4-Scout-17B-16E-Instruct-GGUF/UD-Q4_K_XL
- NVIDIA-Nemotron-3-Nano-30B-A3B/UD-Q5_K_XL and UD-Q8_K_XL
- NVIDIA-Nemotron-3-Super-120B-A12B-GGUF UD-IQ4_XS / UD-Q5_K_S / UD-Q8_K_XL / BF16
I use llama.cpp.
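For reference, my invocations look roughly like this (the model path is a placeholder and per-model flags vary):

```shell
# llama-server from llama.cpp; --ctx-size 0 requests the model's native context window
llama-server \
  -m models/Qwen3-Coder-30B-A3B-Instruct-1M-BF16.gguf \
  --ctx-size 0 \
  -ngl 99
```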
Prefill takes between 90 s and 60 min (prompt processing between 4,700 t/s and 220 t/s), depending on model size, and token generation after ingesting the org-mode file runs between 90 and 24 t/s.
Hardware is a Zen5 32-core Threadripper Pro with 512GB of ECC RAM and dual RTX5090.
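The slow end of those prefill times is simply arithmetic: wall-clock time is roughly prompt tokens divided by the PP rate. A quick sanity check with the ~800k-token figure from above:

```python
# Prefill wall-clock time is approximately prompt_tokens / prefill_rate.
PROMPT_TOKENS = 800_000  # approximate token count of the org file

rate_tps = 220  # slowest observed prompt-processing rate
seconds = PROMPT_TOKENS / rate_tps
minutes = seconds / 60
print(f"{rate_tps} t/s -> {seconds:.0f} s ({minutes:.1f} min)")  # about 61 minutes
```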
Yet the results are mixed at best. If I ask for factual information that I know is in the file, the answer is frequently wrong or distorted, and more general questions produce nonsense, or at least something totally unusable. A frequent failure pattern is confusing and conflating similar events noted in the file.
This is a totally different experience from simply chatting with the same models without the enormous 1M-token context window; used that way, the models are actually very good.
Is `--temp` a relevant setting for this use case?
The idea of throwing the file directly at a 1M-token-context model originated as a way to avoid the complexity of a full RAG pipeline.
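For context, the alternative I was trying to avoid would start with something like splitting the file at org headings and retrieving only the matching entries before prompting. A minimal keyword-filter sketch (the regex and the crude scoring are my own assumptions, a stand-in for proper BM25 or embedding retrieval, not a finished pipeline):

```python
import re

def split_org_entries(text: str, level: int = 1) -> list[str]:
    """Split an org-mode file into entries at headings of the given level."""
    stars = r"\*" * level
    # Split at lines starting with exactly `level` stars followed by a space.
    parts = re.split(rf"(?m)^(?={stars} )", text)
    return [p for p in parts if p.strip()]

def keyword_filter(entries: list[str], query: str) -> list[str]:
    """Rank entries by how many query words they contain (crude BM25 stand-in)."""
    words = query.lower().split()
    scored = [(sum(w in e.lower() for w in words), e) for e in entries]
    return [e for score, e in sorted(scored, key=lambda t: -t[0]) if score > 0]

# Tiny demo on a fabricated two-entry file:
sample = "* Travel 2019\nBooked flights.\n* Server setup\nInstalled nginx on host.\n"
hits = keyword_filter(split_org_entries(sample), "nginx setup")
print(hits[0].splitlines()[0])  # -> * Server setup
```

Only the top-scoring entries would then be pasted into a normal-sized prompt, which is the part I was hoping a 1M-token context would let me skip.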
Why do these LLMs fail at very long contexts, and what would be a better tool to make this information (file and maildir) searchable and usable?