H2LooP Spark Preview: Continual Pretraining of Large Language Models for Low-Level Embedded Systems Code
arXiv cs.LG / 3/13/2026
📰 News · Signals & Early Trends · Models & Research
Key Points
- H2LooP Spark Preview presents a continual pretraining pipeline that adapts the OLMo-3-7B-a LLM to embedded systems programming, using BF16 LoRA fine-tuning on 8 NVIDIA H100 GPUs (a configuration sketch follows this list).
- The training data combines 100B tokens of repository-datasheet pairs drawn from 117 manufacturers with 23.5B curated tokens spanning 13 embedded domains, organized via a SpecMap-inspired mapping approach.
- In benchmarks, the 7B model achieves superior token accuracy across the 13 embedded domains, cutting in-domain perplexity by 70.4% and held-out repository perplexity by 66.1% (the reduction arithmetic is sketched below), and outperforms Claude Opus 4.6 and Qwen3-Coder-30B in 8 categories.
- The authors release the production training checkpoint on Hugging Face as an open-source artifact, enabling broader use by researchers and practitioners (a loading example closes this piece).
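
The announcement does not include training code, but the BF16 LoRA bullet maps onto a standard Hugging Face PEFT setup. The sketch below is an assumption-laden illustration, not the paper's released configuration: the base-model id, LoRA rank/alpha, target module names, hyperparameters, and toy corpus are all placeholders, and a real run on 8 H100s would launch through torchrun or accelerate.

```python
# Illustrative BF16 LoRA continual-pretraining setup (NOT the paper's code).
# Base-model id, LoRA rank, target modules, and hyperparameters are assumptions.
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "OLMo-3-7B-a"  # placeholder; substitute the real Hugging Face id

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16)

# LoRA adapters on the attention projections; r/alpha values are illustrative.
model = get_peft_model(
    model,
    LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    ),
)

# Stand-in for the repository-datasheet corpus: one toy firmware snippet.
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

corpus = Dataset.from_dict(
    {"text": ["/* UART init, STM32 HAL */ void uart_init(void) { /* ... */ }"]}
).map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="h2loop-spark-cpt",
        bf16=True,                      # BF16 precision, as in the key points
        per_device_train_batch_size=4,  # per GPU; 8x H100 via torchrun/accelerate
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        num_train_epochs=1,
        logging_steps=50,
    ),
    train_dataset=corpus,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```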
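For readers skimming the numbers: a perplexity "reduction" here is the standard relative change against the base model. A minimal worked example, with made-up before/after values:

```python
# Relative perplexity reduction; the 8.5 -> 2.5 pair is invented purely to
# show how a figure like the reported 70.4% would be derived.
def ppl_reduction(base_ppl: float, adapted_ppl: float) -> float:
    """Percentage drop in perplexity relative to the base model."""
    return 100.0 * (base_ppl - adapted_ppl) / base_ppl

print(f"{ppl_reduction(8.5, 2.5):.1f}%")  # 70.6%, close to the reported in-domain figure
```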
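Since the checkpoint is on Hugging Face, trying it is a short transformers call. The repo id below is hypothetical (the announcement does not quote the exact name), as is the firmware prompt:

```python
# Hypothetical checkpoint-loading sketch; REPO_ID is a placeholder, not the
# actual released artifact name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO_ID = "h2loop/spark-preview-7b"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
model = AutoModelForCausalLM.from_pretrained(
    REPO_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "// Configure SPI1 on an STM32F4 for 1 MHz, mode 0\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```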