Hey all. Thought I'd share my journey. I've been fascinated with AI and LLMs, and started building apps for consumer devices (phones), and realized that fast, usable models for consumer hardware have felt more like an afterthought than a primary goal. So I spent a lot of time (with the help of my own AIs) learning, researching, and designing an architecture for an SLM. After several weeks of trying different design iterations, I came up with an architecture that can run at 80+ tok/sec on CPU only.

The model is called JTech-Nano, a 1.1B-parameter SLM. No GPU needed for inference. The goal is a genuinely useful AI that runs on your phone/laptop/whatever with zero internet, zero API keys, and zero cloud bills, and performs efficiently.

I'm now in the process of training it on my own hardware at home, targeting 100B tokens before switching to fine-tuning. No cluster. No funding. No team of 50 ML engineers. Just a lot of sleepless nights watching loss curves and making sure the training regimen is running.

Here's what 50B tokens of training looks like. The spike in purple is when I adjusted the learning rate schedule at 3am. The model recovered and is back on track... and the training continues.

I used r/LocalLLaMA a ton when I first entered the run-at-home AI segment. I plan on releasing this model as soon as it's smart enough to be useful. Hopefully in the not-too-distant future.
Training a 1.1B SLM at home
Reddit r/LocalLLaMA / 4/7/2026
💬 Opinion · Signals & Early Trends · Tools & Practical Usage · Models & Research
Key Points
- The author describes building and training an at-home 1.1B “SLM” model called JTech-Nano, aimed at running efficiently on CPU-only hardware with no GPU for inference.
- Their goal is a genuinely usable local AI that works offline (no internet, no API keys, and no cloud costs) on consumer devices like phones and laptops.
- They report achieving 80+ tokens per second on CPU using a custom model architecture after multiple iterations over several weeks.
- Training is currently underway on personal hardware, targeting 100B tokens before moving to fine-tuning, without a large team or compute cluster.
- They plan to release the model once it’s smart enough to be useful, sharing training progress such as loss curve changes from learning-rate adjustments.
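The post mentions adjusting the learning-rate schedule mid-run, which caused a visible loss spike before recovery. A common schedule for pretraining at this scale is linear warmup followed by cosine decay; here is a minimal sketch of that pattern. All function names and hyperparameter values are illustrative assumptions, not details from the post.

```python
import math

def lr_at(step, max_steps, base_lr=3e-4, min_lr=3e-5, warmup_steps=2000):
    """Illustrative learning-rate schedule: linear warmup, then cosine decay.

    Hyperparameters here are placeholders, not JTech-Nano's actual settings.
    """
    if step < warmup_steps:
        # Ramp linearly from ~0 up to base_lr over the warmup phase.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup run completed, clamped to [0, 1].
    progress = min((step - warmup_steps) / max(1, max_steps - warmup_steps), 1.0)
    # Cosine factor decays smoothly from 1 at warmup end to 0 at max_steps.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine
```

Changing `base_lr` or the decay horizon partway through a run shifts the optimizer's effective step size abruptly, which is one plausible cause of the kind of loss spike (and subsequent recovery) described above.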
