Hi everyone. I’m probably posting slightly outside the usual scope here, but I’m hoping some of you might have advice. I’m Gen-X with no formal programming background, but I’ve been building a small AI companion project for my husband. He’s mostly quadriplegic (paralyzed legs and limited use of his hands) and spends most of the day alone at home while I’m at work. We live in a very rural area with no close neighbors or nearby friends, and the isolation has been hard on him. So I decided to try building him a companion robot.

For the past year I’ve been scavenging parts and learning as I go. The goal is a fully local, offline mobile robot built on a small power-wheelchair base (two 24V batteries) that can talk with him and keep him company.

Current prototype setup:

- LLM (conversation): Mistral-7B-Instruct via llama.cpp on a Lenovo ThinkPad
- Speech recognition: faster-whisper (INT8) on a Jetson Nano
- Text-to-speech: Piper

Right now the output is just going to an HDMI port connected to a TV while I test everything. The main limitation is the ThinkPad’s 8 GB RAM, so I’m restricted to smaller quantized models.

My main question: what are the best ways to maximize usable RAM and performance for llama.cpp on an 8 GB system? For example: quantization choices, swap/zram settings, and model-size vs. context tradeoffs. OS is Linux Mint 22.3 Cinnamon (64-bit).

I know this is a bit of an unusual use case, but if anyone has suggestions for squeezing more performance out of limited hardware, I’d really appreciate it.
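As a concrete starting point for the "squeeze more out of 8 GB" question, a low-RAM setup might combine zram (compressed swap in RAM) with a capped llama.cpp context window. This is a hedged sketch, not from the post: the `zram-tools` package name on Mint/Ubuntu, the `PERCENT=50` setting, and the model filename are all assumptions to adapt.

```shell
# Compressed swap in RAM (zram) often beats disk swap on an 8 GB machine.
# zram-tools packaging and the /etc/default/zramswap config are assumptions;
# check what your Mint release ships.
sudo apt install zram-tools
echo "PERCENT=50" | sudo tee -a /etc/default/zramswap   # allow up to ~4 GB of zram
sudo systemctl restart zramswap

# Run a small quantized model with a modest context window.
# -c caps KV-cache growth; -t should match the physical core count.
# The model filename is hypothetical; keeping llama.cpp's default mmap
# behavior lets the OS page weights in and out rather than pinning them.
./llama-cli -m mistral-7b-instruct.Q4_K_M.gguf -c 2048 -t 4
```

The main tradeoff: zram spends CPU cycles on compression to avoid hitting disk, which is usually a win on a laptop with a slow drive but competes with inference threads for cores.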
offline companion robot for my disabled husband (8GB RAM constraints) – looking for optimization advice
Reddit r/LocalLLaMA / 4/10/2026
💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage · Models & Research
Key Points
- A Reddit user is building an entirely offline companion robot for their quadriplegic husband, using a small wheelchair base and local speech/LLM components to reduce his isolation in a rural area.
- The prototype runs Mistral-7B-Instruct via llama.cpp on a Lenovo ThinkPad with only 8GB RAM, with speech recognition on a Jetson Nano (faster-whisper INT8) and text-to-speech using Piper.
- The user’s core challenge is maximizing llama.cpp performance and usable context/model quality under strict 8GB RAM limits on Linux Mint 22.3.
- They are explicitly seeking practical optimization guidance such as quantization choices, swap/zram strategies, model-size tradeoffs, and other low-resource “tricks” for maintaining conversational usefulness.
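The 8 GB ceiling can be sanity-checked with a quick back-of-envelope budget: quantized weights plus the KV cache must fit alongside the OS. The numbers below are assumptions, not from the post — Mistral-7B’s published architecture (32 layers, 8 KV heads of dimension 128, ~7.24B parameters) and roughly 4.85 effective bits per weight for a Q4_K_M-style quantization.

```python
# Back-of-envelope RAM budget for a quantized 7B model on an 8 GB machine.
# Architecture numbers (32 layers, 8 KV heads, head dim 128) and the
# ~4.85 bits/weight figure are assumptions for Mistral-7B at Q4_K_M.

def model_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate quantized model size (llama.cpp mmaps this into memory)."""
    return n_params * bits_per_weight / 8

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx: int, bytes_per_elem: int = 2) -> int:
    """fp16 KV cache: 2 tensors (K and V) per layer, per cached token."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem

GIB = 1024 ** 3
weights = model_bytes(7.24e9, 4.85)        # quantized weights, ~4.1 GiB
kv = kv_cache_bytes(32, 8, 128, 2048)      # 2048-token context, 0.25 GiB

print(f"weights  ~ {weights / GIB:.2f} GiB")
print(f"KV cache ~ {kv / GIB:.2f} GiB")
print(f"total    ~ {(weights + kv) / GIB:.2f} GiB")
```

Under these assumptions a 7B Q4 model plus a 2K context lands around 4.3 GiB, leaving headroom for the OS and the speech pipeline; doubling the context to 4096 only adds ~0.25 GiB thanks to Mistral’s grouped-query attention (8 KV heads instead of 32).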


