offline companion robot for my disabled husband (8GB RAM constraints) – looking for optimization advice

Reddit r/LocalLLaMA / 4/10/2026


Key Points

  • A Reddit user is building an entirely offline companion robot for their quadriplegic husband, using a small wheelchair base and local speech/LLM components to reduce his isolation in a rural area.
  • The prototype runs Mistral-7B-Instruct via llama.cpp on a Lenovo ThinkPad with only 8GB RAM, with speech recognition on a Jetson Nano (faster-whisper INT8) and text-to-speech using Piper.
  • The user’s core challenge is maximizing llama.cpp performance and usable context/model quality under strict 8GB RAM limits on Linux Mint 22.3.
  • They are explicitly seeking practical optimization guidance such as quantization choices, swap/zram strategies, model-size tradeoffs, and other low-resource “tricks” for maintaining conversational usefulness.

Hi everyone. I’m probably posting slightly outside the usual scope here, but I’m hoping some of you might have advice.

I’m Gen-X with no formal programming background, but I’ve been building a small AI companion project for my husband. He’s mostly quadriplegic (paralyzed legs and limited use of his hands) and spends most of the day alone at home while I’m at work. We live in a very rural area with no close neighbors or nearby friends, and the isolation has been hard on him.

So I decided to try building him a companion robot.

For the past year I’ve been scavenging parts and learning as I go. The goal is a fully local, offline mobile robot built on a small power-wheelchair base (two 24V batteries) that can talk with him and keep him company.

Current prototype setup:

LLM (conversation):

• Mistral-7B-Instruct via llama.cpp
• Running on a free Lenovo ThinkPad
• Intel i5 @ 1.6 GHz
• 8 GB RAM

Speech Recognition:

• Jetson Nano running faster-whisper (base, INT8) 

Text-to-Speech:

• Piper TTS – en_US-ryan-medium

Right now the output just goes over HDMI to a TV while I test everything.

The main limitation is the ThinkPad’s 8 GB RAM, so I’m restricted to smaller quantized models.
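For context, here's my rough back-of-envelope RAM math. The quantized weight sizes are approximate GGUF file sizes I've seen for 7B models, and the KV-cache formula assumes Mistral-7B's architecture (32 layers, 8 KV heads via grouped-query attention, head dim 128, fp16 cache) — corrections welcome if I've got any of that wrong:

```python
# Rough RAM budget for Mistral-7B-Instruct under llama.cpp on an 8 GB machine.
# Weight sizes are approximate GGUF file sizes; the KV-cache formula assumes
# Mistral-7B's architecture (32 layers, 8 KV heads from grouped-query
# attention, head dim 128, fp16 cache).

def kv_cache_bytes(n_ctx: int, n_layers: int = 32, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    """One K and one V tensor per layer, for every context position."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

# Approximate quantized weight sizes for a 7B model (GiB).
weights_gib = {"Q8_0": 7.2, "Q5_K_M": 4.8, "Q4_K_M": 4.1, "Q3_K_M": 3.3}

for quant, w in weights_gib.items():
    kv = kv_cache_bytes(2048) / 2**30  # 2048-token context -> 0.25 GiB
    print(f"{quant}: ~{w + kv:.1f} GiB (weights {w} + KV cache {kv:.2f})")
```

So even Q4_K_M plus a modest 2048-token context eats more than half my RAM before the OS, Cinnamon, and everything else get a byte.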

My main question:

What are the best ways to maximize usable RAM and performance for llama.cpp on an 8 GB system?

For example:

• Better quantization choices
• Swap/zram strategies on Linux
• Smaller models that still feel conversational
• Any other tricks people use on low-resource systems

OS is Linux Mint 22.3 Cinnamon (64-bit).
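In case it helps frame answers, this is a sketch of roughly what I'm experimenting with through llama-cpp-python. The model path is a placeholder, and `pick_n_ctx` is just my own heuristic (not a library function) for sizing the context window from whatever RAM is left over:

```python
# Sketch of low-RAM llama.cpp settings via llama-cpp-python.
# The model path is a placeholder; pick_n_ctx is my own heuristic, not an API.
import os

def pick_n_ctx(free_ram_gib: float, model_gib: float,
               kib_per_token: int = 128, reserve_gib: float = 1.0) -> int:
    """Largest power-of-two context that leaves reserve_gib of headroom,
    assuming ~128 KiB of fp16 KV cache per token for Mistral-7B.
    Capped at 8192, Mistral's trained context length."""
    budget_kib = (free_ram_gib - model_gib - reserve_gib) * 2**20
    n_ctx = 512
    while n_ctx * 2 * kib_per_token <= budget_kib:
        n_ctx *= 2
    return min(n_ctx, 8192)

MODEL = "mistral-7b-instruct.Q4_K_M.gguf"  # placeholder path

if os.path.exists(MODEL):
    from llama_cpp import Llama
    llm = Llama(
        model_path=MODEL,
        n_ctx=pick_n_ctx(free_ram_gib=7.5, model_gib=4.1),
        n_threads=4,      # match the i5's physical cores
        use_mlock=False,  # don't pin pages; let the kernel swap cold ones
        n_batch=64,       # smaller prompt batches keep peak RAM down
    )
```

Happy to hear if any of those knobs are the wrong ones to be turning.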

I know this is a bit of an unusual use case, but if anyone has suggestions for squeezing more performance out of limited hardware, I’d really appreciate it.

submitted by /u/BuddyBotBuilder