offline companion robot for my disabled husband (8GB RAM constraints) – looking for optimization advice

Reddit r/LocalLLaMA / 4/10/2026


Key Points

  • A Reddit user is building an entirely offline companion robot for their quadriplegic husband, using a small wheelchair base and local speech/LLM components to reduce his isolation in a rural area.
  • The prototype runs Mistral-7B-Instruct via llama.cpp on a Lenovo ThinkPad with only 8GB RAM, with speech recognition on a Jetson Nano (faster-whisper INT8) and text-to-speech using Piper.
  • The user’s core challenge is maximizing llama.cpp performance and usable context/model quality under strict 8GB RAM limits on Linux Mint 22.3.
  • They are explicitly seeking practical optimization guidance such as quantization choices, swap/zram strategies, model-size tradeoffs, and other low-resource “tricks” for maintaining conversational usefulness.

Hi everyone. I’m probably posting slightly outside the usual scope here, but I’m hoping some of you might have advice.

I’m Gen-X with no formal programming background, but I’ve been building a small AI companion project for my husband. He’s mostly quadriplegic (paralyzed legs and limited use of his hands) and spends most of the day alone at home while I’m at work. We live in a very rural area with no close neighbors or nearby friends, and the isolation has been hard on him.

So I decided to try building him a companion robot.

For the past year I’ve been scavenging parts and learning as I go. The goal is a fully local, offline mobile robot built on a small power-wheelchair base (two 24V batteries) that can talk with him and keep him company.

Current prototype setup:

LLM (conversation):

• Mistral-7B-Instruct via llama.cpp
• Running on a free Lenovo ThinkPad
• Intel i5 @ 1.6 GHz
• 8 GB RAM

Speech Recognition:

• Jetson Nano running faster-whisper (base, INT8) 

Text-to-Speech:

• Piper TTS – en_US-ryan-medium

Right now the output just goes over HDMI to a TV while I test everything.

The main limitation is the ThinkPad’s 8 GB RAM, so I’m restricted to smaller quantized models.
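For context, here's my rough back-of-envelope RAM math. The quantized weight sizes are approximate GGUF file sizes I've seen for 7B models, and the KV-cache formula assumes Mistral-7B's architecture (32 layers, 8 KV heads via grouped-query attention, head dim 128, fp16 cache) — corrections welcome if I've got any of that wrong:

```python
# Rough RAM budget for Mistral-7B-Instruct under llama.cpp on an 8 GB machine.
# Weight sizes are approximate GGUF file sizes; the KV-cache formula assumes
# Mistral-7B's architecture (32 layers, 8 KV heads from grouped-query
# attention, head dim 128, fp16 cache).

def kv_cache_bytes(n_ctx: int, n_layers: int = 32, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    """One K and one V tensor per layer, for every context position."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

# Approximate quantized weight sizes for a 7B model (GiB).
weights_gib = {"Q8_0": 7.2, "Q5_K_M": 4.8, "Q4_K_M": 4.1, "Q3_K_M": 3.3}

for quant, w in weights_gib.items():
    kv = kv_cache_bytes(2048) / 2**30  # 2048-token context -> 0.25 GiB
    print(f"{quant}: ~{w + kv:.1f} GiB (weights {w} + KV cache {kv:.2f})")
```

So even Q4_K_M plus a modest 2048-token context eats more than half my RAM before the OS, Cinnamon, and everything else get a byte.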

My main question:

What are the best ways to maximize usable RAM and performance for llama.cpp on an 8 GB system?

For example:

• Better quantization choices
• Swap/zram strategies on Linux
• Smaller models that still feel conversational
• Any other tricks people use on low-resource systems

OS is Linux Mint 22.3 Cinnamon (64-bit).
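In case it helps frame answers, this is a sketch of roughly what I'm experimenting with through llama-cpp-python. The model path is a placeholder, and `pick_n_ctx` is just my own heuristic (not a library function) for sizing the context window from whatever RAM is left over:

```python
# Sketch of low-RAM llama.cpp settings via llama-cpp-python.
# The model path is a placeholder; pick_n_ctx is my own heuristic, not an API.
import os

def pick_n_ctx(free_ram_gib: float, model_gib: float,
               kib_per_token: int = 128, reserve_gib: float = 1.0) -> int:
    """Largest power-of-two context that leaves reserve_gib of headroom,
    assuming ~128 KiB of fp16 KV cache per token for Mistral-7B.
    Capped at 8192, Mistral's trained context length."""
    budget_kib = (free_ram_gib - model_gib - reserve_gib) * 2**20
    n_ctx = 512
    while n_ctx * 2 * kib_per_token <= budget_kib:
        n_ctx *= 2
    return min(n_ctx, 8192)

MODEL = "mistral-7b-instruct.Q4_K_M.gguf"  # placeholder path

if os.path.exists(MODEL):
    from llama_cpp import Llama
    llm = Llama(
        model_path=MODEL,
        n_ctx=pick_n_ctx(free_ram_gib=7.5, model_gib=4.1),
        n_threads=4,      # match the i5's physical cores
        use_mlock=False,  # don't pin pages; let the kernel swap cold ones
        n_batch=64,       # smaller prompt batches keep peak RAM down
    )
```

Happy to hear if any of those knobs are the wrong ones to be turning.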

I know this is a bit of an unusual use case, but if anyone has suggestions for squeezing more performance out of limited hardware, I’d really appreciate it.

submitted by /u/BuddyBotBuilder