Sentient OS: a custom on-device vision LLM that understands your entire digital life (every screenshot, note, file, email...), while your device charges overnight. Talk to your data, get proactive reminders, and explore knowledge graphs!

Reddit r/artificial / 5/2/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage · Models & Research

Key Points

  • The post describes “Sentient OS,” an on-device vision LLM aimed at understanding a user’s entire digital life (screenshots, notes, files, emails) without sending data to the cloud.
  • The author claims to have spent nearly a year optimizing the full on-device AI stack for fast, privacy-preserving multimodal inference by modifying Apple’s MLX, adapting Qwen models, and implementing custom quantization and KV-cache reuse/flash attention.
  • Core capabilities highlighted include on-device RAG-style “talk to your data,” proactive reminders extracted from the user’s own content, and knowledge graphs that help users find buried items.
  • It also mentions MCP integration so existing LLMs (e.g., ChatGPT/Claude) can connect to the user’s data via Sentient OS.
  • An early alpha is said to run all processing on a 6-year-old iPhone across about 3,000 screenshots, with upcoming support for Mac/iPhone and later Windows/Android, plus lifetime free access for the first 150 users.

99% of "AI" apps are just GPT wrappers that pipe your data to cloud LLMs and call it a product.

No one's ever created an intelligence layer that understands your entire digital life (all your screenshots, notes, files...) before, because that’d mean sending all your data to the cloud:

  • a privacy nightmare
  • stupidly expensive to analyze 1000s of files

But on-device models are generally too dumb and run too slowly.

I spent close to a year optimizing every single layer of the on-device AI stack from scratch:

  • modified Apple's MLX framework for batch multimodal inference (it wasn't built for this)
  • transplanted vision capabilities from a 4× larger model [Qwen 3.5 9B] into a smaller one [Qwen 3.5 2B]
  • built custom k-quants specifically for MLX
  • wrote device-aware quantization tuned to each chip's available RAM
  • implemented proprietary KV-cache reuse + flash attention for inference speed
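None of this code is public, so as a rough illustration only: "device-aware quantization tuned to each chip's available RAM" could mean something like picking the highest bits-per-weight config whose weights (plus headroom for KV cache and activations) fit in free memory. The thresholds, `QuantConfig` fields, and 1.5× fudge factor below are invented for illustration, not Sentient OS internals.

```python
import psutil  # pip install psutil; stand-in for a per-device RAM query
from dataclasses import dataclass

@dataclass
class QuantConfig:
    bits: int        # bits per weight
    group_size: int  # weights per quantization group (MLX-style)

def pick_quant_config(free_ram_gb: float, n_params_b: float) -> QuantConfig:
    """Pick the highest-precision quant whose weights fit in free RAM.

    Weight memory ~= n_params * bits / 8, plus a crude 1.5x headroom
    factor for the KV cache and activations. Illustrative values only.
    """
    for bits, group_size in [(8, 64), (6, 64), (4, 64), (3, 32), (2, 32)]:
        weight_gb = n_params_b * bits / 8  # billions of params -> GB
        if weight_gb * 1.5 < free_ram_gb:
            return QuantConfig(bits, group_size)
    raise MemoryError("model does not fit at any supported precision")

free_gb = psutil.virtual_memory().available / 1e9
print(pick_quant_config(free_gb, n_params_b=2.0))  # e.g. a 2B model
```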

Sentient OS analyzes and understands your entire digital life overnight while your device charges.
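The "overnight while your device charges" behavior presumably means gating the heavy indexing work on power state. On iOS that's normally done through the system scheduler (BGProcessingTask); as a desktop-flavored sketch of the idea, here's a polling loop that only drains the queue while plugged in. The `index_batch` function is a hypothetical stand-in.

```python
import time
import psutil  # sensors_battery() reports plugged-in state on laptops

def index_batch(paths):
    """Hypothetical stand-in for one batch of screenshot/file indexing."""
    print(f"indexed {len(paths)} items")

def run_while_charging(work_queue, batch_size=32, poll_secs=300):
    """Drain the indexing queue, but only while on external power.

    Desktop-flavored sketch; on iOS the OS scheduler (BGProcessingTask)
    would grant charging-time windows instead of this polling loop.
    """
    while work_queue:
        battery = psutil.sensors_battery()
        plugged = battery is None or battery.power_plugged  # None => desktop
        if plugged:
            batch, work_queue = work_queue[:batch_size], work_queue[batch_size:]
            index_batch(batch)
        else:
            time.sleep(poll_secs)  # back off until the charger returns

run_while_charging([f"shot_{i}.png" for i in range(100)])
```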

This unlocks:

1️⃣ Talk to your entire digital life: "what was that wine I liked?" "who did I wanna meet next week?"
[on-device RAG; toy sketches of all three features follow this list]

2️⃣ Proactive reminders surfaced from your own data: "Tickets for that concert you screenshotted open tomorrow!"
"That tax return in your downloads folder is due next week :("

3️⃣ Knowledge graphs of your entire digital life: tap any node to find what you buried!
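The implementation isn't shared, so here's a toy version of the retrieval step behind 1️⃣: a bag-of-words "embedding" stands in for a real on-device embedding model, and the top-scoring snippets would be handed to the local LLM as context. Everything here (corpus, scores, function names) is illustrative.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a small
    on-device embedding model instead."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# OCR'd / extracted text from the user's screenshots, notes, files...
corpus = {
    "IMG_2041.png": "receipt: 2019 Chateau Margaux, loved this wine",
    "note_17.txt": "ask Priya about the offsite next week",
    "IMG_2042.png": "concert tickets on sale Friday 10am",
}
index = {doc_id: embed(text) for doc_id, text in corpus.items()}

def retrieve(query: str, k: int = 2):
    q = embed(query)
    ranked = sorted(index, key=lambda d: cosine(q, index[d]), reverse=True)
    return ranked[:k]  # these snippets become the local LLM's context

print(retrieve("what was that wine I liked?"))  # -> ['IMG_2041.png', ...]
```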
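Similarly, the proactive reminders in 2️⃣ presumably come from extracting dates and deadlines from indexed text and scheduling alerts. A crude regex-based sketch; a real pipeline would more likely have the on-device LLM extract structured events.

```python
import re
from datetime import date, timedelta

# Crude patterns for phrases like "due next week"; illustrative only --
# a real pipeline would have the LLM pull out structured events.
RELATIVE = {"tomorrow": 1, "next week": 7}

def extract_reminders(doc_id: str, text: str, today: date):
    reminders = []
    for phrase, days in RELATIVE.items():
        if re.search(rf"\b{phrase}\b", text, re.IGNORECASE):
            reminders.append((today + timedelta(days=days), doc_id, text))
    return reminders

docs = {
    "downloads/tax_return.pdf": "Tax return due next week",
    "IMG_2042.png": "Concert tickets open tomorrow",
}
today = date(2026, 5, 2)
for doc_id, text in docs.items():
    for when, src, snippet in extract_reminders(doc_id, text, today):
        print(f"{when}: reminder from {src}: {snippet!r}")
```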
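And a minimal version of the knowledge graphs in 3️⃣ is just entities as nodes and co-occurrence (or LLM-extracted relations) as edges, with each node linking back to its source documents. networkx is used here purely for illustration; the extracted entities are made up.

```python
import itertools
import networkx as nx  # pip install networkx

# Entities per document; a real system would have the vision LLM
# extract these from screenshots, notes, emails, etc.
doc_entities = {
    "IMG_2041.png": ["Chateau Margaux", "wine", "receipt"],
    "note_17.txt": ["Priya", "offsite"],
    "email_93": ["Priya", "wine"],
}

G = nx.Graph()
for doc_id, entities in doc_entities.items():
    for a, b in itertools.combinations(entities, 2):
        G.add_edge(a, b)  # co-occurrence edge
        G.nodes[a].setdefault("docs", set()).add(doc_id)
        G.nodes[b].setdefault("docs", set()).add(doc_id)

# "Tap any node to find what you buried": neighbors + source docs.
print(sorted(G.neighbors("Priya")))      # -> ['offsite', 'wine']
print(sorted(G.nodes["Priya"]["docs"]))  # -> ['email_93', 'note_17.txt']
```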

And with MCP, your existing LLM (ChatGPT, Claude, etc.) can talk to your digital life too, so it actually understands you!
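The MCP hook would presumably be a local server exposing search over the on-device index as a tool. A minimal sketch using FastMCP from the official Python SDK; the server name, tool, and stubbed body are hypothetical, not Sentient OS's actual interface.

```python
from mcp.server.fastmcp import FastMCP  # pip install mcp

mcp = FastMCP("sentient-os")  # server name is illustrative

@mcp.tool()
def search_my_data(query: str, k: int = 5) -> list[str]:
    """Search the user's locally indexed screenshots/notes/files.
    Body is a stub; a real server would query the on-device index."""
    return [f"stub result {i} for {query!r}" for i in range(k)]

if __name__ == "__main__":
    # stdio transport lets a desktop LLM client (e.g. Claude) connect
    mcp.run(transport="stdio")
```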

Early alpha processes ~3,000 screenshots entirely on-device on a 6-year-old iPhone. Coming to Mac & iPhone soon (and Windows & Android in the near future!)

The first 150 users get lifetime free access 🔑
Your device does all the compute, so this costs me nothing to offer :D

https://sentient-os.ai

Would really love to hear from y’all: what more would you want an on-device multimodal LLM that understands your entire life to do?

submitted by /u/TechExpert2910