Auto-creation of agent SKILLs from observing your screen via Gemma 4 for any agent to execute and self-improve

Reddit r/LocalLLaMA / 4/7/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

Key Points

  • AgentHandover is an open-source macOS menu bar app that uses local Gemma 4 (via Ollama) to watch the user’s screen and convert repeated workflows into structured “Skill” files for agents to execute.
  • It supports both manual recording for specific tasks (Focus Record) and automatic background discovery of recurring actions (Passive Discovery), with Skills improving after each observation.
  • The system is described as a fully on-device 11-stage pipeline: screen data never leaves the machine and is encrypted at rest.
  • Skills can be integrated with one click via MCP, so any MCP-compatible agent tool (e.g., Claude Code, Cursor, OpenClaw) can use the learned Skills; a CLI is also available.
  • The project is intended to reduce the need to re-explain common processes to agents by “learning” them from the user’s behavior and refining steps, guardrails, and confidence scores over time.

AgentHandover is an open-source Mac menu bar app that watches your screen through Gemma 4 (running locally via Ollama) and turns your repeated workflows into structured Skill files that any agent can follow.

I built it because every time I wanted an agent to handle something for me I had to explain the whole process from scratch, even for stuff I do daily. So AgentHandover just watches instead. You can either hit record for a specific task (Focus Record) or let it run in the background where it starts picking up patterns after seeing you repeat something a few times (Passive Discovery).
Skills get sharper with every observation, updating steps, guardrails, and confidence scores as it learns more. The whole thing is an 11-stage pipeline running fully on-device: nothing leaves your machine, and everything is encrypted at rest. One-click agent integration through MCP means Claude Code, Cursor, OpenClaw, or anything else that speaks MCP can just pick up your Skills. There's also a CLI if you prefer the terminal.
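The post doesn't show the actual Skill file schema, but going off the fields it mentions (steps, guardrails, confidence scores that update with each observation), a learned Skill might look roughly like this. Every field name and value here is a hypothetical illustration, not taken from the repo:

```json
{
  "name": "export-weekly-report",
  "observations": 4,
  "confidence": 0.82,
  "steps": [
    { "action": "open_app",  "target": "Numbers" },
    { "action": "open_file", "target": "~/Reports/weekly.numbers" },
    { "action": "menu",      "path": ["File", "Export To", "PDF"] },
    { "action": "save_as",   "target": "~/Reports/weekly.pdf" }
  ],
  "guardrails": [
    "never overwrite an existing export without confirmation",
    "abort if the source document has unsaved changes"
  ]
}
```

The idea being that an agent consuming this via MCP gets an executable recipe plus the constraints learned from watching you, rather than a free-form prompt it has to re-derive every time.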

Simple illustrative demo in the video. Apache 2.0 licensed, repo: https://github.com/sandroandric/AgentHandover
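For reference, MCP-compatible clients register servers through a small JSON config. In Claude Code, for example, a project-level `.mcp.json` entry would look something like the sketch below; the server name and launch command are hypothetical, since the post doesn't show AgentHandover's actual setup:

```json
{
  "mcpServers": {
    "agenthandover": {
      "command": "agenthandover",
      "args": ["mcp", "serve"]
    }
  }
}
```

Once registered, the client lists the server's tools automatically, which is presumably what makes the "one-click" integration possible.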

Would love feedback on the approach, and I'm curious whether anyone has tried other local vision or OS models for screen understanding. Thanks!

submitted by /u/Objective_River_5218