Understudy: local-first desktop agent that learns tasks from GUI demonstrations (MIT, open source)

Reddit r/LocalLLaMA / 3/13/2026

Key Points

Understudy is an open-source (MIT-licensed), local-first desktop agent that can operate GUI apps, browsers, shell tools, files, and messaging in one local runtime.
It uses teach-by-demonstration: you perform a task once, the agent records screen video and semantic events, extracts the intent rather than coordinates, and publishes a reusable skill.
In the demo, the author teaches it a multi-step workflow (Google Image search, download a photo, remove the background in Pixelmator Pro, export, send via Telegram) and then asks it to repeat the workflow for a different target.
The project is hosted on GitHub at understudy-ai/understudy with a supporting video demo on YouTube.


I've been building Understudy, an open-source desktop agent that can operate GUI apps, browsers, shell tools, files, and messaging in one local runtime.

The core idea is teach-by-demonstration: you do a task once, the agent records screen video + semantic events, extracts the intent rather than coordinates, and publishes a reusable skill.
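To make "intent rather than coordinates" concrete, here is a minimal sketch in Python. It is not Understudy's actual code or schema: `RecordedEvent`, `IntentStep`, and `extract_intent` are hypothetical names, and the assumption is that each recorded semantic event carries an app name, an action, and a labeled UI element alongside the raw click position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordedEvent:
    """One semantic event from a demonstration (hypothetical schema)."""
    app: str
    action: str         # e.g. "click" or "type"
    element_role: str   # e.g. an accessibility role such as "button"
    element_label: str  # visible or accessibility label of the element
    x: int              # raw screen position, only valid for this recording
    y: int
    text: str = ""      # typed text, if any


@dataclass(frozen=True)
class IntentStep:
    """Coordinate-free description of what the user meant to do (hypothetical)."""
    app: str
    action: str
    target: str
    text: str = ""


def extract_intent(events):
    """Keep the action and the labeled element; drop the pixel coordinates."""
    return [
        IntentStep(e.app, e.action, f"{e.element_role}:{e.element_label}", e.text)
        for e in events
    ]


# Illustrative events, not taken from a real recording.
demo = [
    RecordedEvent("Browser", "type", "searchbox", "Search", 512, 88, "red panda"),
    RecordedEvent("Browser", "click", "button", "Images", 240, 140),
]
steps = extract_intent(demo)
```

Because the steps refer to elements by role and label, they should survive a moved window or a different screen resolution, which a replay of raw coordinates would not.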

Video: YouTube

In this demo I teach it:

Google Image search -> download a photo -> remove background in Pixelmator Pro -> export -> send via Telegram

Then I ask it to do the same thing for another target.
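One way to picture "the same thing for another target" is a skill stored as a template with the target left as a parameter. The sketch below is an assumption for illustration only: `WORKFLOW` and `instantiate` are hypothetical names, and the step wording is paraphrased from the demo workflow above rather than taken from the project.

```python
# Hypothetical skill template: each step names an app and an instruction,
# with {target} standing in for whatever the user asks about.
WORKFLOW = [
    ("Browser", "Search Google Images for {target}"),
    ("Browser", "Download a photo of {target}"),
    ("Pixelmator Pro", "Remove the background"),
    ("Pixelmator Pro", "Export the image"),
    ("Telegram", "Send the exported image"),
]


def instantiate(workflow, target):
    """Fill the {target} placeholder in every step of the template."""
    return [(app, step.format(target=target)) for app, step in workflow]


plan = instantiate(WORKFLOW, "a golden retriever")
for app, step in plan:
    print(f"[{app}] {step}")
```

Only the first two steps depend on the target here; the remaining steps are reused unchanged, which is what makes the taught skill reusable.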

GitHub: understudy-ai/understudy

submitted by /u/bayes-song