Agentic Knowledge Base — Karpathy's LLM wiki, with adapters

Dev.to / 5/2/2026

💬 Opinion · Developer Stack & Infrastructure · Ideas & Deep Analysis · Tools & Practical Usage

Key Points

  • The article argues that what was missing from the author’s agentic setup wasn’t semantic retrieval, but the right knowledge structure inspired by Karpathy’s LLM wiki framing.
  • Using Karpathy’s idea (notes written for an LLM reader, retrieval rather than rigid taxonomy), the author separates durable knowledge from ephemeral tasks and explains why search sometimes returns the right note and sometimes nothing useful.
  • Instead of migrating existing notes to plain Markdown (which would break the author’s TickTick-based capture workflow), the author builds an “Agentic Knowledge Base” that preserves TickTick integration while keeping the storage layer swappable.
  • The proposed framework’s core provides parallel retrieval with RRF, a short-lived corpus cache (about 5 minutes), a bench harness, and usage logging, all exposed through adapter interfaces to plug in TickTick/Notion/Obsidian/markdown folders or other stores.

When Karpathy's LLM Wiki post landed, I already had semantic search over my TickTick — qdrant for the vector store, nomic-embed-text via ollama for embeddings, a daily cron to keep the index fresh, the works. The agent-side retrieval wasn't the missing piece.

What was missing was the structure. Karpathy's framing — designate a wiki, write notes for an LLM reader, lean on retrieval instead of taxonomy — surfaced the parts of my setup that didn't have shape yet: where durable knowledge lives versus ephemeral tasks, how agents pull structured data out of notes humans wrote, why my existing semantic search sometimes returned the right answer and sometimes returned nothing useful.

I almost migrated to plain markdown anyway. Thousands of durable notes — production playbooks, API quirks, decisions I want to survive next month's task list — already live in TickTick. They sync to my phone. Capture friction is zero. Migrating breaks all of that.

So I built the wiki structure on top of TickTick, and made the storage layer swappable. The retrieval, the wiki conventions, the agent-data note pattern, the bench harness — none of those are TickTick-specific. They're a small framework. You point it at TickTick / Notion / Obsidian / Things / a folder of markdown / whatever you've already invested years of capture habit into.

I'm calling it Agentic Knowledge Base.

## The framework, in one diagram

                  ┌───────────────────────────────────┐
                  │  Agent (Claude / scripts / cron)  │
                  └─────────────────┬─────────────────┘
                                    │ akb find / get / url / links
                                    ▼
                  ┌───────────────────────────────────┐
                  │  AKB Core                         │
                  │  • parallel retrieval + RRF       │
                  │  • corpus cache (5 min TTL)       │
                  │  • bench harness                  │
                  │  • usage logging                  │
                  └─────────────────┬─────────────────┘
                                    │ adapter interface
                  ┌─────────────────┼──────────────────┐
                  ▼                 ▼                  ▼
       ┌────────────────┐  ┌────────────────┐  ┌────────────────┐
       │ adapter-       │  │ adapter-       │  │ adapter-       │
       │ ticktick       │  │ obsidian       │  │ notion         │
       │  (reference)   │  │  (filesystem)  │  │  (your turn)   │
       └────────────────┘  └────────────────┘  └────────────────┘

The Core is storage-agnostic. The retrieval, the cache, the bench, the usage logger — none of them know what TickTick is. They call a small adapter interface (~6 methods).

Karpathy's setup, in this framing, is the filesystem adapter of a broader pattern. Mine is the TickTick adapter. Yours might be the Notion or Obsidian one.

## The adapter interface

Six methods. Two payload shapes.

```typescript
interface KnowledgeAdapter {
  listProjects(): Promise<Project[]>
  listTasksInProject(projectId: string): Promise<Task[]>
  getTask(projectId: string, taskId: string): Promise<Task>
  createTask(input: TaskInput): Promise<Task>
  updateTask(projectId: string, taskId: string, patch: TaskPatch): Promise<Task>
  urlFor(ref: { projectId: string, taskId: string }): string  // deep-link string
}

type Project = { id: string, name: string, kind?: 'tasks' | 'notes' }
type Task    = { id: string, title: string, content: string, projectId: string, tags: string[], dueDate?: string, modifiedTime: string }
```

Anything you can list, get, and link to — task systems, note apps, plain folders — can be an adapter.

If your storage exposes a native search endpoint, your adapter can implement an optional `searchByQuery(query)` and the core will use it as one branch of the parallel retrieval. If not, the core falls back to its own keyword scan against the corpus.

That's the whole interface. Everything interesting is in the Core.

## Two patterns the Core implements (worth stealing)

### 1. Agent-data notes

A regular note whose body has a fenced `json` (or `yaml`) block. Humans read the prose at the top. Agents extract the JSON via the adapter:

````markdown
**Type:** agent-data
**Consumed by:** EOD triage cron, capture-time relevance enrichment

A "trunk" is an active project the user cares about. Edit this list when
projects launch, finish, or shift focus.

```json
{
  "trunks": [
    { "name": "release-engineering", "desc": "shipping cadence, deployment rituals, on-call rotation" },
    { "name": "writing-projects", "desc": "drafts and edits across personal and client channels" }
  ]
}
```
````

Read it from any cron or agent:

```bash
akb get "Trunk Catalog" --extract json | jq '.trunks[].name'
```

The benefit: one note, mobile-editable in your existing app, consumed by agents as structured data. **Single source of truth, no schema migration.** This pattern works for anything an agent needs programmatically and a human needs to edit on the move: prompt templates, character locks for video projects, recurring queries, cron config.
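The `--extract json` step is just a fenced-block scan. A sketch of what it might do internally (my own, not the repo's code):

```typescript
// Pull the first fenced ```json block out of a note body and parse it.
// A sketch of the extraction behind `akb get --extract json`.
function extractJsonBlock(noteBody: string): unknown {
  const match = noteBody.match(/```json\s*\n([\s\S]*?)\n```/);
  if (!match) throw new Error("no fenced json block found");
  return JSON.parse(match[1]);
}
```

The prose above the block is invisible to the agent path and the JSON is invisible to a human skimming the note on mobile, which is the whole point.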

### 2. Parallel retrieval with provenance

Three retrievers run in parallel against a shared cached corpus, results are RRF-fused, and the top-K come back tagged with which retrievers agreed:

- **Hybrid** — dense cosine (qdrant + nomic-embed) + sparse keyword, internally RRF'd
- **Keyword** — substring match on title + content
- **Notes-find** — title-fuzzy on a designated wiki project

For a query like `openrouter api key`, all three retrievers return the same gold note. The fused result tags it `sources: [hybrid, keyword, notes_find]` — three independent signals agreeing means high confidence. Lower-ranked results have only one source — look at them with skepticism.

For a query like `ffmpeg commands`, the keyword tool misses (the literal phrase isn't in any document). Pure semantic misses too (nomic-embed underweights short titles like `ffmpeg`). Hybrid catches it. The fan-out gracefully handles the asymmetry — different queries lean on different retrievers, and the core doesn't pretend any single algorithm is universally best.
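Reciprocal rank fusion itself is only a few lines. A sketch of the fuse-and-tag step (the constant `k = 60` is the common default from the RRF literature, not necessarily what AKB uses):

```typescript
type Ranked = { id: string; sources: string[]; score: number };

// Fuse per-retriever ranked id lists with reciprocal rank fusion and
// record which retrievers contributed to each result (provenance).
function rrfFuse(lists: Record<string, string[]>, k = 60): Ranked[] {
  const byId = new Map<string, Ranked>();
  for (const [source, ids] of Object.entries(lists)) {
    ids.forEach((id, rank) => {
      const entry = byId.get(id) ?? { id, sources: [], score: 0 };
      entry.sources.push(source);
      entry.score += 1 / (k + rank + 1);  // classic RRF term, rank is 0-based
      byId.set(id, entry);
    });
  }
  return [...byId.values()].sort((a, b) => b.score - a.score);
}
```

A result all three retrievers return accumulates three RRF terms and three provenance tags; a one-source straggler gets one of each, which is exactly the confidence signal described above.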

A 5-min disk-backed corpus cache means warm queries are sub-100ms. The first call after a cold start fetches your full task/note list (one batch — adapters that support it use a single API call; adapters that don't fall back to per-project iteration). Within a working session, retrieval is essentially free.

## The bench

I built a small harness in `bench/`. Questions paired with gold answers (the task or note that actually contains the answer). Each retriever runs against the same questions, results scored by hit@1 / recall@5 / MRR.

Five agent-issued queries (the rephrased version Opus 4.7 actually generates, not the natural-language form a human types):

| Method                              | hit@1 | recall@5 | MRR  | warm latency  |
| ----------------------------------- | ----- | -------- | ---- | ------------- |
| keyword (substring)                 | 20%   | 20%      | 0.20 | <100ms        |
| semantic (dense only)               | 20%   | 40%      | 0.30 | ~300ms        |
| hybrid (dense + sparse RRF)         | 60%   | 80%      | 0.70 | ~500ms        |
| **find** (parallel + cache)         | **60%** | **80%** | **0.70** | **~93ms** |

`find` matches `hybrid` on accuracy and beats it on warm latency by ~5x. Plus the provenance tags. The benchmark won't generalize from five questions — it's a leading indicator. Grow it as confidence in a particular adapter accumulates.
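The three metrics in the table reduce to a few lines each. A sketch of the scoring, assuming one gold document per question (as the bench above does):

```typescript
// Score ranked result-id lists against gold answers.
// hit@1: gold is first; recall@K: gold is in the top K; MRR: mean of 1/rank.
function scoreBench(runs: { ranked: string[]; gold: string }[], k = 5) {
  let hit1 = 0, recallK = 0, rrSum = 0;
  for (const { ranked, gold } of runs) {
    const rank = ranked.indexOf(gold);  // 0-based position; -1 if missed
    if (rank === 0) hit1++;
    if (rank >= 0 && rank < k) recallK++;
    if (rank >= 0) rrSum += 1 / (rank + 1);
  }
  const n = runs.length;
  return { hitAt1: hit1 / n, recallAtK: recallK / n, mrr: rrSum / n };
}
```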

## Why I optimized for the model, not for me

There's a subtle reframe that took an embarrassing number of iterations to land.

When *I* use search, I type a single word: `ffmpeg`. The keyword tool returns the right note instantly.

When *Claude* uses search on my behalf — "where did I document my ffmpeg workflow?" — it issues something like `find "What ffmpeg commands do I have notes on?"`. Different shape entirely. The model writes longer queries. It uses question phrasing. It includes scope words.

Optimizing for human queries was the wrong objective. The user (me) wasn't using these tools — Claude was. Every retrieval test had to be written in the form Opus 4.7 actually generates, not how I'd type it. That changes which retriever wins.

Tomorrow's model writes queries differently. The benchmark needs to track *the model in use*, not a fixed assumption about query shape. The bench file is short and dated; re-tune when the model changes.

## What I deliberately didn't build (yet)

Karpathy's wiki post mentions periodically updating notes when facts change — propagating new information across the knowledge base. Useful at scale; auto-rewriting notes is high-blast-radius and needs an approval ramp before it's trustworthy. I sketched it: a weekly cron that semantic-searches for affected notes, drafts updates, queues them for my approval, applies the approved ones. Deferred.

Same call on a "lint the wiki" pass (Karpathy idea: agent reads every note weekly, flags missing summaries, dangling references, contradictions). Useful at scale; premature when the wiki itself is still under construction.

Both will live in Core when they ship — adapter-agnostic by design.

## A daily flow (my setup, your tools optional)

This is what runs:

- **Capture (mobile, manual).** I add a task or note in my storage app. No CLI involved. The friction has to be zero.
- **Capture-time relevance prompt (when in a Claude session).** `akb create "..." --relevance` appends a small instruction block to the result. Active Claude reads it, picks a project trunk, calls `akb update` to append a `why: <trunk> — <reason>` line. Five seconds of LLM-side reasoning makes that task much more retrievable later.
- **EOD triage (cron, daily morning).** Pulls yesterday's completed tasks, scores them 0–3 against the trunks (read live from a `Trunk Catalog` agent-data note), sends a Telegram message with keepers grouped by trunk. I read it on my phone with breakfast.
- **Retrieval (during work, all surfaces).** When Claude needs context — `akb find <query>` returns top-K with provenance. Cached, parallel, sub-100ms warm.

Swap "TickTick app" for "Notion / Obsidian / Things" and the flow is identical. The adapter changes, the daily ritual doesn't.

## Roadmap

- **v0.1** — Core + reference TickTick adapter + bench (where I am today)
- **v0.2** — Filesystem adapter (Karpathy-style local markdown). Probably one weekend's work.
- **v0.3** — Notion adapter (community contribution most likely)
- **v0.4** — Lint pass + fact-propagation queue with approval gate
- **v0.5** — Adapter for Apple Notes / Things / iA Writer (Mac-native captures)

## Code

> Repo: https://github.com/renezander030/agentic-knowledge-base
> Gist: https://gist.github.com/renezander030/6c8c3cd62dedaf6e78ffb5b5493830c6

Karpathy's wiki idea is right. The implementation that fits an existing system isn't a folder of markdown — it's the agent-side primitives that turn whatever you already have into something the model can reason over.

If you write your own adapter, I want to see it.

—

*Posted from https://renezander.com/p/agentic-knowledge-base/. Source at https://github.com/renezander030/agentic-knowledge-base.*


---

*I write field notes from real builds — AI integration, cron-driven automation, and the parts that break in production. New posts every two weeks at [renezander.com](https://renezander.com).*