Reciting a YouTube explanation is not derivation. Watching 3Blue1Brown on backpropagation and being able to parrot the explanation back is a recognition skill. It looks like understanding if nobody probes. The moment someone changes the activation function or asks why a minus sign is on a particular term, the recognition collapses and reveals that no derivation ever happened — only pattern-matching against a specific video.
The whiteboard test is what the top tier of applied ML/AI interviews uses to tell the difference. It is also what I run on myself for every math concept in the curriculum. Six rules. No exceptions. A concept that cannot pass the whiteboard test has not been understood, regardless of how confident the student feels.
The protocol is codified in the math-foundation.md file of my framework, claude-code-agent-skills-framework, under "Hard Verification Protocol (FAANG-level gate)."
Rule 1 — Blank-sheet start
The test begins with the student erasing all notes. Phone face-down. No videos playing. A camera or shared screen shows an empty page. No reference material within sight.
The reason: any recall that happens in the presence of notes is indistinguishable from reading the notes out loud. The only derivation that counts is the one produced from memory, under observation, against a problem the student did not set up.
Without this rule, "I understood the math" means "I once followed someone else's math and it seemed right at the time." Those are different statements.
Rule 2 — Randomized variation
The interrogator introduces at least one variation the student has not seen before. Different activation (ReLU instead of sigmoid). Different dimensions (a batch of 3 instead of 1). Different loss (hinge instead of cross-entropy). Different optimizer term (Adam's second moment).
The student adapts the derivation on the spot.
This is the rule that separates recognition from understanding. A student who has memorized "the derivative of sigmoid is sigmoid(x) * (1 - sigmoid(x))" can recite that line. A student who understands the derivation can re-derive it for ReLU, or for tanh, or for GeLU — because the underlying pattern (chain rule on the activation) is what they have internalized, not the specific formula.
Variation is not a gotcha. It is the test.
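The adaptation in Rule 2 can be sketched numerically. This is a minimal illustration, not part of the protocol itself: each derivative below is the kind of line a student would derive by hand, and a central finite difference plays the interrogator, confirming the derivation for whichever activation the variation swaps in. The specific activations and tolerances here are my choices for the sketch.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Each pair is (activation, hand-derived derivative). The pattern that
# transfers is "differentiate the activation and apply the chain rule",
# not any single memorized formula.
activations = {
    "sigmoid": (sigmoid, lambda x: sigmoid(x) * (1.0 - sigmoid(x))),
    "tanh":    (math.tanh, lambda x: 1.0 - math.tanh(x) ** 2),
    "relu":    (lambda x: max(0.0, x), lambda x: 1.0 if x > 0 else 0.0),
}

def check(f, df, x, eps=1e-6):
    # Central difference approximates f'(x); the derived df must agree.
    numeric = (f(x + eps) - f(x - eps)) / (2 * eps)
    return abs(numeric - df(x)) < 1e-4

for name, (f, df) in activations.items():
    assert check(f, df, 0.7), name
```

A memorized sigmoid line fails this check the moment the interrogator swaps in tanh; a derived chain-rule pattern passes for all three.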
Rule 3 — Three "why" checkpoints
At three random points during the derivation, the interrogator interrupts with a "why" question the student cannot have memorized:
- "Why is there a minus sign on this term?"
- "What happens to this expression if the learning rate doubles?"
- "Why do we use the transpose here and not the original matrix?"
- "What would this reduce to if you removed the bias term?"
- "What does this term look like in the limit as the batch size goes to infinity?"
If the student cannot answer from first principles, the derivation fails and the exercise is not complete.
The reason these questions matter: they cannot be memorized because the interrogator selects them on the fly from an adversarial pool. A student who has derived the math from primitives can answer any of them in under 30 seconds. A student who has memorized the math can answer none of them.
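One of the checkpoint questions above ("why is there a minus sign?") has a first-principles answer that fits in a few lines of arithmetic. The 1-D loss and learning rate here are illustrative values I chose for the sketch, not material from the protocol:

```python
# L(w) = (w - 3)^2 has its minimum at w = 3. Stepping AGAINST the
# gradient moves toward the minimum; flipping the sign moves away.

def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)  # dL/dw, derived by hand

w, lr = 0.0, 0.1
w_minus = w - lr * grad(w)   # gradient descent: the minus sign
w_plus  = w + lr * grad(w)   # sign flipped

assert loss(w_minus) < loss(w)   # loss decreases
assert loss(w_plus) > loss(w)    # loss increases
```

A student who has derived the update rule answers with exactly this reasoning: the gradient points uphill, so descent subtracts it. Doubling the learning rate doubles the step, which also answers the second checkpoint question from the same primitives.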
Rule 4 — Reverse-direction test
After the forward derivation lands, the interrogator asks the student to explain a specific line in the middle: "Why does this term exist? What would the model look like without it?"
This is the reverse of the usual teaching direction. Usually the student builds up from assumptions to conclusions. The reverse-direction test picks a point in the middle and asks the student to defend it — to explain what the term contributes, what removing it would change, what the alternate formulation would look like.
A memorized derivation proceeds in one direction; it cannot defend itself at an arbitrary point. A derived derivation can, because every line is a consequence the student can justify from the surrounding context.
Rule 5 — Numerical grounding
The student plugs in actual small numbers — an input like [0.5, -0.3], weights like [[0.1, 0.2], [-0.1, 0.3]], a target of 1 — and computes the entire forward and backward pass by hand. The result must match the analytical derivation.
This rule catches a specific failure mode: a derivation that looks correct symbolically but breaks when actually executed. Off-by-one errors, missing transpositions, sign flips, dimension mismatches — all of them hide in the symbols and only surface under numerical substitution.
The numerical grounding is also the test case that becomes a unit test in the implementation. A student who has grounded the math by hand can write a known-answer test that will catch the first implementation bug. A student who has not is guessing.
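The grounding step can be sketched with the article's own small numbers. The network shape here (one linear layer, a sigmoid on each output, squared error against the target on each) is an assumption for the sketch; the protocol does not fix a single architecture. The finite-difference loop at the end is the known-answer check that becomes a unit test:

```python
import math

x = [0.5, -0.3]
W = [[0.1, 0.2], [-0.1, 0.3]]
target = 1.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(W):
    # z = W x, y = sigmoid(z), L = 0.5 * sum((y_i - target)^2)
    z = [sum(W[i][j] * x[j] for j in range(2)) for i in range(2)]
    y = [sigmoid(zi) for zi in z]
    loss = 0.5 * sum((yi - target) ** 2 for yi in y)
    return z, y, loss

def backward(W):
    # Hand-derived: dL/dW[i][j] = (y_i - target) * y_i * (1 - y_i) * x_j
    _, y, _ = forward(W)
    return [[(y[i] - target) * y[i] * (1 - y[i]) * x[j]
             for j in range(2)] for i in range(2)]

# Known-answer check: every analytic gradient entry must match a
# central finite difference. Sign flips and missing transposes die here.
grads = backward(W)
eps = 1e-6
for i in range(2):
    for j in range(2):
        W[i][j] += eps
        up = forward(W)[2]
        W[i][j] -= 2 * eps
        down = forward(W)[2]
        W[i][j] += eps  # restore
        assert abs((up - down) / (2 * eps) - grads[i][j]) < 1e-6
```

The same loop, pointed at the student's eventual implementation instead of the hand derivation, is the first unit test of the codebase.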
Rule 6 — Repeat with variation
For core concepts (backprop, attention, gradient descent, the math of a specific loss function), the verification happens at least three times across the curriculum, with different architectures or problems each time.
Passing once is not mastery. A student who passed the gate on a single-layer MLP has demonstrated local understanding of that specific case. Mastery is demonstrated when the same derivation fires on a two-layer ReLU network, then on a transformer head, then on a convolutional layer — each time without re-learning the underlying mechanism.
This is the multi-shot version of the Bransford transfer test applied to mathematical derivation. A concept has transferred when it fires on a new instance without scaffolding. A concept has not transferred if every new instance requires going back to the original explanation.
Why this resists Claude-Operator drift specifically
The vibe-coding failure mode of operator work is accepting agent output without reading it. The learning equivalent is accepting an explanation without deriving it. Both produce the same symptom: apparent competence that collapses under adversarial probing.
The whiteboard test is specifically designed to resist this drift. No notes means no scrollback to the previous Claude conversation. Randomized variation means no memorization of a specific output. "Why" checkpoints mean no pattern-match against a standard explanation. Numerical grounding means no hand-waving the bits of the derivation the student did not actually check.
A concept that passes the whiteboard test has been derived by the student, in their own hand, under conditions that would have exposed any gap. That is the condition for calling the concept understood.
What failure looks like, and what to do
Failure on any rule is data, not judgment. The test is not pass/fail in the career-consequence sense — it is pass/fail in the "did the derivation actually close or not" sense. A failed whiteboard test says: the learning did not land yet. Go back to the source — textbook chapter, paper, Karpathy lecture, the specific section of 3Blue1Brown — study it, and rerun the test within 48 hours.
What failure does NOT permit: the interrogator giving the student the answer. A failed derivation that gets filled in by the grader has not become a passing derivation; it has become a piece of dictated material that the student will fail on again next week.
Repeated failure on the same concept is its own signal. If the student has failed three times on the derivation of gradient descent, the underlying material was not absorbed — and the fix is to go back further in the prerequisite chain, not to re-read the same section harder.
The log that matters
Every whiteboard verification gets logged. Date, concept, whether the student passed on first attempt or needed retries, which variation was posed, which "why" questions fired. Over months, the log becomes a map of which concepts are load-bearing (passed consistently) versus fragile (required multiple attempts).
Fragile concepts get re-verified more often. Load-bearing concepts get spaced out. The log is what makes the next retest principled rather than random.
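The log and its scheduling rule can be sketched in a few lines. The field names, the example entries, and the 7-day/30-day intervals are assumptions I made for the illustration; the article specifies only what gets recorded, not a schedule:

```python
from datetime import date, timedelta

# Illustrative entries — dates, concepts, and variations are invented.
log = [
    {"date": date(2024, 5, 1), "concept": "backprop",
     "passed_first_try": True, "variation": "ReLU instead of sigmoid"},
    {"date": date(2024, 5, 8), "concept": "attention",
     "passed_first_try": False, "variation": "batch of 3"},
    {"date": date(2024, 5, 15), "concept": "backprop",
     "passed_first_try": True, "variation": "two-layer MLP"},
]

def next_retest(concept):
    entries = [e for e in log if e["concept"] == concept]
    fragile = any(not e["passed_first_try"] for e in entries)
    last = max(e["date"] for e in entries)
    # Fragile concepts come back soon; load-bearing ones get spaced out.
    return last + timedelta(days=7 if fragile else 30)

assert next_retest("attention") < next_retest("backprop")
```

This is what "principled rather than random" means operationally: the retest date is a function of the log, not of mood.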
The gate in one sentence
If you cannot derive the concept on a blank sheet, adapting to a variation you did not prepare for, defending any line the interrogator points at, and grounding it numerically against a set of small inputs — then you do not understand the concept, regardless of how clearly the YouTube explanation landed.
That is the gate. Apply it to yourself. The concepts that pass become the ones you can still apply when the frontier shifts and the abstractions you relied on start leaking.
Aman Bhandari. Operator of an AI-engineering research lab running Claude Opus as the coaching partner, plus a QA-automation surface shipping against a real sprint workload. Public artifacts: claude-code-agent-skills-framework and claude-code-mcp-qa-automation. github.com/aman-bhandari.