ProbeLogits: Kernel-Level LLM Inference Primitives for AI-Native Operating Systems

arXiv cs.LG · April 15, 2026


Key Points

  • ProbeLogits is introduced as a kernel-level LLM inference primitive that performs a single forward pass and reads specific token logits to classify agent actions as safe or dangerous without any learned parameters.
  • The method uses a deployment-time calibration knob, the parameter α, to tune governance behavior (e.g., stricter settings to maximize recall for privileged operations and relaxed settings to maximize precision for conversational agents).
  • On an OS action benchmark with 260 prompts and multiple action categories (including adversarial attacks), ProbeLogits reports high performance (F1=0.980, Precision=1.000, Recall=0.960) using a general-purpose 7B model at 4-bit quantization.
  • On ToxicChat, it achieves F1=0.790 at α=1.0 and improves to F1=0.837 at α=0.5, reaching about 89% of Llama Guard 3’s F1 while requiring zero learned parameters.
  • Implemented in Anima OS (a bare-metal x86_64 Rust OS), the authors argue the enforcement sits below the WASM sandbox boundary, making it harder to evade, and they discuss using KV-cache as process state to enable checkpoint/restore/fork-like operations.
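The core mechanism in the first two bullets can be sketched in a few lines. The paper does not publish the exact formula for how the calibration strength α enters the decision, so the snippet below is an illustrative reading, not Anima OS's implementation: it compares the logits of a "safe" and a "dangerous" probe token from one forward pass, with α scaling the dangerous-token logit so that larger α flags more actions (higher recall) and smaller α flags fewer (higher precision). The function name, token IDs, and the two-token softmax are all assumptions for illustration.

```python
import math

def probe_classify(logits, safe_id, danger_id, alpha=1.0):
    """Classify an agent action from a single forward pass.

    Hypothetical sketch of a ProbeLogits-style check: `logits` maps
    token IDs to raw logit values; `safe_id` / `danger_id` are the
    probe tokens whose logits we read. `alpha` is the deployment-time
    strictness knob (stricter at alpha >= 0.8, relaxed at alpha = 0.5).
    The exact role of alpha here is an assumption, not the paper's formula.
    """
    safe, danger = logits[safe_id], logits[danger_id]
    # Two-token softmax; scaling the dangerous logit by alpha shifts
    # the decision margin without any learned parameters.
    p_danger = math.exp(alpha * danger) / (math.exp(alpha * danger) + math.exp(safe))
    return "dangerous" if p_danger >= 0.5 else "safe"
```

Under this reading, the same forward pass yields different verdicts at different α: a borderline action flagged under a strict policy can pass under a relaxed one, which is exactly the policy-knob behavior the bullets describe.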

Abstract

An OS kernel that runs LLM inference internally can read logit distributions before any text is generated -- and act on them as a governance primitive. I present ProbeLogits, a kernel-level operation that performs a single forward pass and reads specific token logits to classify agent actions as safe or dangerous, with zero learned parameters. On a 260-prompt OS action benchmark (9 categories including adversarial attacks), ProbeLogits achieves F1=0.980, Precision=1.000, and Recall=0.960 using a general-purpose 7B model at 4-bit quantization. On ToxicChat (1,000 human-annotated real conversations), it achieves F1=0.790 at default calibration strength α=1.0, improving to F1=0.837 at α=0.5 -- 89% of Llama Guard 3's F1 ≈ 0.939 with zero learned parameters. A key design contribution is the calibration strength α, which serves as a deployment-time policy knob rather than a learned hyperparameter. By adjusting α, the OS can enforce strict policies for privileged operations (α ≥ 0.8, maximizing recall) or relaxed policies for conversational agents (α = 0.5, maximizing precision). Contextual calibration improves accuracy from 64.8% to 97.3% on the custom benchmark. I implement ProbeLogits within Anima OS, a bare-metal x86_64 OS written in 80,400 lines of Rust. Because agent actions must pass through kernel-mediated host functions, ProbeLogits enforcement operates below the WASM sandbox boundary, making it significantly harder to circumvent than application-layer classifiers. Each classification costs 65ms on 7B -- fast enough for per-action governance. I also show that treating KV cache as process state enables checkpoint, restore, and fork operations analogous to traditional process management. To my knowledge, no prior system exposes LLM logit vectors as OS-level governance primitives.
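The abstract's closing claim, that treating the KV cache as process state yields checkpoint/restore/fork semantics, can be illustrated with a toy model. Everything below is hypothetical: the class name, methods, and the list standing in for key/value tensors are not Anima OS's API; the sketch only shows why snapshotting attention state gives process-management-like operations.

```python
import copy

class AgentProcess:
    """Toy model of KV cache as process state (illustrative, not Anima OS's API).

    Snapshotting the KV cache lets the kernel checkpoint an agent's
    inference context, restore it later, or fork a child that resumes
    from the parent's exact attention state -- analogous to
    checkpoint/restore/fork for traditional processes.
    """

    def __init__(self, kv_cache=None):
        # A real cache holds per-layer key/value tensors; a token list
        # stands in for that state here.
        self.kv_cache = kv_cache if kv_cache is not None else []

    def step(self, token):
        self.kv_cache.append(token)  # appending to the cache = advancing inference

    def checkpoint(self):
        return copy.deepcopy(self.kv_cache)  # immutable snapshot of state

    def restore(self, snapshot):
        self.kv_cache = copy.deepcopy(snapshot)  # rewind to a prior state

    def fork(self):
        # Child diverges independently from the parent's current state.
        return AgentProcess(copy.deepcopy(self.kv_cache))
```

The design point is that, unlike an application-layer agent framework, a kernel that owns the KV cache can perform these operations uniformly for every agent, just as it does for ordinary process memory.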