How do you do OOD detection on a closed LLM API with no latent access?

Reddit r/artificial / 5/20/2026

Key Points

  • The post explains that traditional out-of-distribution (OOD) detection methods (e.g., Mahalanobis distance on internal features or energy-based scoring on logits) typically require access to model internals, which closed LLM APIs do not provide.
  • It outlines practical alternatives for closed APIs, including sampling-consistency approaches such as SelfCheckGPT, using token-level entropy from any available logprobs, creating proxy embeddings via a separate encoder, and using a separate verifier model to judge outputs.
  • It argues that, under these constraints, classical OOD detection and hallucination detection become effectively the same problem because both ultimately show up as the model generating unreliable text.
  • The author challenges readers to describe what real OOD signals they use in production for closed LLMs and how they decide when to trust responses.

Classical OOD detection assumes you can see inside the model. Mahalanobis distance on intermediate features and energy scores on logits are typical, and both require cracking the model open.
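For anyone who has not used it, here is a minimal sketch of the energy score, assuming you have the full logit vector for one input, which is exactly what a closed API withholds. The function name and the `temperature` parameter are illustrative choices, not taken from any particular library. Mahalanobis scoring is analogous but works on hidden features instead of logits, so it is not shown.

```python
import math


def energy_score(logits, temperature=1.0):
    """Energy of one input: -T * log(sum(exp(logit_i / T))).

    Lower (more negative) energy means the model put a lot of mass on some
    class; higher energy is usually read as "more likely OOD".
    """
    scaled = [z / temperature for z in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * log_sum_exp


# A peaked logit vector gets lower energy than a flat one.
peaked = energy_score([10.0, 0.0, 0.0])
flat = energy_score([1.0, 1.0, 1.0])
print(peaked < flat)
```

The point of the sketch is the input: it needs the complete logit vector, not the handful of top-K logprobs an API might return.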

With closed LLM APIs you get text in, text out, and maybe top-K logprobs per token if you are lucky. The methods that survive that constraint are sampling consistency (as in SelfCheckGPT), token-level entropy on whatever logprobs the API exposes, proxy embeddings from your own encoder, or a separate verifier model on the output. What bothers me is that classical OOD detection and hallucination detection collapse into the same problem in that setting, because both manifest as the model producing unreliable text.
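To make two of those signals concrete, here is a rough sketch that uses only what a closed API returns. Everything in it is an assumption for illustration: the function names are made up, the entropy is renormalized over the top-K entries because the rest of the vocabulary is invisible, and the agreement score is a crude word-overlap stand-in for sampling consistency, not SelfCheckGPT itself.

```python
import math
from itertools import combinations


def truncated_entropy(top_logprobs):
    """Entropy in nats of one token position, renormalized over the top-K
    logprobs the API returned (the tail of the vocabulary is ignored)."""
    probs = [math.exp(lp) for lp in top_logprobs]
    total = sum(probs)
    if total == 0.0:
        raise ValueError("all logprobs underflowed to zero probability")
    return -sum((p / total) * math.log(p / total) for p in probs if p > 0.0)


def mean_token_entropy(per_token_top_logprobs):
    """Average truncated entropy over all token positions of one response."""
    entropies = [truncated_entropy(t) for t in per_token_top_logprobs]
    return sum(entropies) / len(entropies)


def sample_agreement(samples):
    """Mean pairwise Jaccard overlap of lowercased word sets across several
    responses sampled for the same prompt. 1.0 means identical vocabulary."""
    word_sets = [set(s.lower().split()) for s in samples]
    pairs = list(combinations(word_sets, 2))
    if not pairs:
        raise ValueError("need at least two samples")
    scores = [len(a & b) / len(a | b) if (a | b) else 1.0 for a, b in pairs]
    return sum(scores) / len(scores)


# Hypothetical inputs: two token positions with top-2 logprobs each,
# and three responses sampled at non-zero temperature.
logprobs = [[math.log(0.5), math.log(0.5)], [0.0, -50.0]]
samples = ["Paris is the capital", "Paris is the capital", "Lyon is the capital"]
print(mean_token_entropy(logprobs), sample_agreement(samples))
```

High mean entropy or low agreement would be the "do not trust" signal, but where to put the threshold is something you would have to calibrate on your own traffic.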

If you are running closed LLMs in production right now, what is your actual OOD signal, and how do you decide when to trust the output?

submitted by /u/kamilc86