Classical out-of-distribution (OOD) detection assumes you can see inside the model. Mahalanobis distance on intermediate features and energy scores on logits are the typical examples, and both require cracking the model open.
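For anyone who has not used the white-box side, here is a minimal sketch of the energy score on logits, assuming you have the full logit vector for one input (exactly what a closed API does not give you). The function name and the default temperature are my own choices for illustration, not from any particular library.

```python
import math

def energy_score(logits, temperature=1.0):
    """Energy of one input: -T * logsumexp(logits / T).

    Lower (more negative) energy means the model puts a large logit
    somewhere, which is typically read as in-distribution. Higher energy
    is read as more OOD-like. The decision threshold is something you
    would tune on held-out data; none is assumed here.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * lse

# Hypothetical logits: one peaked vector, one flat vector.
peaked = energy_score([10.0, 0.0, 0.0])
flat = energy_score([0.0, 0.0, 0.0])
```

The peaked vector gets the lower energy of the two, which is the whole signal. The point is that it needs raw logits over the full output space, not text.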
With closed LLM APIs you get text in, text out, and maybe top-K logprobs per token if you are lucky. The methods that survive that constraint are sampling consistency (as in SelfCheckGPT), token-level entropy on whatever logprobs the API exposes, proxy embeddings from your own encoder, or a separate verifier model run on the output. What is bothering me is that classical OOD detection and hallucination detection collapse into the same problem in that setting, because both manifest as the model producing unreliable text.
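To make the token-level entropy option concrete, here is a rough sketch, assuming the API returns a list of top-K log-probabilities (natural log) for each generated token. The input shape and both function names are hypothetical, so adapt them to whatever your provider actually returns. Since top-K does not cover the whole vocabulary, the leftover probability mass is lumped into a single "other" bucket, which makes the result a lower bound on the true entropy.

```python
import math

def topk_entropy(top_logprobs):
    """Entropy (in nats) of one token position from its top-K logprobs.

    Probability mass outside the top K is treated as one extra outcome,
    so this underestimates the true entropy over the full vocabulary.
    """
    probs = [math.exp(lp) for lp in top_logprobs]
    rest = max(0.0, 1.0 - sum(probs))  # mass the API did not expose
    if rest > 0.0:
        probs.append(rest)
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_token_entropy(per_token_top_logprobs):
    """Average the per-position entropy over a whole completion."""
    entropies = [topk_entropy(t) for t in per_token_top_logprobs]
    return sum(entropies) / len(entropies) if entropies else 0.0

# Hypothetical completion with two tokens: one confident, one a coin flip.
completion = [
    [math.log(0.99)],
    [math.log(0.5), math.log(0.5)],
]
score = mean_token_entropy(completion)
```

A high mean (or a high max over positions) is the flag. Note that this cannot tell you whether the uncertainty comes from an unfamiliar input or from the model making something up, which is the collapse described above.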
If you are running closed LLMs in production right now, what is your actual OOD signal, and how do you decide when to trust the output?



