"I don't know!": Teaching neural networks to abstain with the HALO-Loss. [R]

Reddit r/MachineLearning / 4/14/2026


Key Points

  • The post argues that standard cross-entropy training leaves neural networks unable to represent “I don’t know,” causing overconfident behavior and hallucinations when given garbage or out-of-distribution inputs.
  • It introduces HALO-Loss as a drop-in replacement for cross-entropy that uses Euclidean distance (instead of an unconstrained dot product) to bound confidence and add a zero-parameter “Abstain Class” tied to the latent-space origin.
  • The author claims HALO improves calibration substantially (e.g., ECE from ~8% to ~1.5%) while avoiding the usual safety-vs-accuracy tradeoff, reporting near-zero base accuracy change on CIFAR-10/100.
  • In OOD testing (e.g., SVHN), the method reportedly cuts false positives at high recall (FPR@95) by more than half compared with standard cross-entropy.
  • The work is released as open source, includes a detailed mathematical/code breakdown, and is positioned for safety-critical classification and rejection thresholds in multimodal models.
  • It is also presented as a way to achieve native outlier detection without heavy ensembles, post-hoc scoring, or outlier-exposure training (as typically done in OOD pipelines).

Current neural networks have a fundamental geometry problem: if you feed them garbage data, they won't admit that they have no clue. They will confidently hallucinate.
This happens because the standard Cross-Entropy loss requires models to push their features "infinitely" far from the origin to reach a loss of 0.0, which leaves the model with a jagged latent space. It literally leaves the model with no mathematically sound place to throw its trash.

I've been working on a "fix" for this, and as a result I just open-sourced the HALO-Loss.

It's a drop-in replacement for Cross-Entropy, but by trading the unconstrained dot product for Euclidean distance, HALO bounds maximum confidence to a finite distance from a learned prototype. This allows it to bolt a zero-parameter "Abstain Class" directly to the origin of the latent space. Basically, it gives the network a mathematically rigorous "I don't know" button for free.

Usually in AI safety, building better Out-of-Distribution (OOD) detection means sacrificing your base accuracy. With HALO, that safety tax basically vanishes.

Testing on CIFAR-10/100 against standard CCE:

  • Base Accuracy: Zero drop (actually +0.23% on CIFAR10, -0.14% on CIFAR100).
  • Calibration (ECE): Dropped from ~8% down to a crisp 1.5%.
  • Far OOD (SVHN) False Positives (FPR@95): Slashed by more than half (e.g., 22.08% down to 10.27%).
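For readers unfamiliar with the two headline metrics, here is a minimal NumPy sketch of how ECE and FPR@95 are typically computed from model confidences (my own illustration, not the author's evaluation code):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """ECE: bin predictions by confidence, then take the weighted
    average gap between mean confidence and accuracy per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

def fpr_at_95_tpr(id_scores, ood_scores):
    """FPR@95: fraction of OOD samples whose score clears the
    threshold that keeps 95% of in-distribution samples."""
    threshold = np.percentile(np.asarray(id_scores, dtype=float), 5)
    return float(np.mean(np.asarray(ood_scores, dtype=float) >= threshold))
```

Lower is better for both: a well-calibrated model has a small confidence/accuracy gap in every bin, and a good OOD detector rarely scores outliers above the in-distribution threshold.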

Comparing against results on the OpenOOD benchmark, getting this kind of native outlier detection without heavy ensembles, post-hoc scoring tweaks, or exposing the model to outlier data during training is incredibly rare.

At the same time, HALO is super useful if you're working on safety-critical classification, or if you're training multi-modal models like CLIP and need a mathematically sound rejection threshold for unaligned text-image pairs.

I wrote a detailed breakdown of the math, the code, and the tricks to avoid fighting high-dimensional Gaussian soap bubbles.
Blog-post: https://pisoni.ai/posts/halo/

Also, feel free to give HALO a spin on your own data, see if it reduces your network's overconfidence and hallucinations, and let me know what you find.
Code: https://github.com/4rtemi5/halo


Here is how it actually works:

Instead of simply using the result of the last layer as logits, we use the negative squared euclidean distance between the sample-embedding and the learned embeddings of the class prototypes. This can easily be simplified:
-||x - c||² = -||x||² + 2(x⋅c) - ||c||²

Since the -||x||² term is constant across all class logits for a given sample, the softmax is invariant to it and we can simply drop it, leaving us with a shifted logit:

logit = 2(x⋅c) - ||c||²

which is just a dot product penalized by the squared L2-norm of the prototypes, keeping the embedding distribution tightly packed around the origin.
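The shifted logit is a couple of lines of code, and the simplification is easy to verify numerically since softmax ignores the dropped per-sample constant (a minimal NumPy sketch of my own, not the repo's implementation; `prototypes` is a learned `(num_classes, dim)` matrix):

```python
import numpy as np

def distance_logits(x, prototypes):
    """Shifted logits: 2 (x . c) - ||c||^2.
    Equal to -||x - c||^2 up to the per-sample constant -||x||^2,
    which softmax ignores."""
    return 2.0 * x @ prototypes.T - np.sum(prototypes ** 2, axis=1)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)
```

Passing either the shifted logits or the full negative squared distances through softmax yields identical class probabilities, so the cheaper shifted form is the one you'd actually train with.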

However, since high-dimensional Gaussians are not solid balls but have the probability mass distribution of a soap bubble (thin wall, empty center), we can't force the embeddings to align perfectly with the prototypes without losing a lot of model capacity. Instead, we want the model to align the sample embeddings with the thin wall of the Gaussian soap bubble, using the radial negative log-likelihood as a regularizer.
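One way to read that radial regularizer: under a standard d-dimensional Gaussian, the norm r = ||x|| follows a chi distribution with density proportional to r^(d-1) exp(-r²/2), so its negative log-likelihood (constants dropped) is minimized on the shell at r ≈ √d rather than at the center. A sketch of that term under this assumption (the exact form in the repo may differ):

```python
import numpy as np

def radial_nll(x):
    """Negative log-likelihood of ||x|| under a standard d-dim
    Gaussian (chi distribution), constants dropped:
        NLL(r) = r^2 / 2 - (d - 1) * log(r)
    Minimized at r = sqrt(d - 1), i.e. on the thin 'soap-bubble'
    shell, so it pulls embeddings onto the shell instead of
    collapsing them toward the empty center."""
    d = x.shape[-1]
    r = np.linalg.norm(x, axis=-1)
    return 0.5 * r ** 2 - (d - 1) * np.log(r)
```

Note the two-sided pull: the r²/2 term penalizes drifting too far out, while the -(d-1) log r term penalizes sitting near the origin, which is exactly the "don't fight the soap bubble" behavior described above.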

Finally, since the clusters are already forced to sit around the origin, we can pin an additional "abstain class" onto it. This gives the model the option of assigning some probability mass to no class at all (much like a register/attention sink in modern LLMs). We can assign this abstain class a "cost" through a bias, which also gives us a cross-entropy-grounded abstain threshold that does not need to be tuned.
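Concretely, with the shifted logit a prototype pinned at the origin contributes 2(x⋅0) - ||0||² = 0, so the abstain class reduces to a constant bias appended to the logit row; this is my reading of the mechanism, and the repo's bias/cost handling may differ in detail:

```python
import numpy as np

def logits_with_abstain(x, prototypes, abstain_bias=0.0):
    """Class logits 2(x . c) - ||c||^2 plus a zero-parameter abstain
    class: its prototype is the origin, so its shifted logit is just
    a constant bias (the 'cost' of abstaining), appended last."""
    class_logits = 2.0 * x @ prototypes.T - np.sum(prototypes ** 2, axis=1)
    abstain = np.full((x.shape[0], 1), abstain_bias)
    return np.concatenate([class_logits, abstain], axis=1)
```

The model then abstains whenever every class logit falls below the bias, giving a rejection threshold baked into the cross-entropy objective instead of tuned post hoc.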

For even more details please take a peek at the links or ask in the comments.

Happy to help and glad about any feedback! :)

submitted by /u/4rtemi5