Mitigating hallucination [P]

Reddit r/MachineLearning / 4/24/2026


Key Points

  • The post describes a lightweight hallucination-reduction approach for LLMs that avoids external judges, additional human labels, and heavy preference-learning pipelines.
  • It uses a frozen base model to generate a “bad” counterfactual answer, then trains the adapted model to contrast the correct response against the bad branch only after the first point where they diverge.
  • The method selectively updates only on cases where the model still gives excessive support to the bad continuation, so roughly 10% of training examples trigger parameter updates.
  • In experiments, the approach improves factuality compared with standard cross-entropy (CE) training and DPO-style baselines, including under out-of-distribution settings, with about a 6-percentage-point reduction relative to DPO and about a 1-point reduction relative to SFT, despite using only ~10% of the data.
  • The results suggest that per-sample selective fitting can improve generalization and that a larger dataset does not guarantee better performance.

Hi everyone. I'm reposting this since my previous post was deleted (I don't know why; maybe the writing quality was low?).

I’ve been working on a lightweight way to reduce hallucinations in LLMs without relying on external judges, extra human labels, or heavy preference-learning pipelines.

The basic idea is simple: let a frozen base model generate a “bad” counterfactual answer, then train the adapted model to contrast the correct answer against that bad branch only from the first point where they diverge.
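To make the "first point where they diverge" concrete, here is a minimal sketch (my own illustration, not the author's code): given the token IDs of the correct answer and the frozen base model's counterfactual answer, find the first index where the two sequences differ, since the contrastive signal only applies from that token onward.

```python
def first_divergence(good_ids, bad_ids):
    """Return the index of the first token where two sequences differ.

    If one sequence is a prefix of the other, the divergence point is
    the length of the shorter sequence.
    """
    for i, (g, b) in enumerate(zip(good_ids, bad_ids)):
        if g != b:
            return i
    return min(len(good_ids), len(bad_ids))

# Hypothetical example: shared prompt/answer prefix, then the branches split.
good = [5, 12, 9, 41, 7]   # tokens of the correct (reference) answer
bad  = [5, 12, 9, 88, 3]   # counterfactual tokens from the frozen base model
print(first_divergence(good, bad))  # → 3
```

Everything before index 3 is identical in both branches, so those tokens carry no preference information and are excluded from the contrastive loss.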

Instead of updating on every sample, the method self-selects cases where the bad continuation is still getting too much support from the model.

In practice, this means only about 10% of the training examples actually trigger updates, but the model still improves factuality over standard CE training and DPO-style baselines.
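The self-selection step could be gated roughly like this (a sketch under my own assumptions; the function name, margin threshold, and log-prob inputs are all hypothetical): sum the model's per-token log-probabilities for the good and bad branches after the divergence point, and only run a parameter update when the good-minus-bad margin is still small, i.e. the bad continuation still gets too much support.

```python
def should_update(logp_good, logp_bad, margin=1.0):
    """Gate a training step: update only when the good-vs-bad log-prob
    margin (summed over tokens after the divergence point) is below a
    threshold, meaning the bad branch still has too much support."""
    return (sum(logp_good) - sum(logp_bad)) < margin

# Hypothetical per-token log-probs after the divergence point.
confident  = should_update([-0.1, -0.2], [-4.0, -5.0])  # margin 8.7 → skip
borderline = should_update([-1.0, -1.2], [-1.1, -1.5])  # margin 0.4 → update
print(confident, borderline)  # → False True
```

With a gate like this, examples the model already handles well are skipped, which is consistent with only ~10% of examples triggering updates.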

I also tested it under out-of-distribution (OOD) settings, where the gains remained consistent rather than only fitting the training benchmark.

Compared to DPO, it showed about a 6-percentage-point decrease; compared to SFT, about a 1-percentage-point decrease.

Both of these results used only about 10% of the dataset, while the DPO and SFT baselines used the full dataset.

I think this suggests two things:
1) Per-sample selective fitting helps the model generalize across the dataset.
2) More data does not always mean better performance.

GitHub link: genji970/hallucination-mitigation-via-contrastive-sampling-method (Selective contrastive post-training for hallucination mitigation in LLMs; improves factuality with ~10% data.)

submitted by /u/Round_Apple2573