INT8 quantization gives me better accuracy than FP16! [D]

Reddit r/MachineLearning / 4/27/2026


Key Points

  • The author reports that a deep learning model shows higher inference accuracy with INT8 post-training quantization than with FP16, even though FP16 is usually expected to be closer to FP32.
  • The experiment uses an ONNX-exported model, comparing FP16 inference directly versus INT8 inference after quantization, without major architecture changes.
  • The post asks whether others have observed the same phenomenon and seeks explanations for how INT8 could outperform FP16 during inference.
  • The underlying question highlights potential factors such as quantization effects, inference-time numeric behavior, and deployment-specific implementation differences between FP16 and INT8 paths.

Hi everyone,

I’m working on a deep learning model and I noticed something strange.

When I compare different precisions:

  • FP32 (baseline)
  • FP16
  • INT8 (post-training quantization)

I’m getting better inference accuracy with INT8 than FP16, which I didn’t expect.

I thought FP16 should be closer to FP32 and therefore more accurate than INT8, but in my case INT8 is actually performing better.
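
One hypothesis I've been toying with (just a toy numpy sketch, not something I've confirmed in my model): FP16 has a much smaller dynamic range than FP32 (max finite value ≈ 65504), so a single large activation can overflow to inf, while a calibrated symmetric INT8 scale maps the same tensor into range, coarsely but finitely:

```python
import numpy as np

# A tensor with one large value that fits comfortably in FP32
x = np.array([70000.0, 1.5, -3.2], dtype=np.float32)

# FP16: 70000 exceeds the FP16 max (~65504) and overflows to inf
x_fp16 = x.astype(np.float16)
print(x_fp16)  # [inf 1.5 -3.2]

# INT8 with a symmetric per-tensor scale: coarse, but everything stays finite
scale = np.abs(x).max() / 127.0
x_int8 = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
print(x_int8.astype(np.float32) * scale)  # [70000. 0. 0.]
```

The small values get crushed to zero here, so INT8 isn't "more accurate" in general; the point is just that FP16 and INT8 fail in different ways, and which failure hurts task accuracy depends on the model's value distribution.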

Has anyone seen this before? What could explain INT8 outperforming FP16 in inference?
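
In case it helps anyone suggest a diagnosis, this is roughly how I'd compare raw outputs across the three precisions (simplified sketch, not my exact scripts; the file names are placeholders and I'm assuming ONNX Runtime sessions):

```python
import numpy as np
import onnxruntime as ort

def run(path, feed):
    sess = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    inp = sess.get_inputs()[0]
    # Fully converted FP16 graphs may expect float16 inputs
    if inp.type == "tensor(float16)":
        feed = feed.astype(np.float16)
    return sess.run(None, {inp.name: feed})[0]

x = np.random.randn(1, 3, 224, 224).astype(np.float32)  # placeholder input shape
ref = run("model_fp32.onnx", x)
for path in ["model_fp16.onnx", "model_int8.onnx"]:
    out = run(path, x).astype(np.float32)
    print(path, "max |diff| vs FP32:", np.abs(out - ref).max())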

Setup details:

  • Model exported via ONNX
  • FP16 used directly / INT8 via quantization
  • No major architecture changes
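
Roughly, the two paths look like this (a minimal sketch, not my exact scripts: I'm assuming onnxconverter-common for the FP16 cast and ONNX Runtime's dynamic post-training quantization for INT8, with placeholder file names; a static PTQ path with calibration data would look different):

```python
import onnx
from onnxconverter_common import float16
from onnxruntime.quantization import quantize_dynamic, QuantType

# FP16 path: cast the FP32 graph's weights/ops to float16 directly
model_fp32 = onnx.load("model_fp32.onnx")
model_fp16 = float16.convert_float_to_float16(model_fp32)
onnx.save(model_fp16, "model_fp16.onnx")

# INT8 path: post-training (dynamic) quantization of the same FP32 graph
quantize_dynamic(
    "model_fp32.onnx",
    "model_int8.onnx",
    weight_type=QuantType.QInt8,
)
```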

submitted by /u/Fragrant_Rate_2583