BANKING77-77: New best of 94.61% on the official test set (+0.13pp over our previous best of 94.48%).

Reddit r/artificial / 4/9/2026

💬 Opinion | Models & Research

Key Points

  • A new result of 94.61% accuracy on the official BANKING77-77 test set is reported, improving on the submitter’s previous best of 94.48% by +0.13 percentage points.
  • The work claims no test leakage by using 5-fold cross-validation on the official training set to lock the recipe, followed by retraining on the full 100% training data for a single final evaluation on the official test set.
  • The improvement is attributed to a multiview encoder adaptation applied to the last layers, described as a lightweight change that finally transferred holdout gains to the held-out test.
  • The model is described as relatively compact (about 68 MiB) with ~216 ms inference time.
  • The post invites others to share experiences with plateaus where holdout improvements failed to carry over to official test performance.

Hi everyone,

Just wanted to share a small but hard-won milestone.

After a long plateau at 94.48%, we’ve pushed the official BANKING77-77 test set (original noisy training data, strict full-train protocol) to 94.61%.

Key details:

  • +0.13pp over our previous best
  • +0.78pp over the widely cited 93.83% baseline (the official SOTA sits at 94.94%)
  • No test leakage — 5-fold CV on official train to freeze recipe, then retrain on 100% train data, single final test eval
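To make the "no test leakage" claim concrete, here is a minimal sketch of that protocol with toy stand-ins: the recipe knob, the model, and the synthetic data are all hypothetical (the post does not specify them) — only the three-step structure (CV on train to freeze the recipe, retrain on 100% of train, a single final test evaluation) comes from the post.

```python
# Sketch of the leakage-free protocol: hyperparameters are selected with
# 5-fold CV on the training set only, then the model is retrained on the
# full training data and evaluated exactly once on the held-out test set.
# The "smoothing" knob and majority-class model are toy placeholders.
import random
from statistics import mean

random.seed(0)

def make_split(n, n_classes=5):
    xs = [random.random() for _ in range(n)]
    ys = [random.randrange(n_classes) for _ in range(n)]
    return xs, ys

def train_model(xs, ys, smoothing):
    # toy "model": predicts the most frequent training class
    counts = {}
    for y in ys:
        counts[y] = counts.get(y, 0) + smoothing
    return max(counts, key=counts.get)

def accuracy(model, xs, ys):
    return mean(1.0 if model == y else 0.0 for y in ys)

def five_fold_cv(xs, ys, smoothing, k=5):
    fold = len(xs) // k
    scores = []
    for i in range(k):
        lo, hi = i * fold, (i + 1) * fold
        tr_x, tr_y = xs[:lo] + xs[hi:], ys[:lo] + ys[hi:]
        va_x, va_y = xs[lo:hi], ys[lo:hi]
        m = train_model(tr_x, tr_y, smoothing)
        scores.append(accuracy(m, va_x, va_y))
    return mean(scores)

train_x, train_y = make_split(500)
test_x, test_y = make_split(100)

# 1) freeze the recipe using CV on the training set (test never touched)
best = max([0.5, 1.0, 2.0], key=lambda s: five_fold_cv(train_x, train_y, s))
# 2) retrain on 100% of the training data with the frozen recipe
final_model = train_model(train_x, train_y, best)
# 3) single final evaluation on the official test set
test_acc = accuracy(final_model, test_x, test_y)
```

The point of the structure is that the test set influences nothing: no knob is chosen by looking at `test_acc`, so the one final number is an unbiased estimate.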

The model remains relatively compact (~68 MiB footprint, ~216 ms inference).

This was achieved through multiview encoder adaptation on the last layers — a relatively lightweight change that finally moved the needle after many smaller tweaks failed to transfer from holdout to test.

Curious if anyone else has hit similar walls where holdout gains refused to transfer to a true held-out test set, and what eventually worked for you.

submitted by /u/califalcon