Hi everyone,
Just wanted to share a small but hard-won milestone.
After a long plateau at 94.48%, we've pushed our score on the official BANKING77 test set (original noisy training data, strict full-train protocol) to 94.61%.
Key details:
- +0.13pp over our previous best
- +0.78pp over the widely cited 93.83% baseline (the official SOTA sits at 94.94%)
- No test leakage: 5-fold CV on the official train set to freeze the recipe, then retrain on 100% of the train data, with a single final test eval
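
For anyone who wants the protocol spelled out, here is a minimal sketch of the "CV to freeze the recipe, retrain on everything, evaluate once" flow. It is not our actual pipeline: the 1-D nearest-centroid classifier, the synthetic data, the `recipes` dict, and every function name below are hypothetical stand-ins, and accuracy is assumed as the metric. The only point is that the test split is touched exactly once, after the recipe is already fixed.

```python
import random
from statistics import mean, median

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 once and split them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def select_recipe(train, recipes, fit, accuracy, k=5, seed=0):
    """Pick the recipe with the best mean k-fold CV accuracy, using train only."""
    folds = kfold_indices(len(train), k, seed)
    cv_scores = {}
    for name, params in recipes.items():
        scores = []
        for held_out in folds:
            held = set(held_out)
            fit_part = [ex for i, ex in enumerate(train) if i not in held]
            val_part = [train[i] for i in held_out]
            scores.append(accuracy(fit(fit_part, **params), val_part))
        cv_scores[name] = mean(scores)
    best = max(cv_scores, key=cv_scores.get)
    return best, cv_scores

def final_eval(train, test, params, fit, accuracy):
    """Retrain on 100% of train with the frozen recipe, then score test once."""
    return accuracy(fit(train, **params), test)

# --- Toy stand-ins (hypothetical, only here to make the sketch runnable) ---
def fit_centroids(examples, use_median=False):
    """Nearest-centroid 'model': one representative value per label."""
    by_label = {}
    for x, y in examples:
        by_label.setdefault(y, []).append(x)
    agg = median if use_median else mean
    return {y: agg(xs) for y, xs in by_label.items()}

def accuracy(model, examples):
    """Fraction of examples whose nearest centroid carries the true label."""
    hits = sum(
        min(model, key=lambda y: abs(model[y] - x)) == y_true
        for x, y_true in examples
    )
    return hits / len(examples)

def make_split(n, rng):
    """Synthetic 2-class data: label 0 around 0.0, label 1 around 3.0."""
    return [(rng.gauss(3.0 * (i % 2), 1.0), i % 2) for i in range(n)]

rng = random.Random(1)
train, test = make_split(100, rng), make_split(40, rng)
recipes = {"mean": {"use_median": False}, "median": {"use_median": True}}

# Step 1: freeze the recipe using train-only cross-validation.
best, cv_scores = select_recipe(train, recipes, fit_centroids, accuracy, k=5)
# Step 2: retrain on all of train and evaluate on test a single time.
test_acc = final_eval(train, test, recipes[best], fit_centroids, accuracy)
print(best, cv_scores, test_acc)
```

The design choice that matters is that `select_recipe` never receives `test`, so nothing measured on the test split can feed back into the choice of recipe.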
The model remains relatively compact (~68 MiB footprint, ~216 ms inference).
The gain came from multiview encoder adaptation on the last layers, a relatively lightweight change that finally moved the needle after many smaller tweaks failed to transfer from holdout to test.
Curious if anyone else has hit similar walls where holdout gains refused to transfer to a true held-out test set, and what eventually worked for you.

