BANKING77-77: New best of 94.61% on the official test set (+0.13pp over our previous best of 94.48%).

Reddit r/artificial / 4/9/2026

💬 Opinion | Models & Research

Key Points

  • A new result of 94.61% accuracy on the official BANKING77-77 test set is reported, improving on the submitter’s previous best of 94.48% by +0.13 percentage points.
  • The work claims no test leakage by using 5-fold cross-validation on the official training set to lock the recipe, followed by retraining on the full 100% training data for a single final evaluation on the official test set.
  • The improvement is attributed to a multiview encoder adaptation applied to the last layers, described as a lightweight change that finally transferred holdout gains to the held-out test.
  • The model is described as relatively compact (about 68 MiB) with ~216 ms inference time.
  • The post invites others to share experiences with plateaus where holdout improvements failed to carry over to official test performance.

Hi everyone,

Just wanted to share a small but hard-won milestone.

After a long plateau at 94.48%, we’ve pushed the official BANKING77-77 test set (original noisy training data, strict full-train protocol) to 94.61%.

Key details:

  • +0.13pp over our previous best
  • +0.78pp over the widely cited 93.83% baseline (the official SOTA sits at 94.94%)
  • No test leakage — 5-fold CV on official train to freeze recipe, then retrain on 100% train data, single final test eval
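To make the "no test leakage" claim concrete, here is a minimal sketch of that protocol with toy stand-ins: the recipe knob, the model, and the synthetic data are all hypothetical (the post does not specify them) — only the three-step structure (CV on train to freeze the recipe, retrain on 100% of train, a single final test evaluation) comes from the post.

```python
# Sketch of the leakage-free protocol: hyperparameters are selected with
# 5-fold CV on the training set only, then the model is retrained on the
# full training data and evaluated exactly once on the held-out test set.
# The "smoothing" knob and majority-class model are toy placeholders.
import random
from statistics import mean

random.seed(0)

def make_split(n, n_classes=5):
    xs = [random.random() for _ in range(n)]
    ys = [random.randrange(n_classes) for _ in range(n)]
    return xs, ys

def train_model(xs, ys, smoothing):
    # toy "model": predicts the most frequent training class
    counts = {}
    for y in ys:
        counts[y] = counts.get(y, 0) + smoothing
    return max(counts, key=counts.get)

def accuracy(model, xs, ys):
    return mean(1.0 if model == y else 0.0 for y in ys)

def five_fold_cv(xs, ys, smoothing, k=5):
    fold = len(xs) // k
    scores = []
    for i in range(k):
        lo, hi = i * fold, (i + 1) * fold
        tr_x, tr_y = xs[:lo] + xs[hi:], ys[:lo] + ys[hi:]
        va_x, va_y = xs[lo:hi], ys[lo:hi]
        m = train_model(tr_x, tr_y, smoothing)
        scores.append(accuracy(m, va_x, va_y))
    return mean(scores)

train_x, train_y = make_split(500)
test_x, test_y = make_split(100)

# 1) freeze the recipe using CV on the training set (test never touched)
best = max([0.5, 1.0, 2.0], key=lambda s: five_fold_cv(train_x, train_y, s))
# 2) retrain on 100% of the training data with the frozen recipe
final_model = train_model(train_x, train_y, best)
# 3) single final evaluation on the official test set
test_acc = accuracy(final_model, test_x, test_y)
```

The point of the structure is that the test set influences nothing: no knob is chosen by looking at `test_acc`, so the one final number is an unbiased estimate.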

The model remains relatively compact (~68 MiB footprint, ~216 ms inference).

This was achieved through multiview encoder adaptation on the last layers — a relatively lightweight change that finally moved the needle after many smaller tweaks failed to transfer from holdout to test.

Curious if anyone else has hit similar walls where holdout gains refused to transfer to a true held-out test set, and what eventually worked for you.

submitted by /u/califalcon