94.42% Accuracy on Banking77 Official Test Split

BANKING77 is deceptively hard: 77 fine-grained banking intents, noisy real-world queries, and significant class overlap. I'm excited to share that I just hit 94.42% accuracy on the official PolyAI test split using a pure lightweight embedding + example reranking system built inside the Seed AutoArch framework.

Key numbers:
- Official test accuracy: 94.42%
- Macro-F1: 0.9441
- Inference: ~225 ms / ~68 MiB
- Improvement: +0.59pp over the widely-cited 93.83% baseline

This puts the result in clear 2nd place on the public leaderboard, only 0.52pp behind the current absolute SOTA (94.94%). No large language models, no 7B+ parameter monsters, just efficient embedding + rerank magic.

Results and demo coming very soon on an HF Space. Happy to answer questions about the high-level approach.

#BANKING77 #IntentClassification #EfficientAI #SLM [link] [comments]
94.42% on BANKING77 Official Test Split — New Strong 2nd Place with Lightweight Embedding + Rerank (no 7B LLM)
Reddit r/artificial / 4/7/2026
📰 News · Signals & Early Trends · Models & Research
Key Points
- The post reports 94.42% accuracy (Macro-F1 0.9441) on the official PolyAI test split of BANKING77, achieved with nothing more than a lightweight embedding + example-based reranking system built on the Seed AutoArch framework.
- As a benchmark improvement, it beats the widely cited 93.83% baseline by +0.59pp, placing it 2nd on the public leaderboard, 0.52pp behind the current SOTA of 94.94%.
- The result uses no large (7B+) language models, and the author emphasizes the low inference cost of roughly 225 ms and a model footprint of about 68 MiB.
- A demo is expected soon on an HF Space, where the author plans to share a high-level overview of the method and answer questions.
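The post does not disclose the encoder or reranker used, but the general "embed, retrieve labeled examples, rerank by similarity" pattern for intent classification can be sketched as follows. This is a minimal illustration with a toy bag-of-words embedding and similarity-weighted voting standing in for the (undisclosed) learned encoder and reranking stage; all function names and the `k` parameter are hypothetical:

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding"; the actual system presumably uses a
    # learned sentence encoder, which the post does not specify.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(query, labeled_examples, k=5):
    # Stage 1: retrieve the k labeled examples most similar to the query.
    q = embed(query)
    top_k = sorted(labeled_examples,
                   key=lambda ex: cosine(q, embed(ex[0])),
                   reverse=True)[:k]
    # Stage 2: "rerank" by accumulating similarity mass per intent label,
    # so several moderately similar examples can outvote one near-duplicate.
    votes = Counter()
    for text, label in top_k:
        votes[label] += cosine(q, embed(text))
    return votes.most_common(1)[0][0]

# Tiny stand-in for BANKING77-style labeled training utterances.
examples = [
    ("my card has not arrived yet", "card_arrival"),
    ("when will my new card get here", "card_arrival"),
    ("i was charged twice for one purchase", "duplicate_charge"),
    ("there is a double charge on my statement", "duplicate_charge"),
]
print(classify("why was i charged two times", examples))  # → duplicate_charge
```

The appeal of this pattern, consistent with the post's reported ~225 ms / ~68 MiB footprint, is that inference is just one encoder forward pass plus a nearest-neighbor lookup over stored training embeddings, with no generative model in the loop.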
Related Articles

Black Hat Asia
AI Business

Meta Superintelligence Lab Releases Muse Spark: A Multimodal Reasoning Model With Thought Compression and Parallel Agents
MarkTechPost

Chatbots are great at manipulating people to buy stuff, Princeton boffins find
The Register

I tested and ranked every AI companion app I tried and here's my honest breakdown
Reddit r/artificial

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.
Dev.to