94.42% Accuracy on Banking77 Official Test Split

BANKING77 is deceptively hard: 77 fine-grained banking intents, noisy real-world queries, and significant class overlap. I'm excited to share that I just hit 94.42% accuracy on the official PolyAI test split using a pure lightweight embedding + example reranking system built inside the Seed AutoArch framework.

Key numbers:
- Official test accuracy: 94.42%
- Macro-F1: 0.9441
- Inference: ~225 ms / ~68 MiB
- Improvement: +0.59pp over the widely-cited 93.83% baseline

This puts the result in clear 2nd place on the public leaderboard, only 0.52pp behind the current absolute SOTA (94.94%). No large language models, no 7B+ parameter monsters, just efficient embedding + rerank magic.

Results and demo coming very soon on an HF Space. Happy to answer questions about the high-level approach.

#BANKING77 #IntentClassification #EfficientAI #SLM [link] [comments]
94.42% on BANKING77 Official Test Split — New Strong 2nd Place with Lightweight Embedding + Rerank (no 7B LLM)
Reddit r/artificial / 4/7/2026
📰 News · Signals & Early Trends · Models & Research
Key Points
- The post reports 94.42% accuracy (Macro-F1 0.9441) on the official PolyAI test split of BANKING77, achieved with nothing more than a lightweight embedding + example-based reranking system built on the Seed AutoArch framework.
- As a benchmark improvement, it beats the widely cited 93.83% baseline by +0.59pp, placing it 2nd on the public leaderboard, 0.52pp behind the current SOTA of 94.94%.
- The result uses no large (7B+) language models, and the author emphasizes the low inference cost of roughly 225 ms and a model footprint of about 68 MiB.
- A demo is expected soon on an HF Space, where the author plans to share a high-level overview of the method and answer questions.
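The post does not disclose the encoder or reranker used, but the general "embed, retrieve labeled examples, rerank by similarity" pattern for intent classification can be sketched as follows. This is a minimal illustration with a toy bag-of-words embedding and similarity-weighted voting standing in for the (undisclosed) learned encoder and reranking stage; all function names and the `k` parameter are hypothetical:

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding"; the actual system presumably uses a
    # learned sentence encoder, which the post does not specify.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(query, labeled_examples, k=5):
    # Stage 1: retrieve the k labeled examples most similar to the query.
    q = embed(query)
    top_k = sorted(labeled_examples,
                   key=lambda ex: cosine(q, embed(ex[0])),
                   reverse=True)[:k]
    # Stage 2: "rerank" by accumulating similarity mass per intent label,
    # so several moderately similar examples can outvote one near-duplicate.
    votes = Counter()
    for text, label in top_k:
        votes[label] += cosine(q, embed(text))
    return votes.most_common(1)[0][0]

# Tiny stand-in for BANKING77-style labeled training utterances.
examples = [
    ("my card has not arrived yet", "card_arrival"),
    ("when will my new card get here", "card_arrival"),
    ("i was charged twice for one purchase", "duplicate_charge"),
    ("there is a double charge on my statement", "duplicate_charge"),
]
print(classify("why was i charged two times", examples))  # → duplicate_charge
```

The appeal of this pattern, consistent with the post's reported ~225 ms / ~68 MiB footprint, is that inference is just one encoder forward pass plus a nearest-neighbor lookup over stored training embeddings, with no generative model in the loop.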
Related Articles

Black Hat Asia
AI Business

Meta Superintelligence Lab Releases Muse Spark: A Multimodal Reasoning Model With Thought Compression and Parallel Agents
MarkTechPost

Chatbots are great at manipulating people to buy stuff, Princeton boffins find
The Register

I tested and ranked every AI companion app I tried and here's my honest breakdown
Reddit r/artificial

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.
Dev.to