2b or not 2b? Custom LLM Scheduling Competition [P]

Reddit r/MachineLearning / 4/23/2026


Key Points

  • The author launched a Kaggle competition focused on deciding when to run a small LLM versus skipping a question to reduce token/compute costs.
  • Participants use MMLU benchmark questions and choose between running the small model ("2b") or not running any model ("none").
  • The scoring uses a weighted, cost-based metric that penalizes wasted compute, expensive failures, and also penalizes skipping when running would have succeeded.
  • The competition currently keeps the cost of running the model fixed (it does not model variable runtime cost yet), but the author plans to add more models over time to improve decision-making.

Hey everyone,

I am generally interested in resource management, and notably in reducing the token cost of a given answer. So I just launched a Kaggle competition around a simple question: should you run a small model or not? I plan to add more models over time for better decision making.

Here is the competition: https://www.kaggle.com/competitions/llm-scheduling-competition

The idea:

  • You get questions from the MMLU benchmark
  • Instead of answering them, you decide:
    • 2b → run a small model
    • none → skip it

Then there is a cost-based metric:

  • running the model costs compute
  • running it when it fails is expensive
  • skipping when it would have worked is also penalized

So the goal is to minimise weighted cost.
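To make the trade-off concrete, here is a minimal sketch of a weighted cost metric of this shape. The weight values and function names are illustrative placeholders, not the competition's actual scoring code:

```python
# Hypothetical weights -- the real competition metric may differ.
RUN_COST = 1.0      # compute cost of running the 2b model at all
FAIL_PENALTY = 3.0  # extra cost when the model runs but answers wrong
SKIP_PENALTY = 2.0  # cost of skipping a question the model would have solved

def question_cost(decision: str, model_would_succeed: bool) -> float:
    """Cost of one (decision, outcome) pair under the weights above."""
    if decision == "2b":
        # Running always pays compute; a wrong answer adds the failure penalty.
        return RUN_COST + (0.0 if model_would_succeed else FAIL_PENALTY)
    # decision == "none": skipping is free unless the model would have succeeded.
    return SKIP_PENALTY if model_would_succeed else 0.0

def total_cost(decisions, outcomes) -> float:
    """Sum the weighted cost over all questions."""
    return sum(question_cost(d, o) for d, o in zip(decisions, outcomes))
```

Under these placeholder weights, running and failing (1.0 + 3.0 = 4.0) is worse than skipping a solvable question (2.0), which is worse than running and succeeding (1.0), so the optimal policy depends on how well you can predict the small model's success per question.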

Currently the setup is quite simple, as the cost to run your model is not taken into account. Still, it might be a first step in the right direction.

Curious to see what people come up with—rules, classifiers, or something more creative.
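As one example of a simple rule-based entry, you could run the small model only on MMLU subjects where it has historically done well on a held-out split. The subject names, accuracies, and threshold below are made-up placeholders, purely to illustrate the idea:

```python
# Hypothetical per-subject accuracies of the 2b model on a held-out split.
# These numbers are illustrative, not measured.
historical_accuracy = {
    "marketing": 0.71,
    "high_school_biology": 0.62,
    "abstract_algebra": 0.28,
}
THRESHOLD = 0.5  # run the model only where expected accuracy clears this bar

def decide(subject: str) -> str:
    """Return '2b' when the small model is expected to do well, else 'none'."""
    return "2b" if historical_accuracy.get(subject, 0.0) >= THRESHOLD else "none"
```

A natural next step would be replacing the per-subject lookup with a per-question classifier trained on question features.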

Happy to discuss ideas or answer questions!

submitted by /u/WERE_CAT